[
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039300#comment-17039300
]
Rui Wang commented on BEAM-9198:
--------------------------------
Hello John!
>I noticed that the SQL extensions of Beam are only implemented for the Java
>SDK, therefore this project only involves working in that SDK, right?
Yes. You will only need to work on the Java SDK.
>According to the documentation there are two SQL dialects (Calcite and Zeta)
>that are supported by Beam, will these new aggregation functions be
>implemented in both dialects?
The two SQL dialects in BeamSQL share the same physical operator
implementation; they are just different frontends. You only need to support the
functionality in one dialect, and the other can enable it easily later (i.e.
you don't need to reimplement everything for the second dialect).
>Finally, are there some other implementations of aggregation functions (or
>similar) that I could check out in other SDKs? I would really appreciate it if
>you could give some resources / examples that I could analyze.
To learn the concepts behind analytic functions, this doc is a great resource:
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
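To make the idea concrete, here is a minimal plain-Java sketch (not Beam code,
and not how BeamSQL implements it) of what
SUM(amount) OVER (PARTITION BY region ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
computes on a single machine. The Sale class and the runningSum method are
hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RunningSumSketch {

    /** Hypothetical input row: partition key, ordering key, value to aggregate. */
    static final class Sale {
        final String region;
        final int day;
        final long amount;

        Sale(String region, int day, long amount) {
            this.region = region;
            this.day = day;
            this.amount = amount;
        }
    }

    /** Returns one "region,day,amount,runningSum" line per input row. */
    static List<String> runningSum(List<Sale> rows) {
        // PARTITION BY region: group rows by their partition key,
        // keeping partitions in first-seen order.
        Map<String, List<Sale>> partitions = new LinkedHashMap<>();
        for (Sale row : rows) {
            partitions.computeIfAbsent(row.region, k -> new ArrayList<>()).add(row);
        }

        List<String> out = new ArrayList<>();
        for (List<Sale> partition : partitions.values()) {
            // ORDER BY day ASC within the partition.
            partition.sort(Comparator.comparingInt((Sale s) -> s.day));

            // Frame: from the start of the partition up to the current row.
            long sum = 0;
            for (Sale row : partition) {
                sum += row.amount;
                out.add(row.region + "," + row.day + "," + row.amount + "," + sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Sale> rows = Arrays.asList(
            new Sale("east", 2, 5),
            new Sale("west", 1, 7),
            new Sale("east", 1, 10),
            new Sale("west", 2, 3));
        for (String line : runningSum(rows)) {
            System.out.println(line);
        }
    }
}
```

Unlike GROUP BY, every input row produces one output row; the aggregate is
attached to the row instead of collapsing the group. Running main prints
east,1,10,10 then east,2,5,15 then west,1,7,7 then west,2,3,10.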
For reference implementations, two things might be helpful:
1. The Beam programming model:
https://beam.apache.org/documentation/programming-guide/#overview
2. Some existing BeamSQL aggregation function implementations:
https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg
Lastly, in case you are concerned about it: you don't need a distributed system
backend (e.g. Spark) to develop the functionality. Beam has a local runner
that can run your code/pipeline locally. By design, code that runs on the
local runner should also run on Spark/Flink/Dataflow etc. So if you have a
working computer that can run Java and Gradle, you should be good to start.
> BeamSQL aggregation analytics functions
> ----------------------------------------
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
> Issue Type: Task
> Components: dsl-sql
> Reporter: Rui Wang
> Priority: Major
> Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of aggregation/aggregation analytics
> functionalities to support.
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
> OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
> )
> {code}
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed
> manner.
> 4. Build benchmarks to measure performance of your implementation.