[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functions

Rui Wang (Jira) Tue, 18 Feb 2020 10:08:20 -0800


    [ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039300#comment-17039300
 ]


Rui Wang commented on BEAM-9198:
--------------------------------

Hello John!

>I noticed that the SQL extensions of Beam are only implemented for the Java 
>SDK, therefore this project only involves working in that SDK, right?

Yes. You will only need to work on the Java SDK.

>According to the documentation there are two SQL dialects (Calcite and Zeta) 
>that are supported by Beam. Will these new aggregation functions be 
>implemented in both dialects?

The two SQL dialects in BeamSQL share the same physical operator implementation; 
they are just different frontends. You can support the functionality in only 
one dialect first, and the other can enable it easily later (i.e. you don't 
need to reimplement everything for the second dialect).

>Finally, are there some other implementations of aggregation functions (or 
>similar) that I could check out in other SDKs? I would really appreciate it if 
>you could give some resources / examples that I could analyze.

To learn the concepts behind analytic functions, this doc is a great starting point: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
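
As a rough illustration of the concept only (plain Java, not Beam or BeamSQL 
code), the sketch below emulates 
SUM(value) OVER (PARTITION BY key ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
on an in-memory list. The class, field and method names (RunningSumSketch, Row, 
key, ts, value, runningSum) are made up for this example and do not exist in Beam.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RunningSumSketch {

    /** One input row; all field names are hypothetical. */
    static final class Row {
        final String key;   // PARTITION BY column
        final long ts;      // ORDER BY column
        final long value;   // argument of SUM(...)

        Row(String key, long ts, long value) {
            this.key = key;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Returns, per partition key, the running sums in ORDER BY ts order. */
    static Map<String, List<Long>> runningSum(List<Row> rows) {
        // PARTITION BY key: group rows by their key, keeping first-seen key order.
        Map<String, List<Row>> partitions = new LinkedHashMap<>();
        for (Row r : rows) {
            partitions.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }

        Map<String, List<Long>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Row>> e : partitions.entrySet()) {
            List<Row> partition = e.getValue();
            // ORDER BY ts ASC within the partition.
            partition.sort(Comparator.comparingLong((Row r) -> r.ts));

            // Frame: UNBOUNDED PRECEDING .. CURRENT ROW, so each output is
            // the sum of all values up to and including the current row.
            List<Long> sums = new ArrayList<>();
            long acc = 0;
            for (Row r : partition) {
                acc += r.value;
                sums.add(acc);
            }
            result.put(e.getKey(), sums);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", 2, 3),
            new Row("b", 1, 10),
            new Row("a", 1, 1),
            new Row("b", 2, 5));
        System.out.println(runningSum(rows)); // prints {a=[1, 4], b=[10, 15]}
    }
}
```

Unlike a plain GROUP BY aggregation, an analytic function yields one output 
per input row. A distributed implementation would have to do the per-partition 
step without holding all rows on one machine, which this sketch ignores.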
 

If you are looking for reference implementations, two things might be 
helpful:
1. The Beam programming model: 
https://beam.apache.org/documentation/programming-guide/#overview
2. Some existing BeamSQL aggregation function implementations: 
https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg



Lastly, in case you are concerned about it: you don't need a distributed system 
backend (e.g. Spark) to develop the functionality. Beam has a local runner 
which can run your code/pipeline locally. By design, if your code runs on the 
local runner, that should be sufficient for it to run on 
Spark/Flink/Dataflow etc. So if you have a working computer that can run Java 
and Gradle, you should be good to start.
 


> BeamSQL aggregation analytics functions 
> ----------------------------------------
>
>                 Key: BEAM-9198
>                 URL: https://issues.apache.org/jira/browse/BEAM-9198
>             Project: Beam
>          Issue Type: Task
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Priority: Major
>              Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
>     [ PARTITION BY partition_expression_list ]
>     [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
>     [ window_frame_clause ]
>   )
> {code}
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement the physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Build benchmarks to measure performance of your implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
