[jira] [Commented] (SAMZA-483) A common representation of relational algebra for streaming SQL

Chris Riccomini (JIRA) Wed, 14 Jan 2015 14:43:28 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277822#comment-14277822
 ]


Chris Riccomini commented on SAMZA-483:
---------------------------------------

bq. 2. Semantic analysis (We implement semantic analysis phase on top of AST or 
some other model generated from AST)

What is the output of this phase? Is this where the choice between a relational 
algebra model and a SQL object model comes into play?

bq. Optimization can be optional for now.

+1

bq. I am proposing that we just focus on a common representation layer that can 
represent the execution plan in 4

As I understand it, the current patch has two main patterns: operators and 
specs. Would the operators be step (4), and the specs would be the output of 
step (2)? Something needs to translate between the results of the semantic 
analysis and the specs. If that thing does the translation, can we just forgo 
the specs altogether, and have something that takes the results of the 
semantic analysis and directly instantiates the operators?

A general comment: perhaps it would be constructive to show a full example flow 
from REPL to operators? I think that there might be two phases to the 
execution. The first phase is done from the driver machine, when the user 
executes a command from a REPL.

# User enters SELECT * FROM MyStream WHERE foo = bar;
# ANTLR parses grammar and generates an AST.
# We write code that translates the AST into either a SQL object model (OM) or 
a relational algebra OM.
# We write code that translates the OM into a set of Samza job configs, all of 
which have task.class=org.apache.samza.sql.SqlTask.
# We write code that starts the set of jobs.
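
To make step (4) of this first phase concrete, here is a minimal sketch of what the driver might emit for a single-job query. It assumes the whole query is shipped to the task as a SQL fragment string. QueryPlanner, jobConfig and the sql.query.fragment key are hypothetical names invented for illustration; only job.name, task.class and task.inputs are existing Samza config keys. Parsing and OM construction (steps 2 and 3) are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical driver-side helper; not part of Samza or of the SAMZA-483 patch.
final class QueryPlanner {
    // Assumed config key that carries the machine-compiled query to the task.
    static final String QUERY_KEY = "sql.query.fragment";

    // Builds the config for one job of the query (step 4 of the first phase).
    static Map<String, String> jobConfig(String jobName, String input, String fragment) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("job.name", jobName);
        // Every job generated for a query runs the same task class.
        config.put("task.class", "org.apache.samza.sql.SqlTask");
        // Input in Samza's system.stream form, e.g. "kafka.MyStream" (assumed system name).
        config.put("task.inputs", input);
        config.put(QUERY_KEY, fragment);
        return config;
    }
}
```

A query that needs several jobs would produce one such map per job, each carrying its own fragment.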

The second phase occurs in each container of each job that was started for 
the query.

# SqlTask receives configs, which amount to a machine-compiled query. This 
could be the OM or a SQL query fragment string (e.g. SELECT * FROM MyStream).
# If the machine-compiled query is just a SQL query fragment, the SQL task 
again uses ANTLR to parse the grammar and generate an AST, and then converts 
the AST to an OM.
# SqlTask converts OM to a set of operators that it can wire together and 
execute.
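
Step (3) of this second phase could look roughly like the sketch below, which walks a tiny OM and wires up a chain of operators. Everything here (OmNode, Scan, Filter, Operator, OperatorBuilder) is a hypothetical stand-in rather than the API in the current patch, and the predicate foo = bar is treated as a comparison against a literal for simplicity.

```java
import java.util.Map;

// Hypothetical object model for SELECT * FROM <stream> WHERE <field> = <value>.
interface OmNode {}

final class Scan implements OmNode {
    final String stream;
    Scan(String stream) { this.stream = stream; }
}

final class Filter implements OmNode {
    final OmNode input;
    final String field;
    final Object value;
    Filter(OmNode input, String field, Object value) {
        this.input = input;
        this.field = field;
        this.value = value;
    }
}

// Hypothetical operator: consumes one message and may pass it downstream.
interface Operator {
    void process(Map<String, Object> message);
}

final class OperatorBuilder {
    // Walks the OM from the root towards the scan and returns the operator
    // that should receive raw messages; matching messages reach `downstream`.
    static Operator build(OmNode node, Operator downstream) {
        if (node instanceof Scan) {
            // A scan has no logic of its own here; input goes straight downstream.
            return downstream;
        }
        if (node instanceof Filter) {
            Filter f = (Filter) node;
            Operator filterOp = message -> {
                if (f.value.equals(message.get(f.field))) {
                    downstream.process(message);
                }
            };
            return build(f.input, filterOp);
        }
        throw new IllegalArgumentException("Unknown OM node: " + node);
    }
}
```

In this sketch the task would call OperatorBuilder.build(new Filter(new Scan("MyStream"), "foo", "bar"), sink) once at startup and hand every incoming message to the returned operator.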

Is my mental model correct? If so, it sounds like [~milinda] is discussing what 
kind of object model to use in step (3) of the first phase. [~nickpan47], where does 
the "spec" pattern fit into this flow? I think it would be in step (3) of phase 
2, when the SqlTask has to actually convert the object model to the operators. 
When this happens, you need a way to programmatically instantiate the 
operators, and I think this is where the specs/factories come in, right?
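
For reference, this is roughly what I imagine by a spec/factory pair: the spec is a plain description of one operator, and the factory is the only place that knows how to turn it into something executable. FilterSpec and OperatorFactory are hypothetical names for this sketch and may not match the classes in the patch.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical spec: an immutable description of a single filter operator.
final class FilterSpec {
    final String id;
    final String field;
    final Object expected;
    FilterSpec(String id, String field, Object expected) {
        this.id = id;
        this.field = field;
        this.expected = expected;
    }
}

// Hypothetical factory: instantiates an executable operator from a spec.
final class OperatorFactory {
    static Predicate<Map<String, Object>> create(FilterSpec spec) {
        // The returned predicate accepts messages whose field equals the expected value.
        return message -> spec.expected.equals(message.get(spec.field));
    }
}
```

If the code that walks the OM can call the operator constructors directly, the spec layer adds nothing. It seems to earn its place only if the operator description must be serialized or inspected separately from the running operator.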

> A common representation of relational algebra for streaming SQL 
> ----------------------------------------------------------------
>
>                 Key: SAMZA-483
>                 URL: https://issues.apache.org/jira/browse/SAMZA-483
>             Project: Samza
>          Issue Type: Sub-task
>          Components: sql
>            Reporter: Yi Pan (Data Infrastructure)
>            Priority: Minor
>              Labels: project
>
> Per discussion with [~criccomini] and [~milinda], we agreed that it seems to 
> be a good idea to define a common representation of relational algebra on top 
> of the operators defined in the operator layer (see SAMZA-482), which can be 
> the common base that we can use to generate the description/configuration of 
> a Samza job.
> This common layer can also be used by DSL-like language parser as a result of 
> parsing a DSL program.
> Some additional requirements needed in addition to pure relational algebra:
> 1) the common representation should include window operators and stream 
> operators (i.e. IStream/DStream/RStream)
> 2) the common representation should include description on parallelism of the 
> jobs (i.e. how many partitions the resultant Samza job will use)
> Some references:
> http://web.cs.wpi.edu/~mukherab/i/DCAPE.pdf
> https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/publications.htm
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/slides.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
