[ https://issues.apache.org/jira/browse/SAMZA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277822#comment-14277822 ]
Chris Riccomini commented on SAMZA-483:
---------------------------------------
bq. 2. Semantic analysis (We implement semantic analysis phase on top of AST or
some other model generated from AST)
What is the output of this phase? Is this where either a relational algebra
model vs. sql object model comes into play?
bq. Optimization can be optional for now.
+1
bq. I am proposing that we just focus on a common representation layer that can
represent the execution plan in 4
As I understand it, the current patch has two main patterns: operators and
specs. Would the operators be step (4), and the specs would be the output of
step (2)? Something needs to translate between the results of the semantic
analysis and the specs. If that thing does the translation, can we just forgo
the specs altogether, and have something that takes the results of the
semantic analysis and directly instantiates the operators?
A general comment: perhaps it would be constructive to show a full example flow
from REPL to operators? I think that there might be two phases to the
execution. The first phase is done from the driver machine, when the user
executes a command from a REPL.
# User enters SELECT * FROM MyStream WHERE foo = bar;
# ANTLR parses grammar and generates an AST.
# We write code that translates the AST into either a SQL OM or Relational
Algebra OM.
# We write code that translates from OM into a set of Samza job configs, all of
which have task.class=org.apache.samza.sql.SqlTask.
# We write code that starts the set of jobs.
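To make step 4 of this first phase concrete, here is a rough sketch of what translating a query fragment into a job config could look like. Only task.class=org.apache.samza.sql.SqlTask comes from the flow above; the class name, the method, and the "sql.query.fragment" key are hypothetical placeholders, and a real translation would presumably start from the OM rather than from a raw string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryJobConfigSketch {
    // Hypothetical config key carrying the machine-compiled query to the task.
    static final String QUERY_KEY = "sql.query.fragment";

    // Builds the config for one Samza job that will run the given fragment.
    static Map<String, String> buildJobConfig(String jobName, String queryFragment) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("job.name", jobName);
        // Every job generated for a query runs the same task class.
        config.put("task.class", "org.apache.samza.sql.SqlTask");
        config.put(QUERY_KEY, queryFragment);
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config =
            buildJobConfig("my-stream-filter", "SELECT * FROM MyStream WHERE foo = bar");
        // Print the config in key=value form, one entry per line.
        config.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A query that needs several jobs would call buildJobConfig once per job, each with its own fragment.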
The second phase occurs in each container of each job that was started for
the query.
# SqlTask receives configs, which amount to a machine-compiled query. This
could be the OM or a SQL query fragment string (e.g. SELECT * FROM MyStream).
# If the machine-compiled query is just a SQL query fragment, the SQL task
again uses ANTLR to parse the grammar and generate an AST, and then converts
the AST to an OM.
# SqlTask converts OM to a set of operators that it can wire together and
execute.
Is my mental model correct? If so, it sounds like [~milinda] is discussing what
kind of object model to use in step (3) of the first phase. [~nickpan47], where does
the "spec" pattern fit into this flow? I think it would be in step (3) of phase
2, when the SqlTask has to actually convert the object model to the operators.
When this happens, you need a way to programmatically instantiate the
operators, and I think this is where the specs/factories come in, right?
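As a rough illustration of what I mean by specs/factories, here is a minimal sketch in which a spec is a plain description of an operator and a factory method turns it into something executable. None of these names (Operator, FilterSpec, createFilter) are taken from the current patch; they are hypothetical, and the filter treats "bar" as a constant value purely for simplicity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpecFactorySketch {
    /** Hypothetical operator: takes one tuple and returns zero or more output tuples. */
    public interface Operator {
        List<Map<String, Object>> process(Map<String, Object> tuple);
    }

    /** Hypothetical spec: pure data describing a "field = constant" filter. */
    public static final class FilterSpec {
        final String field;
        final Object expected;

        public FilterSpec(String field, Object expected) {
            this.field = field;
            this.expected = expected;
        }
    }

    /** Factory step: instantiate an executable operator from its spec. */
    public static Operator createFilter(FilterSpec spec) {
        return tuple -> spec.expected.equals(tuple.get(spec.field))
            ? Collections.singletonList(tuple)
            : Collections.<Map<String, Object>>emptyList();
    }

    public static void main(String[] args) {
        // Spec that the OM-to-operator conversion might produce for "WHERE foo = bar".
        Operator filter = createFilter(new FilterSpec("foo", "bar"));

        Map<String, Object> tuple = new HashMap<>();
        tuple.put("foo", "bar");
        System.out.println(filter.process(tuple).size()); // one tuple passes the filter
    }
}
```

If the specs go away, the OM-to-operator conversion would construct the lambda (or operator class) directly instead of going through FilterSpec.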
> A common representation of relational algebra for streaming SQL
> ----------------------------------------------------------------
>
> Key: SAMZA-483
> URL: https://issues.apache.org/jira/browse/SAMZA-483
> Project: Samza
> Issue Type: Sub-task
> Components: sql
> Reporter: Yi Pan (Data Infrastructure)
> Priority: Minor
> Labels: project
>
> Per discussion with [~criccomini] and [~milinda], we agreed that it seems to
> be a good idea to define a common representation of relational algebra on top
> of the operators defined in the operator layer (see SAMZA-482), which can be
> the common base that we can use to generate the description/configuration of
> a Samza job.
> This common layer can also be used by DSL-like language parser as a result of
> parsing a DSL program.
> Some additional requirements needed in addition to pure relational algebra:
> 1) the common representation should include window operators and stream
> operators (i.e. IStream/DStream/RStream)
> 2) the common representation should include description on parallelism of the
> jobs (i.e. how many partitions the resultant Samza job will use)
> Some references:
> http://web.cs.wpi.edu/~mukherab/i/DCAPE.pdf
> https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/publications.htm
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/slides.htm