[
https://issues.apache.org/jira/browse/SAMZA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279190#comment-14279190
]
Yi Pan (Data Infrastructure) commented on SAMZA-483:
----------------------------------------------------
From the discussion with [~criccomini], I think I finally understand the exact
meaning of front-end and back-end in [~milinda]'s description. My personal
interpretation (using REPL as shorthand for "representation") is:
# front-end REPL: closer to the language itself, in terms of syntax and
semantics
# back-end REPL: closer to the actual representation of the operators and
execution paths in Samza jobs, and less coupled to language details
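To make the distinction concrete, here is a minimal Java sketch of one query,
SELECT id, amount FROM orders WHERE amount > 100, held in each representation.
All type and field names (Select, Comparison, OperatorSpec) and the stream names
are hypothetical, not Samza classes; the point is only that the front-end form
keeps the query's syntactic shape, while the back-end form is a list of
operators connected by named streams.

```java
import java.util.List;

// Hypothetical types for illustration only; they are not Samza classes.
class TwoRepresentations {

    // Front-end representation: a tree that mirrors SQL syntax and semantics.
    record Comparison(String column, String op, long value) {}
    record Select(List<String> columns, String from, Comparison where) {}

    // Back-end representation: operators plus the named streams that connect
    // them, with no trace of SQL keywords.
    record OperatorSpec(String type, String input, String output, String argument) {}

    // The same query, "SELECT id, amount FROM orders WHERE amount > 100", in both forms.
    static final Select FRONT_END =
        new Select(List.of("id", "amount"), "orders", new Comparison("amount", ">", 100));

    static final List<OperatorSpec> BACK_END = List.of(
        new OperatorSpec("filter", "orders", "orders-filtered", "amount > 100"),
        new OperatorSpec("project", "orders-filtered", "result", "id,amount"));

    public static void main(String[] args) {
        System.out.println("front-end: " + FRONT_END);
        BACK_END.forEach(op -> System.out.println("back-end:  " + op));
    }
}
```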
Based on the above understanding, my opinions are:
{quote}
What is the output of this phase? Is this where either a relational algebra
model vs. sql object model comes into play?
{quote}
My understanding: the semantic analysis phase would not change the format of
the front-end REPL; it would only validate its semantics. The front-end REPL
here could be an AST or an extended relational algebra OM.
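A minimal Java sketch of such a read-only validation pass, assuming a
hypothetical catalog that maps stream names to column names. Select, validate
and the catalog layout are illustrative only, not Samza API. The check only
reports errors and leaves the front-end node untouched (Java records are
immutable), which is the property described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: semantic analysis as a read-only pass over the front-end REPL.
class SemanticAnalysis {

    // Front-end node; records are immutable, so validation cannot rewrite it.
    record Select(List<String> columns, String from, String whereColumn) {}

    // Returns error messages; an empty list means the query is semantically valid.
    static List<String> validate(Select q, Map<String, Set<String>> catalog) {
        List<String> errors = new ArrayList<>();
        Set<String> known = catalog.get(q.from());
        if (known == null) {
            errors.add("unknown stream: " + q.from());
            return errors;
        }
        for (String c : q.columns()) {
            if (!known.contains(c)) errors.add("unknown column: " + c);
        }
        if (q.whereColumn() != null && !known.contains(q.whereColumn())) {
            errors.add("unknown column: " + q.whereColumn());
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> catalog = Map.of("orders", Set.of("id", "amount", "ts"));
        // prints []
        System.out.println(validate(new Select(List.of("id", "amount"), "orders", "amount"), catalog));
        // prints [unknown column: price]
        System.out.println(validate(new Select(List.of("id", "price"), "orders", null), catalog));
    }
}
```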
{quote}
4. We write code that translates from OM into a set of Samza job configs, all
of which have task.class=org.apache.samza.sql.SqlTask.
{quote}
In this phase, we should translate the front-end REPL into a back-end REPL for
the whole SQL query. As [~criccomini] mentioned, the back-end REPL could simply
consist of a) a set of operator specs; b) the routing context among the
operator specs; and c) the store / stream specs between the operators. The
planner can then divide the whole query plan into sub-graphs, one per deployed
task, along the system stream I/O boundaries, and send each sub-graph's
back-end REPL to its task as config. When the container starts each task, it
can directly instantiate the operators and the routing context within the task
from that config, and then start the task. I hope this description also answers
the following questions:
{quote}
where does the "spec" pattern fit into this flow?
{quote}
{quote}
When this happens, you need a way to programmatically instantiate the
operators, and I think this is where the specs/factories come in, right?
{quote}
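The flow described above (operator specs with routing, a planner that cuts the
plan at system stream boundaries, per-task config, and factories that
instantiate operators from specs) can be sketched in Java as follows. This is a
minimal sketch under stated assumptions, not Samza API: the "sql.op.*" config
keys, the OperatorSpec fields, the Operator interface and the factory registry
are all hypothetical, and store specs are left out for brevity. The routing
context is modelled simply as each spec's named input and output streams.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch, not Samza API: config keys, spec fields and factories are invented.
class SubGraphPlanner {

    // One operator spec; its named inputs/output double as the routing context.
    record OperatorSpec(String id, String type, List<String> inputs, String output) {}

    // Runtime operator built by a factory inside the task.
    interface Operator { String describe(); }

    // Planner: operators joined by an in-task (non-system) stream land in the
    // same sub-graph; system streams form the task boundaries.
    static List<List<OperatorSpec>> partition(List<OperatorSpec> plan, Set<String> systemStreams) {
        Map<String, String> parent = new HashMap<>();
        Map<String, String> producer = new HashMap<>();
        for (OperatorSpec op : plan) {
            parent.put(op.id(), op.id());
            producer.put(op.output(), op.id());
        }
        for (OperatorSpec op : plan) {
            for (String in : op.inputs()) {
                if (!systemStreams.contains(in) && producer.containsKey(in)) {
                    union(parent, op.id(), producer.get(in));
                }
            }
        }
        Map<String, List<OperatorSpec>> byRoot = new LinkedHashMap<>();
        for (OperatorSpec op : plan) {
            byRoot.computeIfAbsent(find(parent, op.id()), k -> new ArrayList<>()).add(op);
        }
        return new ArrayList<>(byRoot.values());
    }

    // Union-find helpers (no path compression; fine for small plans).
    static String find(Map<String, String> parent, String x) {
        while (!parent.get(x).equals(x)) x = parent.get(x);
        return x;
    }

    static void union(Map<String, String> parent, String a, String b) {
        parent.put(find(parent, a), find(parent, b));
    }

    // Serialize one sub-graph as flat key/value config (ids and stream names must not contain ',').
    static Map<String, String> toConfig(List<OperatorSpec> subGraph) {
        Map<String, String> cfg = new LinkedHashMap<>();
        List<String> ids = new ArrayList<>();
        for (OperatorSpec op : subGraph) {
            ids.add(op.id());
            cfg.put("sql.op." + op.id() + ".type", op.type());
            cfg.put("sql.op." + op.id() + ".inputs", String.join(",", op.inputs()));
            cfg.put("sql.op." + op.id() + ".output", op.output());
        }
        cfg.put("sql.ops", String.join(",", ids));
        return cfg;
    }

    // Task side: rebuild the specs from config and instantiate them via factories.
    static List<Operator> instantiate(Map<String, String> cfg,
                                      Map<String, Function<OperatorSpec, Operator>> factories) {
        List<Operator> ops = new ArrayList<>();
        for (String id : cfg.get("sql.ops").split(",")) {
            String prefix = "sql.op." + id;
            OperatorSpec spec = new OperatorSpec(id, cfg.get(prefix + ".type"),
                List.of(cfg.get(prefix + ".inputs").split(",")), cfg.get(prefix + ".output"));
            Function<OperatorSpec, Operator> factory = factories.get(spec.type());
            if (factory == null) throw new IllegalArgumentException("no factory for " + spec.type());
            ops.add(factory.apply(spec));
        }
        return ops;
    }

    public static void main(String[] args) {
        // "orders", "big-orders" and "totals" are system streams; "orders-f" stays inside a task.
        List<OperatorSpec> plan = List.of(
            new OperatorSpec("f1", "filter", List.of("orders"), "orders-f"),
            new OperatorSpec("p1", "project", List.of("orders-f"), "big-orders"),
            new OperatorSpec("w1", "window", List.of("big-orders"), "totals"));
        Set<String> systemStreams = Set.of("orders", "big-orders", "totals");
        Function<OperatorSpec, Operator> generic =
            s -> () -> s.type() + " " + s.inputs() + " -> " + s.output();
        Map<String, Function<OperatorSpec, Operator>> factories =
            Map.of("filter", generic, "project", generic, "window", generic);
        for (List<OperatorSpec> subGraph : partition(plan, systemStreams)) {
            Map<String, String> cfg = toConfig(subGraph);
            System.out.println(cfg);
            instantiate(cfg, factories).forEach(op -> System.out.println("  " + op.describe()));
        }
    }
}
```

In this example, f1 and p1 end up in one sub-graph because orders-f never
leaves the task, while w1 forms its own sub-graph because it reads the system
stream big-orders. Union-find is used so that an operator with several in-task
inputs (e.g. a join) merges all of its upstream operators into one sub-graph.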
> A common representation of relational algebra for streaming SQL
> ----------------------------------------------------------------
>
> Key: SAMZA-483
> URL: https://issues.apache.org/jira/browse/SAMZA-483
> Project: Samza
> Issue Type: Sub-task
> Components: sql
> Reporter: Yi Pan (Data Infrastructure)
> Priority: Minor
> Labels: project
>
> Per discussion with [~criccomini] and [~milinda], we agreed that it seems to
> be a good idea to define a common representation of relational algebra on top
> of the operators defined in the operator layer (see SAMZA-482), which can be
> the common base that we can use to generate the description/configuration of
> a Samza job.
> This common layer can also be used by DSL-like language parser as a result of
> parsing a DSL program.
> Some requirements beyond pure relational algebra:
> 1) the common representation should include window operators and stream
> operators (i.e. IStream/DStream/RStream)
> 2) the common representation should include a description of the parallelism
> of the jobs (i.e. how many partitions the resultant Samza job will use)
> Some references:
> http://web.cs.wpi.edu/~mukherab/i/DCAPE.pdf
> https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/publications.htm
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/slides.htm
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)