[jira] [Commented] (SAMZA-483) A common representation of relational algebra for streaming SQL

Yi Pan (Data Infrastructure) (JIRA) Thu, 15 Jan 2015 11:40:15 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279190#comment-14279190
 ]


Yi Pan (Data Infrastructure) commented on SAMZA-483:
----------------------------------------------------

From the discussion w/ [~criccomini], I finally seem to get the exact meaning 
of front-end and back-end in [~milinda]'s description. My personal 
interpretation of that is:
# front-end REPL would be something that is closer to the language itself, in 
terms of syntax and semantics
# back-end REPL would be something that is closer to the actual representation 
of the operators/execution paths in SAMZA jobs, less coupled with language 
details
Based on the above understanding, my opinions are:
{quote}
What is the output of this phase? Is this where either a relational algebra 
model vs. sql object model comes into play?
{quote}
My understanding here: the semantic analysis phase would not change the format 
of the front-end REPL (possibly an AST or an extended Relational Algebra OM); 
it would only validate its semantics.

{quote}
4. We write code that translates from OM into a set of Samza job configs, all 
of which have task.class=org.apache.samza.sql.SqlTask.
{quote}
In this phase, we should translate the front-end REPL to a back-end REPL for 
the whole SQL query. As [~criccomini] mentioned, the back-end REPL could 
possibly just be a) a set of operator specs; b) the routing context among the 
operator specs; and c) store / stream specs between the operators. Then, the 
planner can divide the whole query plan into sub-graphs, one per deployed task, 
following the system stream I/O boundaries, and send each sub-graph's back-end 
REPL as config to the corresponding task. When the container starts a task, it 
can directly instantiate the operators and the routing context within the task 
from that sub-graph config and start the task. I hope this description also 
answers the following questions:
{quote}
where does the "spec" pattern fit into this flow?
{quote}
{quote}
When this happens, you need a way to programmatically instantiate the 
operators, and I think this is where the specs/factories come in, right?
{quote}
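
To make the flow above concrete, here is a minimal sketch in Python (chosen only for brevity). Everything in it is an assumption for illustration and not an existing Samza API: the names OperatorSpec, Route, QueryPlan, split_by_stream_boundaries, instantiate and FACTORIES, the operator ids, and the stream name kafka.projected are all hypothetical. It models the back-end representation as operator specs plus routes, cuts the plan wherever a route crosses a system stream, and then builds operators from specs through a factory table.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class OperatorSpec:
    """Hypothetical spec: just enough data to instantiate one operator."""
    op_id: str
    kind: str  # e.g. "filter", "project", "window"


@dataclass(frozen=True)
class Route:
    """Hypothetical routing edge between two operators.

    stream is None for an in-task hand-off, or a system stream name when
    the edge crosses a stream I/O boundary.
    """
    src: str
    dst: str
    stream: Optional[str] = None


@dataclass
class QueryPlan:
    operators: Dict[str, OperatorSpec]
    routes: List[Route]


def split_by_stream_boundaries(plan: QueryPlan) -> List[dict]:
    """Group operators joined by in-task routes; stream routes are the cuts."""
    parent = {op_id: op_id for op_id in plan.operators}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for r in plan.routes:
        if r.stream is None:
            parent[find(r.src)] = find(r.dst)

    groups = defaultdict(list)
    for op_id in plan.operators:
        groups[find(op_id)].append(op_id)

    subplans = []
    for members in groups.values():
        inside = set(members)
        subplans.append({
            "operators": [plan.operators[m] for m in members],
            "routes": [r for r in plan.routes
                       if r.stream is None and r.src in inside],
            "inputs": sorted({r.stream for r in plan.routes
                              if r.stream is not None and r.dst in inside}),
            "outputs": sorted({r.stream for r in plan.routes
                               if r.stream is not None and r.src in inside}),
        })
    return subplans


# Hypothetical factory table: operator kind -> constructor taking a spec.
# Strings stand in for real operator objects.
FACTORIES: Dict[str, Callable[[OperatorSpec], str]] = {
    "filter": lambda s: f"FilterOperator({s.op_id})",
    "project": lambda s: f"ProjectOperator({s.op_id})",
    "window": lambda s: f"WindowOperator({s.op_id})",
}


def instantiate(subplan: dict, factories: Dict[str, Callable]) -> dict:
    """What a task could do at startup: build operators from the specs."""
    return {spec.op_id: factories[spec.kind](spec)
            for spec in subplan["operators"]}


plan = QueryPlan(
    operators={
        "f1": OperatorSpec("f1", "filter"),
        "p1": OperatorSpec("p1", "project"),
        "w1": OperatorSpec("w1", "window"),
    },
    routes=[
        Route("f1", "p1"),                            # stays inside one task
        Route("p1", "w1", stream="kafka.projected"),  # crosses a stream
    ],
)
subplans = split_by_stream_boundaries(plan)
operators_by_task = [instantiate(sp, FACTORIES) for sp in subplans]
```

In this sketch the plan splits into two sub-graphs: f1 and p1 stay together and write to kafka.projected, while w1 forms its own sub-graph that reads from it. Each sub-graph dict is what would be serialized into the task's config, and the factory table is one possible place for the "spec" pattern to live.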

> A common representation of relational algebra for streaming SQL 
> ----------------------------------------------------------------
>
>                 Key: SAMZA-483
>                 URL: https://issues.apache.org/jira/browse/SAMZA-483
>             Project: Samza
>          Issue Type: Sub-task
>          Components: sql
>            Reporter: Yi Pan (Data Infrastructure)
>            Priority: Minor
>              Labels: project
>
> Per discussion with [~criccomini] and [~milinda], we agreed that it seems to 
> be a good idea to define a common representation of relational algebra on top 
> of the operators defined in the operator layer (see SAMZA-482), which can be 
> the common base that we can use to generate the description/configuration of 
> a Samza job.
> This common layer can also be used by a DSL-like language parser as the 
> result of parsing a DSL program.
> Some additional requirements beyond pure relational algebra:
> 1) the common representation should include window operators and stream 
> operators (i.e. IStream/DStream/RStream)
> 2) the common representation should include a description of the parallelism 
> of the jobs (i.e. how many partitions the resultant Samza job will use)
> Some references:
> http://web.cs.wpi.edu/~mukherab/i/DCAPE.pdf
> https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/publications.htm
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/slides.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
