[ 
https://issues.apache.org/jira/browse/SAMZA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275998#comment-14275998
 ] 

Milinda Lakmal Pathirage commented on SAMZA-483:
------------------------------------------------

Here is another attempt at rephrasing my comment above.

Let's say that, for Samza Streaming SQL, a user starts with a streaming query 
like the one below, entered into a REPL:

SELECT ISTREAM field1, count(*) FROM InputStream1
    WHERE someField >= 3 && someField <= 6
    GROUP BY field1
    INTO OutputStream1;

The process of converting this into a set of Samza jobs would be:

1. Parse the query (using ANTLR or a similar tool; this generates an AST)
2. Semantic analysis (we implement the semantic analysis phase on top of the 
AST or some other model generated from the AST)
3. Optimizations
4. Generate execution plan (Samza job)
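
To make the four steps above concrete, here is a minimal Python sketch of the 
phases chained together. Everything in it is a placeholder I made up for 
illustration (Ast, LogicalPlan, parse, analyze, optimize, generate, 
compile_query); none of it is an existing Samza or ANTLR API, and each phase is 
only a stub. The point is just to show which values are handed from one phase 
to the next.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ast:
    """Hypothetical parse result; a real parser would build a tree."""
    text: str


@dataclass
class LogicalPlan:
    """Hypothetical validated model produced by semantic analysis."""
    ast: Ast
    notes: List[str] = field(default_factory=list)


def parse(query: str) -> Ast:
    # Stand-in for step 1: only normalizes whitespace and drops the trailing ';'.
    return Ast(" ".join(query.split()).rstrip(";"))


def analyze(ast: Ast) -> LogicalPlan:
    # Stand-in for step 2: a real phase would resolve streams, fields and types.
    if not ast.text.upper().startswith("SELECT"):
        raise ValueError("only SELECT queries are handled in this sketch")
    return LogicalPlan(ast, ["analyzed"])


def optimize(plan: LogicalPlan) -> LogicalPlan:
    # Stand-in for step 3: no rewrite rules, it only records that the phase ran.
    return LogicalPlan(plan.ast, plan.notes + ["optimized"])


def generate(plan: LogicalPlan) -> dict:
    # Stand-in for step 4: a made-up job description, not real Samza config.
    return {"query": plan.ast.text, "phases": plan.notes + ["planned"]}


def compile_query(query: str) -> dict:
    # The values passed between these calls are the intermediate models.
    return generate(optimize(analyze(parse(query))))
```

Each of the values passed between the calls in compile_query is a candidate 
place for an intermediate model, so the sketch is only meant to make those 
hand-off points visible.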

Given that we are going with a CQL-based execution model, the execution plan 
would be several extended relational algebra expressions connected together, 
depending on the query.
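
As a rough illustration of "expressions connected together", the example query 
could be written as a chain of operators like the Python sketch below. The Op 
class, the operator names (Scan, Window, Select, Aggregate, IStream, Insert) 
and the describe helper are all hypothetical names chosen for this sketch, not 
an agreed design. The unbounded window is also an assumption on my part, since 
the query does not specify a window.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Op:
    """One node of a hypothetical plan: an operator, its arguments, its input."""
    name: str
    args: Tuple[str, ...] = ()
    child: Optional["Op"] = None


def describe(op: Optional[Op]) -> str:
    # Walk from the output operator back to the source stream.
    chain = []
    node = op
    while node is not None:
        chain.append(node.name + "(" + ", ".join(node.args) + ")")
        node = node.child
    return " <- ".join(chain)


# The example query, read from the output back to the input:
# stream -> relation (window), relation -> relation (select, aggregate),
# relation -> stream (IStream), then write to the output stream.
plan = Op("Insert", ("OutputStream1",),
          Op("IStream", (),
             Op("Aggregate", ("group by field1", "count(*)"),
                Op("Select", ("someField >= 3", "someField <= 6"),
                   Op("Window", ("unbounded",),  # assumed; not in the query
                      Op("Scan", ("InputStream1",)))))))

print(describe(plan))
```

A linear chain is enough for this single-input query; a query with a join 
would need operators with more than one input, which this sketch does not 
cover.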

So, considering the above, where is this common representation going to sit? 
The best model for our case will depend on the answer to this question.

Others may have a different view. If so, please feel free to comment with 
your thoughts.

> A common representation of relational algebra for streaming SQL 
> ----------------------------------------------------------------
>
>                 Key: SAMZA-483
>                 URL: https://issues.apache.org/jira/browse/SAMZA-483
>             Project: Samza
>          Issue Type: Sub-task
>            Reporter: Yi Pan (Data Infrastructure)
>            Priority: Minor
>              Labels: project
>
> Per discussion with [~criccomini] and [~milinda], we agreed that it seems to 
> be a good idea to define a common representation of relational algebra on top 
> of the operators defined in the operator layer (see SAMZA-482), which can be 
> the common base that we can use to generate the description/configuration of 
> a Samza job.
> This common layer can also be used by DSL-like language parser as a result of 
> parsing a DSL program.
> Some additional requirements needed in addition to pure relational algebra:
> 1) the common representation should include window operators and stream 
> operators (i.e. IStream/DStream/RStream)
> 2) the common representation should include description on parallelism of the 
> jobs (i.e. how many partitions the resultant Samza job will use)
> Some references:
> http://web.cs.wpi.edu/~mukherab/i/DCAPE.pdf
> https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/publications.htm
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/slides.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
