[
https://issues.apache.org/jira/browse/SAMZA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Milinda Lakmal Pathirage updated SAMZA-483:
-------------------------------------------
Attachment: calcite-integration-prototype.patch
Here is a pretty simple prototype which I created to evaluate Calcite
integration. This prototype borrows a lot of code from Calcite and uses fake
context classes and interface implementations to get the job done. I have
integrated most of the query planning steps found in Calcite (type flattening
and sub-query decorrelation are missing). I first tried to extend Calcite's
JDBC implementation to build a Samza-specific JDBC driver. After figuring out
that this requires a lot of effort and a deep understanding of Calcite's JDBC
implementation, I decided to implement a simple prototype that can be used as
a starting point for integrating Calcite-based query parsing and planning into
Samza. IMHO, we should first implement the conversion from a Calcite query
plan to a Samza job and then go for the JDBC implementation. Also, it is
possible that we don't really need to build our JDBC interfaces on top of
Calcite's JDBC implementation, and could write our own simple JDBC
implementation instead.
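
A rough illustration of what a plan-to-job conversion could look like is
sketched below in Python. It is not taken from the attached patch and does not
use any Calcite or Samza API: `PlanNode`, `to_job_description`, and the output
keys (`input_stream`, `partitions`, `operators`) are hypothetical names chosen
for illustration, and the sketch only handles a linear chain of operators (no
joins).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanNode:
    """One node of a hypothetical logical query plan (not a Calcite class)."""
    op: str                              # e.g. "scan", "filter", "window", "project"
    args: dict                           # operator-specific parameters
    source: Optional["PlanNode"] = None  # single upstream node; joins are out of scope


def to_job_description(root: PlanNode, partitions: int) -> dict:
    """Flatten a linear plan into an ordered operator list plus job-level settings."""
    chain: List[PlanNode] = []
    node: Optional[PlanNode] = root
    while node is not None:
        chain.append(node)
        node = node.source
    chain.reverse()  # source operator first, final operator last

    if chain[0].op != "scan":
        raise ValueError("plan must start from a stream scan")

    # The keys below are placeholders, not actual Samza configuration keys.
    return {
        "input_stream": chain[0].args["stream"],
        "partitions": partitions,
        "operators": [{"op": n.op, **n.args} for n in chain[1:]],
    }


# Plan: scan Orders -> filter -> 60 s tumbling window -> project two fields.
plan = PlanNode("project", {"fields": ["productId", "units"]},
       PlanNode("window", {"kind": "tumbling", "size_sec": 60},
       PlanNode("filter", {"condition": "units > 10"},
       PlanNode("scan", {"stream": "Orders"}))))

job = to_job_description(plan, partitions=4)
```

The point of the sketch is only that the planner's output is walked once and
turned into a flat, serializable description that a job could be configured
from; a real conversion would work on Calcite's relational expressions and
would need to handle joins and multiple inputs.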
> A common representation of relational algebra for streaming SQL
> ----------------------------------------------------------------
>
> Key: SAMZA-483
> URL: https://issues.apache.org/jira/browse/SAMZA-483
> Project: Samza
> Issue Type: Sub-task
> Components: sql
> Reporter: Yi Pan (Data Infrastructure)
> Priority: Minor
> Labels: project
> Attachments: calcite-integration-prototype.patch
>
>
> Per discussion with [~criccomini] and [~milinda], we agreed that it seems to
> be a good idea to define a common representation of relational algebra on top
> of the operators defined in the operator layer (see SAMZA-482), which can be
> the common base that we can use to generate the description/configuration of
> a Samza job.
> This common layer can also serve as the output of a DSL-like language
> parser, i.e. the result of parsing a DSL program.
> Some requirements beyond pure relational algebra:
> 1) the common representation should include window operators and stream
> operators (i.e. IStream/DStream/RStream)
> 2) the common representation should include a description of the parallelism
> of the jobs (i.e. how many partitions the resultant Samza job will use)
> Some references:
> http://web.cs.wpi.edu/~mukherab/i/DCAPE.pdf
> https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/publications.htm
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/slides.htm