APEXMALHAR-1818 : Apex-Calcite Integration

Chinmay Kolhatkar Fri, 12 Aug 2016 09:26:08 -0700

Hi All,

I wanted to give update on Apex-Calcite Integration work being done for
visibility and feedback from the community.


In the first phase, target is to use Calcite core library for SQL parsing
and transformation of relation algebra to apex specific component
(operators).
Once this is achieved one would be able to define input and outputs using
Calcite model file and define the processing from input to output using SQL
statement.

The status for above work as of now is as follows:
1. I'm able to traverse relational algebra for simple select statement.
2. DAG is getting generated for simple statement SELECT STREAM * FROM TABLE.
3. DAG is getting validated.
4. Operators are being set with properties, streams and schema is also
being set using TUPLE_CLASS attr. For schema the class is generated on the
fly and put in classpath using LIBRARY_JAR attr.
5. Able to run generated DAG in local mode.
6. The code is currently being developed at (WIP):
Currently for each of development and code being farely large, I've added a
new module malhar-sql in malhar in my fork. But I'm open to other
suggestions here.
https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql

Next step:
1. Run the generate DAG in distributed mode.
2. Expand the source and destination definition (calcite model file) to
include Kafka as source schema and destination.
3. Expand the scope to include filter operator (WHERE clause, HAVING too if
possible) and inner join when it gets merged.
4. Write extensive unit tests for above.

I'll send an update on this thread at every logical step of achieving
something.

I request the community to provide the feedback on above approach/targets
and if possible take a look at the code in above link.

Thanks,
Chinmay.

APEXMALHAR-1818 : Apex-Calcite Integration

Reply via email to