Hi All,

I wanted to give an update on the Apex-Calcite integration work being done, for visibility and feedback from the community.

In the first phase, the target is to use the Calcite core library for SQL parsing and for transforming relational algebra into Apex-specific components (operators). Once this is achieved, one would be able to define inputs and outputs using a Calcite model file and define the processing from input to output using a SQL statement.

The status of the above work as of now is as follows:

1. I'm able to traverse the relational algebra for a simple SELECT statement.
2. A DAG is getting generated for the simple statement SELECT STREAM * FROM TABLE.
3. The DAG is getting validated.
4. Operators are being set with properties and streams, and the schema is also being set using the TUPLE_CLASS attr. For the schema, the class is generated on the fly and put on the classpath using the LIBRARY_JAR attr.
5. I'm able to run the generated DAG in local mode.
6. The code is currently being developed (WIP) at:
   https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql
   For ease of development, and with the code being fairly large, I've added a new module, malhar-sql, in malhar in my fork. But I'm open to other suggestions here.

Next steps:

1. Run the generated DAG in distributed mode.
2. Expand the source and destination definition (Calcite model file) to include Kafka as a source schema and destination.
3. Expand the scope to include the filter operator (WHERE clause, HAVING too if possible) and inner join when it gets merged.
4. Write extensive unit tests for the above.

I'll send an update on this thread at every logical milestone. I request the community to provide feedback on the above approach/targets and, if possible, to take a look at the code at the above link.

Thanks,
Chinmay.