This is great news. Given the size of this PR, we should probably allow for
more time for review and feedback.


On Wed, Oct 5, 2016 at 4:56 AM, Chinmay Kolhatkar <chin...@datatorrent.com>
wrote:

> Dear Community,
>
> On the review-only PR (https://github.com/apache/apex-malhar/pull/432)
> that I created for the first phase of the Calcite integration, I've
> received good feedback for improvement from Julian (Calcite PMC Chair),
> Tushar and Yogi. All of the comments have been taken care of.
>
> I think it's time to remove "Review Only" from this PR and propose it
> for inclusion in Apex Malhar.
>
> Please share your opinion.
>
> Thanks,
> Chinmay.
>
>
> On Wed, Sep 28, 2016 at 9:03 PM, Chinmay Kolhatkar <
> chin...@datatorrent.com>
> wrote:
>
> > Dear Community,
> >
> > I've created a review-only PR for the first phase of the Calcite
> > integration:
> > https://github.com/apache/apex-malhar/pull/432
> >
> > Features implemented in this PR are as follows:
> > 1. SELECT statement
> > 2. INSERT statement
> > 3. INNER JOIN with a non-empty equi-join condition
> > 4. WHERE clause
> > 5. Scalar functions implemented in Calcite are ready to use
> > 6. Custom scalar functions can be registered
> > 7. The endpoint can be File, Kafka, or a streaming port, for both input
> > and output
> > 8. CSV data format implemented for both the input and output sides
> > 9. Static loading of the Calcite JDBC driver
> > 10. Tested in local as well as cluster mode
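The CSV handling in item 8 can be pictured with a small stdlib-only sketch. This is not the Malhar implementation — the class name, method, and field names below are made up for illustration — it only shows the kind of field mapping a CSV message format performs on the input side, turning one delimited line into named fields per the registered schema:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: maps one CSV line to named fields, the way a
// CSV message format pairs incoming values with the registered schema.
public class CsvSketch {
    public static Map<String, String> parse(String[] fieldNames, String line) {
        String[] values = line.split(",", -1); // -1 keeps empty trailing fields
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length && i < values.length; i++) {
            row.put(fieldNames[i], values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = parse(
            new String[] {"ID", "PRODUCT", "QTY"}, "7,paintbrush,2");
        System.out.println(row); // prints {ID=7, PRODUCT=paintbrush, QTY=2}
    }
}
```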
> >
> > I request everyone to please review the PR and provide feedback.
> >
> > Thanks,
> > Chinmay.
> >
> >
> > On Tue, Sep 20, 2016 at 11:16 AM, Chinmay Kolhatkar <
> > chin...@datatorrent.com> wrote:
> >
> >> Hi All,
> >>
> >> I wanted to give a quick update on Apex-Calcite integration work.
> >>
> >> Currently I'm able to run a SQL statement as a DAG against registered
> >> table abstractions of data endpoint and message type.
> >>
> >> Here is the SQL support that is currently implemented:
> >> 1. Data Endpoint (Source/Destination):
> >>    - File
> >>    - Kafka
> >> 2. Message Types from Data endpoint (source/destination):
> >>    - CSV
> >> 3. SQL Functionality Support:
> >>    - SELECT (Projection) - Select from Source
> >>    - INSERT - Insert into Destination
> >>    - WHERE (Filter)
> >>    - Scalar functions which are provided in Calcite core
> >>    - Custom scalar functions can be defined and provided to SQL.
> >> 4. A table can be defined as an abstraction of a data endpoint
> >> (source/dest) and message type.
> >>
> >> Currently, the Calcite integration with Apex is exposed as a small
> >> piece of boilerplate code in populateDAG, as follows:
> >>
> >> SQLExecEnvironment.getEnvironment(dag)
> >>     .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic,
> >>         new CSVMessageFormat(schemaIn)))
> >>     .registerTable("SALES", new KafkaEndpoint(broker, destTopic,
> >>         new CSVMessageFormat(schemaOut)))
> >>     .registerFunction("APEXCONCAT", FileEndpointTest.class,
> >>         "apex_concat_str")
> >>     .executeSQL("INSERT INTO SALES " +
> >>         "SELECT STREAM ROWTIME, FLOOR(ROWTIME TO DAY), " +
> >>         "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " +
> >>         "FROM ORDERS WHERE ID > 3 AND PRODUCT LIKE 'paint%'");
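For reference, the custom scalar function registered above resolves to a public static method on the given class. The body of apex_concat_str is not shown in this thread, so the two-argument string concatenation below is an assumption consistent with the APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) usage in the SQL:

```java
// Sketch of the registered scalar function. Calcite maps a registered
// function name to a public static method; the exact signature here is an
// assumption based on how APEXCONCAT is called in the SQL above.
public class FileEndpointTest {
    public static String apex_concat_str(String first, String second) {
        return first + second;
    }

    public static void main(String[] args) {
        // Mirrors APEXCONCAT('OILPAINT', <substring of PRODUCT>)
        System.out.println(apex_concat_str("OILPAINT", "BRUSH")); // prints OILPAINTBRUSH
    }
}
```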
> >>
> >> Following is a video recording of the demo of the apex-calcite
> >> integration:
> >> https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c
> >>
> >> Currently I'm working on adding inner join functionality.
> >> Once inner join is implemented, I think the code will be ready for a
> >> Review Only PR for the first cut of the Calcite integration.
> >>
> >> Please share your opinion on the above.
> >>
> >> Thanks,
> >> Chinmay.
> >>
> >>
> >> On Fri, Aug 12, 2016 at 9:55 PM, Chinmay Kolhatkar <
> >> chin...@datatorrent.com> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I wanted to give an update on the Apex-Calcite integration work, for
> >>> visibility and feedback from the community.
> >>>
> >>> In the first phase, the target is to use the Calcite core library for
> >>> SQL parsing and transformation of relational algebra to Apex-specific
> >>> components (operators).
> >>> Once this is achieved, one will be able to define inputs and outputs
> >>> using a Calcite model file and define the processing from input to
> >>> output using a SQL statement.
> >>>
> >>> The status of the above work as of now is as follows:
> >>> 1. I'm able to traverse the relational algebra for a simple select
> >>> statement.
> >>> 2. A DAG is generated for the simple statement SELECT STREAM * FROM
> >>> TABLE.
> >>> 3. The DAG is getting validated.
> >>> 4. Operators are being set with properties and streams; the schema is
> >>> also being set using the TUPLE_CLASS attribute. For the schema, the
> >>> class is generated on the fly and put on the classpath using the
> >>> LIBRARY_JAR attribute.
> >>> 5. Able to run the generated DAG in local mode.
> >>> 6. The code is currently being developed (WIP) at:
> >>> https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql
> >>> For ease of development, and because the code is fairly large, I've
> >>> added a new module malhar-sql in Malhar in my fork. But I'm open to
> >>> other suggestions here.
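The on-the-fly class generation mentioned in item 4 can be sketched with the JDK's own compiler API. This is illustrative only, not the Malhar code — the class and field names are invented — but it shows the general mechanism: compile a tuple class from generated source at runtime, then load it so it can serve as a TUPLE_CLASS:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Illustrative sketch: compile a generated POJO from a source string and
// load it reflectively, the way a runtime-generated tuple class can be
// made available on the classpath.
public class TupleClassSketch {
    public static Class<?> generate() throws Exception {
        Path dir = Files.createTempDirectory("gen");
        Path src = dir.resolve("GeneratedTuple.java");
        Files.write(src, ("public class GeneratedTuple { public int id; "
            + "public String product; }").getBytes());

        // Requires a JDK; on a plain JRE this returns null.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int rc = compiler.run(null, null, null,
            "-d", dir.toString(), src.toString());
        if (rc != 0) throw new IllegalStateException("compilation failed");

        URLClassLoader loader =
            new URLClassLoader(new URL[] { dir.toUri().toURL() });
        return loader.loadClass("GeneratedTuple");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(generate().getName()); // prints GeneratedTuple
    }
}
```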
> >>>
> >>> Next steps:
> >>> 1. Run the generated DAG in distributed mode.
> >>> 2. Expand the source and destination definitions (Calcite model file)
> >>> to include Kafka as a source and destination.
> >>> 3. Expand the scope to include the filter operator (WHERE clause, and
> >>> HAVING too if possible) and inner join when it gets merged.
> >>> 4. Write extensive unit tests for the above.
> >>>
> >>> I'll send an update on this thread at every logical step.
> >>>
> >>> I request the community to provide feedback on the above
> >>> approach/targets and, if possible, take a look at the code at the
> >>> above link.
> >>>
> >>> Thanks,
> >>> Chinmay.
> >>>
> >>>
> >>
> >
>
