Hi Julian,

By translating the Calcite SQL into the Spark DataFrame/Dataset API, the benefit I see is that it provides a unified SQL layer for computation frameworks like Spark and Flink: the user writes a SQL statement, and it can be executed by any computation framework whose API can be translated from Calcite's logical plan.
Another potential benefit is that it can enable optimizations that require manipulations across tables/views. In fact, in the CarbonData community we are evaluating this approach for roadmap features. I want to understand more about what needs to be done to translate a Calcite logical plan into the DataFrame/Dataset API. If I understand the Flink Table API correctly, what we need is something similar to the package org.apache.flink.api.table.plan.nodes.dataset (extensions of RelNode and the corresponding translation). Am I correct?

Regards,
Jacky

> On Sep 5, 2016, at 10:38 AM, Wangfei (X) <[email protected]> wrote:
>
> IMO the main benefit is to inherit the optimizations of Spark SQL (such as
> whole-stage codegen, memory management, maybe vectorized execution in the
> future ...).
>
> I am not familiar with Calcite's codegen mechanism; is there any reference
> about it? I think I will first understand how the Spark adapter currently
> works and then see what I can do.
>
> Fei
>
>> *From:* Julian Hyde <[email protected]>
>> *Date:* 2016-09-05 05:34
>> *To:* [email protected]
>> *Subject:* Re: how about integrate spark dataset/dataframe api
>>
>> It's an interesting idea. I know that the data frame API is easier
>> to work with for application developers, but since Calcite would
>> be generating the code, can you describe the benefits to the
>> Calcite user of changing the integration point?
>>
>> It's definitely true that Calcite's Spark adapter needs some love.
>> If someone would like to rework the adapter in terms of the data
>> frame API and get it working on more cases, and more reliably, I
>> would definitely welcome it.
>>
>> Julian
>>
>>> On Sep 1, 2016, at 8:35 PM, Wangfei (X) <[email protected]> wrote:
>>>
>>> Hi, community
>>> I noticed that the Spark adapter in Calcite is currently integrated
>>> with the Spark core API. Since the Dataset/DataFrame API has become
>>> the top-level API, how about integrating with the Dataset/DataFrame
>>> API? Is it possible to do that?
>>>
>>> Fei.
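To make the translation idea above concrete, here is a toy, self-contained sketch of the pattern Jacky describes: walking a small logical-plan tree (stand-ins for Calcite RelNodes, not the real Calcite API) and emitting the equivalent chain of Spark DataFrame-style calls. All class and method names here are hypothetical illustrations; a real implementation would extend Calcite's RelNode and its implement/translate machinery, as the Flink dataset nodes do.

```java
import java.util.List;

// Hypothetical stand-ins for Calcite RelNodes; the real adapter would
// extend org.apache.calcite.rel.RelNode instead.
interface LogicalNode {
    // Emit the DataFrame-API-style code this plan node translates to.
    String toDataFrameCode();
}

class TableScan implements LogicalNode {
    final String table;
    TableScan(String table) { this.table = table; }
    public String toDataFrameCode() {
        return "spark.table(\"" + table + "\")";
    }
}

class Filter implements LogicalNode {
    final LogicalNode input;
    final String condition;
    Filter(LogicalNode input, String condition) {
        this.input = input;
        this.condition = condition;
    }
    public String toDataFrameCode() {
        return input.toDataFrameCode() + ".filter(\"" + condition + "\")";
    }
}

class Project implements LogicalNode {
    final LogicalNode input;
    final List<String> columns;
    Project(LogicalNode input, List<String> columns) {
        this.input = input;
        this.columns = columns;
    }
    public String toDataFrameCode() {
        return input.toDataFrameCode()
            + ".select(\"" + String.join("\", \"", columns) + "\")";
    }
}

public class PlanToDataFrame {
    public static void main(String[] args) {
        // Logical plan for: SELECT name FROM emp WHERE age > 30
        LogicalNode plan =
            new Project(new Filter(new TableScan("emp"), "age > 30"),
                        List.of("name"));
        System.out.println(plan.toDataFrameCode());
        // prints: spark.table("emp").filter("age > 30").select("name")
    }
}
```

A real adapter would of course invoke the Dataset API directly rather than emit strings, but the recursive bottom-up translation over plan nodes is the same shape as Flink's org.apache.flink.api.table.plan.nodes.dataset classes.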
