Well, to correct James slightly, there is SOME support for Spark in Calcite, 
but it’s fair to say that it hasn’t had much love. If you would like to get 
something working then Drillix (Drill + Phoenix + Calcite) is the way to go.

That said, Spark is an excellent and hugely popular execution environment, so I 
would very much like to improve the Spark adapter. A few people on this list 
have talked about that over the past couple of months. If you would like to 
join that effort, it would be most welcome, but there’s more work to be done 
before you start getting results.
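
As a rough illustration of the push-down and traversal steps in your list, here 
is a toy sketch in Python. It does not use Calcite's API: Scan, Filter, Project, 
push_filter_into_scan and to_instructions are hypothetical names made up for 
this example. It pushes a simple filter into a scan, then walks the plan 
bottom-up to emit an ordered list of engine-neutral instructions. A real 
implementation would express the same idea with Calcite RelNodes and planner 
rules, and would emit Spark operations rather than tuples.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical plan nodes standing in for relational operators.
# These are NOT Calcite classes.
@dataclass
class Scan:
    table: str
    pushed_filter: Optional[str] = None  # predicate evaluated by the data source


@dataclass
class Filter:
    condition: str
    input: object


@dataclass
class Project:
    columns: List[str]
    input: object


def push_filter_into_scan(node):
    """Toy rule: rewrite Filter(Scan) into a Scan that carries the predicate."""
    if isinstance(node, Filter):
        child = push_filter_into_scan(node.input)
        if isinstance(child, Scan) and child.pushed_filter is None:
            return Scan(child.table, node.condition)
        return Filter(node.condition, child)
    if isinstance(node, Project):
        return Project(node.columns, push_filter_into_scan(node.input))
    return node


def to_instructions(node):
    """Post-order walk: emit the input's steps first, then this operator's."""
    if isinstance(node, Scan):
        return [("read", node.table, node.pushed_filter)]
    if isinstance(node, Filter):
        return to_instructions(node.input) + [("filter", node.condition)]
    if isinstance(node, Project):
        return to_instructions(node.input) + [("select", tuple(node.columns))]
    raise TypeError(f"unsupported plan node: {node!r}")


# Example plan: project two columns over a filtered table scan.
plan = Project(["id", "name"], Filter("age > 30", Scan("USERS")))
optimized = push_filter_into_scan(plan)
for step in to_instructions(optimized):
    print(step)
```

After the rewrite the separate filter step disappears and the predicate travels 
with the read step, which is the shape you would want before handing the scan 
to a source that can evaluate filters itself.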

Julian


> On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> 
> Hi Eli,
> With the calcite branch of Phoenix you're part way there. I think a good
> way to approach this would be to create a new set of operators that
> correspond to Spark operations and the corresponding rules that know when
> to use them. These could then be costed with the other Phoenix operators at
> planning time. Spark would work especially well to store intermediate
> results in more complex queries.
> 
> Since Spark doesn't integrate natively with Calcite, I think using Spark
> directly may not get you where you need to go. In the same way, the
> Phoenix-Spark integration is higher level, built on top of Phoenix and has
> no direct integration with Calcite.
> 
> Another alternative to consider would be using Drillix (Drill + Phoenix)
> which uses Calcite underneath[1].
> 
> Thanks,
> James
> 
> [1]
> https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> 
> On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> 
>> Greetings, Calcite devs. First of all, thank you for your work on Calcite!
>> 
>> I am working on a federated query engine that will use Spark (or something
>> similar) as the main execution engine. Among other data sources the query
>> engine will read from Apache Phoenix tables/views. The hope is to utilize
>> Calcite as the query planner and optimizer component of this query engine.
>> 
>> At a high level, I am trying to build the following using Calcite:
>> 1. Generate a relational algebra expression tree using RelBuilder based on
>> user input. I plan to implement custom schema and table classes based on my
>> metadata.
>> 2. Provide Calcite with query optimization rules.
>> 3. Traverse the optimized expression tree to generate a set of Spark
>> instructions.
>> 4. Execute query instructions via Spark.
>> 
>> A few questions regarding the above:
>> 1. Are there existing examples of code that does #3 above? I looked at the
>> Spark submodule and it seems pretty bare-bones. What would be great to see
>> is an example of a RelNode tree being traversed to create a plan for
>> asynchronous execution via something like Spark or Pig.
>> 2. An important query optimization that is planned initially is to be able
>> to push down simple filters to Phoenix (the plan is to use Phoenix-Spark
>> <http://phoenix.apache.org/phoenix_spark.html> integration for reading
>> data). Any examples of such push-downs to specific data sources in a
>> federated query scenario would be much appreciated.
>> 
>> Thank you! Looking forward to working with the Calcite community.
>> 
>> -------------
>> Eli Levine
>> Software Engineering Architect -- Salesforce.com
>> 
