Thank you for the pointers, Julian and James! I have a requirement that the main execution engine be fault-tolerant, and at this point the main contenders are Pig and Spark. Drillix is great as a source of example usages of Calcite, so it will definitely be useful.
And yes, the hope is to contribute any Spark and/or Pig adapter code that gets developed to Calcite.

Eli

On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:

> Well, to correct James slightly, there is SOME support for Spark in
> Calcite, but it’s fair to say that it hasn’t had much love. If you would
> like to get something working then Drillix (Drill + Phoenix + Calcite) is
> the way to go.
>
> That said, Spark is an excellent and hugely popular execution environment,
> so I would very much like to improve the Spark adapter. A few people on
> this list have talked about that over the past couple of months. If you
> would like to join that effort, it would be most welcome, but there’s more
> work to be done before you start getting results.
>
> Julian
>
> > On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> >
> > Hi Eli,
> > With the calcite branch of Phoenix you're part way there. I think a good
> > way to approach this would be to create a new set of operators that
> > correspond to Spark operations and the corresponding rules that know
> > when to use them. These could then be costed with the other Phoenix
> > operators at planning time. Spark would work especially well to store
> > intermediate results in more complex queries.
> >
> > Since Spark doesn't integrate natively with Calcite, I think using Spark
> > directly may not get you where you need to go. In the same way, the
> > Phoenix-Spark integration is higher level, built on top of Phoenix and
> > has no direct integration with Calcite.
> >
> > Another alternative to consider would be using Drillix (Drill + Phoenix)
> > which uses Calcite underneath [1].
> >
> > Thanks,
> > James
> >
> > [1] https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> >
> > On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> >
> > > Greetings, Calcite devs.
> > > First of all, thank you for your work on Calcite!
> > >
> > > I am working on a federated query engine that will use Spark (or
> > > something similar) as the main execution engine. Among other data
> > > sources, the query engine will read from Apache Phoenix tables/views.
> > > The hope is to utilize Calcite as the query planner and optimizer
> > > component of this query engine.
> > >
> > > At a high level, I am trying to build the following using Calcite:
> > > 1. Generate a relational algebra expression tree using RelBuilder
> > > based on user input. I plan to implement custom schema and table
> > > classes based on my metadata.
> > > 2. Provide Calcite with query optimization rules.
> > > 3. Traverse the optimized expression tree to generate a set of Spark
> > > instructions.
> > > 4. Execute query instructions via Spark.
> > >
> > > A few questions regarding the above:
> > > 1. Are there existing examples of code that does #3 above? I looked at
> > > the Spark submodule and it seems pretty bare-bones. What would be
> > > great to see is an example of a RelNode tree being traversed to create
> > > a plan for asynchronous execution via something like Spark or Pig.
> > > 2. An important query optimization planned for the initial version is
> > > the ability to push down simple filters to Phoenix (the plan is to use
> > > the Phoenix-Spark <http://phoenix.apache.org/phoenix_spark.html>
> > > integration for reading data). Any examples of such push-downs to
> > > specific data sources in a federated query scenario would be much
> > > appreciated.
> > >
> > > Thank you! Looking forward to working with the Calcite community.
> > >
> > > -------------
> > > Eli Levine
> > > Software Engineering Architect -- Salesforce.com
> >
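[Editor's note: to make step 3 and question 2 of the quoted email concrete, here is a minimal self-contained sketch of both ideas: folding a Filter that sits directly on a Scan into the scan (a stand-in for pushing simple predicates down to Phoenix), then a post-order walk that linearizes the tree into pseudo-Spark instructions. The Node/Scan/Filter/Project classes and the instruction strings are invented for illustration only; a real implementation would operate on Calcite's RelNode tree instead.]

```java
import java.util.*;

public class PlanWalkSketch {
  /** Toy stand-in for a relational operator tree node (not Calcite's RelNode). */
  static class Node {
    final List<Node> inputs = new ArrayList<>();
  }
  static final class Scan extends Node {
    final String table;
    String pushedFilter;                     // filled in by the pushdown rewrite
    Scan(String table) { this.table = table; }
  }
  static final class Filter extends Node {
    final String condition;
    Filter(Node input, String condition) { inputs.add(input); this.condition = condition; }
  }
  static final class Project extends Node {
    final String fields;
    Project(Node input, String fields) { inputs.add(input); this.fields = fields; }
  }

  /** Rewrite: a Filter sitting directly on a Scan is folded into the Scan,
   *  mirroring a push-down of simple predicates to the data source. */
  static Node pushDownFilters(Node node) {
    for (int i = 0; i < node.inputs.size(); i++) {
      node.inputs.set(i, pushDownFilters(node.inputs.get(i)));
    }
    if (node instanceof Filter && node.inputs.get(0) instanceof Scan) {
      Scan scan = (Scan) node.inputs.get(0);
      scan.pushedFilter = ((Filter) node).condition;
      return scan;                           // the Filter disappears from the tree
    }
    return node;
  }

  /** Post-order walk turning the tree into a linear list of pseudo-Spark
   *  instructions: inputs are emitted before the operators that consume them. */
  static void emit(Node node, List<String> out) {
    for (Node input : node.inputs) {
      emit(input, out);
    }
    if (node instanceof Scan) {
      Scan s = (Scan) node;
      out.add("read(" + s.table
          + (s.pushedFilter == null ? "" : ", where=" + s.pushedFilter) + ")");
    } else if (node instanceof Filter) {
      out.add("filter(" + ((Filter) node).condition + ")");
    } else if (node instanceof Project) {
      out.add("select(" + ((Project) node).fields + ")");
    }
  }

  public static void main(String[] args) {
    Node plan = new Project(
        new Filter(new Scan("PHOENIX.ORDERS"), "amount > 100"), "id, amount");
    List<String> instructions = new ArrayList<>();
    emit(pushDownFilters(plan), instructions);
    System.out.println(instructions);
    // [read(PHOENIX.ORDERS, where=amount > 100), select(id, amount)]
  }
}
```

In Calcite itself the pushdown half would be a planner rule and the walk half would visit the optimized RelNode tree; the point of the sketch is only the shape of the two passes.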

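[Editor's note: James's suggestion in the quoted thread, new Spark-specific operators plus rules that know when to use them, costed against the other Phoenix operators at planning time, can be illustrated with a toy sketch. In Calcite proper this is done by subclassing RelNode and ConverterRule and letting the planner compare cumulative costs; the LogicalJoin/SparkJoin classes and cost numbers below are hypothetical stand-ins, not Calcite API.]

```java
public class SparkRuleSketch {
  /** Minimal stand-in for a relational operator with a cost estimate. */
  interface Rel {
    double cost();
  }

  /** Generic logical operator, as produced by initial translation. */
  static final class LogicalJoin implements Rel {
    public double cost() { return 100.0; }   // made-up cost for illustration
  }

  /** Spark-flavored counterpart, e.g. better at large intermediate results. */
  static final class SparkJoin implements Rel {
    public double cost() { return 40.0; }    // made-up cost for illustration
  }

  /** "Rule": fire on LogicalJoin, propose SparkJoin, keep the cheaper plan.
   *  A real planner does this comparison across whole plan alternatives. */
  static Rel apply(Rel rel) {
    if (rel instanceof LogicalJoin) {
      Rel candidate = new SparkJoin();
      return candidate.cost() < rel.cost() ? candidate : rel;
    }
    return rel;
  }

  public static void main(String[] args) {
    Rel best = apply(new LogicalJoin());
    System.out.println(best.getClass().getSimpleName());
    // SparkJoin
  }
}
```

The design point James makes is that once Spark operators carry their own costs, the planner, not hand-written glue code, decides when intermediate results flow through Spark versus Phoenix.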