Will follow your suggested model when I start development. Thanks for offering to potentially include that work in Calcite, Julian.
Eli

On Tue, Nov 1, 2016 at 11:50 AM, Julian Hyde <[email protected]> wrote:
> If it helps make your "hope" a bit more likely to happen, you should
> consider doing your Spark or Pig adapters in the Calcite code base, that
> is, as a fork of the Calcite repo on GitHub from which you periodically
> submit pull requests. I would welcome that development model. For big,
> important features like this, I am comfortable including alpha- or
> beta-quality code in the Calcite release.
>
> If you do the work as part of the Calcite project, almost certainly other
> developers will want to help out. You'll do less work yourself and end up
> with a more robust result.
>
> I am cc'ing Daniel Dai. He and I have talked about a Pig adapter for
> Calcite in the past. If you decide to go that route, Daniel may be able
> to help out.
>
> Julian
>
>
> > On Nov 1, 2016, at 11:35 AM, Eli Levine <[email protected]> wrote:
> >
> > Thank you for the pointers, Julian and James! I have a requirement that
> > the main execution engine be fault-tolerant, and at this point the main
> > contenders are Pig and Spark. Drillix is great as a source of example
> > usages of Calcite, so it will definitely be useful.
> >
> > And yes, the hope is to contribute any Spark and/or Pig adapter code
> > that gets developed to Calcite.
> >
> > Eli
> >
> >
> > On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:
> >
> >> Well, to correct James slightly, there is SOME support for Spark in
> >> Calcite, but it's fair to say that it hasn't had much love. If you would
> >> like to get something working, then Drillix (Drill + Phoenix + Calcite)
> >> is the way to go.
> >>
> >> That said, Spark is an excellent and hugely popular execution
> >> environment, so I would very much like to improve the Spark adapter. A
> >> few people on this list have talked about that over the past couple of
> >> months. If you would like to join that effort, it would be most welcome,
> >> but there's more work to be done before you start getting results.
> >>
> >> Julian
> >>
> >>
> >>> On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> >>>
> >>> Hi Eli,
> >>> With the calcite branch of Phoenix you're part way there. I think a
> >>> good way to approach this would be to create a new set of operators
> >>> that correspond to Spark operations and the corresponding rules that
> >>> know when to use them. These could then be costed with the other
> >>> Phoenix operators at planning time. Spark would work especially well
> >>> to store intermediate results in more complex queries.
> >>>
> >>> Since Spark doesn't integrate natively with Calcite, I think using
> >>> Spark directly may not get you where you need to go. In the same way,
> >>> the Phoenix-Spark integration is higher level, built on top of
> >>> Phoenix, and has no direct integration with Calcite.
> >>>
> >>> Another alternative to consider would be using Drillix (Drill +
> >>> Phoenix), which uses Calcite underneath [1].
> >>>
> >>> Thanks,
> >>> James
> >>>
> >>> [1] https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> >>>
> >>> On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> >>>
> >>>> Greetings, Calcite devs. First of all, thank you for your work on
> >>>> Calcite!
> >>>>
> >>>> I am working on a federated query engine that will use Spark (or
> >>>> something similar) as the main execution engine. Among other data
> >>>> sources, the query engine will read from Apache Phoenix tables/views.
> >>>> The hope is to utilize Calcite as the query planner and optimizer
> >>>> component of this query engine.
> >>>>
> >>>> At a high level, I am trying to build the following using Calcite:
> >>>> 1. Generate a relational algebra expression tree using RelBuilder
> >>>> based on user input. I plan to implement custom schema and table
> >>>> classes based on my metadata.
> >>>> 2. Provide Calcite with query optimization rules.
> >>>> 3. Traverse the optimized expression tree to generate a set of Spark
> >>>> instructions.
> >>>> 4. Execute the query instructions via Spark.
> >>>>
> >>>> A few questions regarding the above:
> >>>> 1. Are there existing examples of code that does #3 above? I looked
> >>>> at the Spark submodule and it seems pretty bare-bones. What would be
> >>>> great to see is an example of a RelNode tree being traversed to
> >>>> create a plan for asynchronous execution via something like Spark or
> >>>> Pig.
> >>>> 2. An important query optimization planned for the initial version is
> >>>> the ability to push down simple filters to Phoenix (the plan is to
> >>>> use the Phoenix-Spark
> >>>> <http://phoenix.apache.org/phoenix_spark.html> integration for
> >>>> reading data). Any examples of such push-downs to specific data
> >>>> sources in a federated query scenario would be much appreciated.
> >>>>
> >>>> Thank you! Looking forward to working with the Calcite community.
> >>>>
> >>>> -------------
> >>>> Eli Levine
> >>>> Software Engineering Architect -- Salesforce.com
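[Editor's note] Question 1 in the thread above (traversing an optimized RelNode tree to emit Spark instructions) is at heart a post-order tree walk: each operator's inputs are translated before the operator itself, so the instruction list is built bottom-up from the leaf scans. The sketch below is a minimal, self-contained illustration of that pattern only; the `Node`/`Scan`/`Filter`/`Project` classes are hypothetical stand-ins, not Calcite's API (in real Calcite you would walk `org.apache.calcite.rel.RelNode` inputs, e.g. via a `RelVisitor` or `RelShuttle`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for a relational expression tree.
abstract class Node {
    final List<Node> inputs = new ArrayList<>();
    abstract String op();   // the "Spark instruction" this operator maps to
}

class Scan extends Node {
    final String table;
    Scan(String table) { this.table = table; }
    String op() { return "scan(" + table + ")"; }
}

class Filter extends Node {
    final String condition;
    Filter(Node input, String condition) { inputs.add(input); this.condition = condition; }
    String op() { return "filter(" + condition + ")"; }
}

class Project extends Node {
    final String fields;
    Project(Node input, String fields) { inputs.add(input); this.fields = fields; }
    String op() { return "project(" + fields + ")"; }
}

public class SparkPlanGenerator {
    // Post-order walk: emit each input's instructions before the parent's,
    // so the plan is built bottom-up from the leaf scans.
    static void emit(Node node, List<String> out) {
        for (Node input : node.inputs) {
            emit(input, out);
        }
        out.add(node.op());
    }

    public static List<String> plan(Node root) {
        List<String> out = new ArrayList<>();
        emit(root, out);
        return out;
    }

    public static void main(String[] args) {
        Node tree = new Project(new Filter(new Scan("ORDERS"), "amount > 100"), "id, amount");
        // Prints: [scan(ORDERS), filter(amount > 100), project(id, amount)]
        System.out.println(plan(tree));
    }
}
```

In a real adapter each `op()` would instead produce an RDD/DataFrame operation (or Pig statement), but the traversal shape is the same.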

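[Editor's note] Question 2 (pushing simple filters down to Phoenix) amounts to a rewrite rule: when a Filter sits directly over a table scan, fold its predicate into the scan so the data source evaluates it server-side. The following is a minimal self-contained sketch of that rewrite with hypothetical node classes; in Calcite proper this would be a `RelOptRule` matching a filter over a Phoenix scan, with the planner comparing costs of the two alternatives.

```java
// Hypothetical planner nodes; not Calcite classes.
abstract class PlanNode { }

class TableScan extends PlanNode {
    final String table;
    final String pushedPredicate;   // null until a filter is pushed down
    TableScan(String table, String pushedPredicate) {
        this.table = table;
        this.pushedPredicate = pushedPredicate;
    }
}

class FilterNode extends PlanNode {
    final PlanNode input;
    final String condition;
    FilterNode(PlanNode input, String condition) {
        this.input = input;
        this.condition = condition;
    }
}

public class FilterPushDown {
    // Rule: Filter(Scan(t)) => Scan(t, predicate); any other shape is
    // left alone (the filter stays in the engine above the source).
    public static PlanNode apply(PlanNode node) {
        if (node instanceof FilterNode) {
            FilterNode f = (FilterNode) node;
            PlanNode rewritten = apply(f.input);
            if (rewritten instanceof TableScan) {
                TableScan s = (TableScan) rewritten;
                if (s.pushedPredicate == null) {
                    return new TableScan(s.table, f.condition);
                }
            }
            return new FilterNode(rewritten, f.condition);
        }
        return node;
    }

    public static void main(String[] args) {
        PlanNode plan = new FilterNode(new TableScan("CONTACTS", null), "org_id = 42");
        TableScan pushed = (TableScan) apply(plan);
        // Prints: CONTACTS WHERE org_id = 42
        System.out.println(pushed.table + " WHERE " + pushed.pushedPredicate);
    }
}
```

The same shape extends to federated scenarios: each data source advertises which predicates it can absorb, and only those are folded into its scan while the rest remain in the engine.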