It's a fairly loose term. For us it generally means being able to recover from node failures without having to rerun the process from the beginning. M/R and Spark fall broadly into that category.
Thanks,
Eli

On Tue, Nov 1, 2016 at 11:46 AM, James Taylor <[email protected]> wrote:
> Eli,
> Can you define what you mean by "fault-tolerant"? Phoenix+HBase are fault-tolerant through the retries that HBase does.
> Thanks,
> James
>
> On Tue, Nov 1, 2016 at 11:35 AM, Eli Levine <[email protected]> wrote:
>
> > Thank you for the pointers, Julian and James! I have a requirement that the main execution engine be fault-tolerant, and at this point the main contenders are Pig and Spark. Drillix is great as a source of example usages of Calcite, so it will definitely be useful.
> >
> > And yes, the hope is to contribute any Spark and/or Pig adapter code that gets developed to Calcite.
> >
> > Eli
> >
> > On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:
> >
> > > Well, to correct James slightly, there is SOME support for Spark in Calcite, but it's fair to say that it hasn't had much love. If you would like to get something working, then Drillix (Drill + Phoenix + Calcite) is the way to go.
> > >
> > > That said, Spark is an excellent and hugely popular execution environment, so I would very much like to improve the Spark adapter. A few people on this list have talked about that over the past couple of months. If you would like to join that effort, it would be most welcome, but there's more work to be done before you start getting results.
> > >
> > > Julian
> > >
> > > > On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> > > >
> > > > Hi Eli,
> > > > With the calcite branch of Phoenix you're part way there. I think a good way to approach this would be to create a new set of operators that correspond to Spark operations, plus the corresponding rules that know when to use them. These could then be costed alongside the other Phoenix operators at planning time. Spark would work especially well for storing intermediate results in more complex queries.
> > > >
> > > > Since Spark doesn't integrate natively with Calcite, I think using Spark directly may not get you where you need to go. In the same way, the Phoenix-Spark integration is higher-level, built on top of Phoenix, and has no direct integration with Calcite.
> > > >
> > > > Another alternative to consider would be Drillix (Drill + Phoenix), which uses Calcite underneath [1].
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > [1] https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> > > >
> > > > On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> > > >
> > > > > Greetings, Calcite devs. First of all, thank you for your work on Calcite!
> > > > >
> > > > > I am working on a federated query engine that will use Spark (or something similar) as the main execution engine. Among other data sources, the query engine will read from Apache Phoenix tables/views. The hope is to utilize Calcite as the query planner and optimizer component of this query engine.
> > > > >
> > > > > At a high level, I am trying to build the following using Calcite:
> > > > > 1. Generate a relational algebra expression tree using RelBuilder based on user input. I plan to implement custom schema and table classes based on my metadata.
> > > > > 2. Provide Calcite with query optimization rules.
> > > > > 3. Traverse the optimized expression tree to generate a set of Spark instructions.
> > > > > 4. Execute query instructions via Spark.
> > > > >
> > > > > A few questions regarding the above:
> > > > > 1. Are there existing examples of code that does #3 above? I looked at the Spark submodule and it seems pretty bare-bones. What would be great to see is an example of a RelNode tree being traversed to create a plan for asynchronous execution via something like Spark or Pig.
> > > > > 2. An important query optimization planned initially is to be able to push down simple filters to Phoenix (the plan is to use Phoenix-Spark <http://phoenix.apache.org/phoenix_spark.html> integration for reading data). Any examples of such push-downs to specific data sources in a federated query scenario would be much appreciated.
> > > > >
> > > > > Thank you! Looking forward to working with the Calcite community.
> > > > >
> > > > > -------------
> > > > > Eli Levine
> > > > > Software Engineering Architect -- Salesforce.com
