Re: Calcite with Phoenix and Spark

James Taylor Tue, 01 Nov 2016 11:47:23 -0700

Eli,
Can you define what you mean by "fault-tolerant"? Phoenix+HBase are fault
tolerant through the retries that HBase does.
Thanks,
James


On Tue, Nov 1, 2016 at 11:35 AM, Eli Levine <[email protected]> wrote:

> Thank you for the pointers, Julian and James! I have a requirement that the
> main execution engine is a fault-tolerant one and at this point the main
> contenders are Pig and Spark. Drillix is great as a source of example usages
> of Calcite, so it will definitely be useful.
>
> And yes, the hope is to contribute any Spark and/or Pig adapter code that
> gets developed to Calcite.
>
> Eli
>
>
> On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:
>
> > Well, to correct James slightly, there is SOME support for Spark in
> > Calcite, but it’s fair to say that it hasn’t had much love. If you would
> > like to get something working then Drillix (Drill + Phoenix + Calcite) is
> > the way to go.
> >
> > That said, Spark is an excellent and hugely popular execution
> > environment, so I would very much like to improve the Spark adapter. A
> > few people on this list have talked about that over the past couple of
> > months. If you would like to join that effort, it would be most welcome,
> > but there’s more work to be done before you start getting results.
> >
> > Julian
> >
> >
> > > On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> > >
> > > Hi Eli,
> > > With the calcite branch of Phoenix you're part way there. I think a
> > > good way to approach this would be to create a new set of operators
> > > that correspond to Spark operations and the corresponding rules that
> > > know when to use them. These could then be costed with the other
> > > Phoenix operators at planning time. Spark would work especially well to
> > > store intermediate results in more complex queries.
> > >
> > > Since Spark doesn't integrate natively with Calcite, I think using
> > > Spark directly may not get you where you need to go. In the same way,
> > > the Phoenix-Spark integration is higher level, built on top of Phoenix
> > > and has no direct integration with Calcite.
> > >
> > > Another alternative to consider would be using Drillix (Drill +
> > > Phoenix) which uses Calcite underneath[1].
> > >
> > > Thanks,
> > > James
> > >
> > > [1]
> > > https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> > >
> > > On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> > >
> > >> Greetings, Calcite devs. First of all, thank you for your work on
> > >> Calcite!
> > >>
> > >> I am working on a federated query engine that will use Spark (or
> > >> something similar) as the main execution engine. Among other data
> > >> sources the query engine will read from Apache Phoenix tables/views.
> > >> The hope is to utilize Calcite as the query planner and optimizer
> > >> component of this query engine.
> > >>
> > >> At a high level, I am trying to build the following using Calcite:
> > >> 1. Generate a relational algebra expression tree using RelBuilder
> > >> based on user input. I plan to implement custom schema and table
> > >> classes based on my metadata.
> > >> 2. Provide Calcite with query optimization rules.
> > >> 3. Traverse the optimized expression tree to generate a set of Spark
> > >> instructions.
> > >> 4. Execute query instructions via Spark.
> > >>
> > >> A few questions regarding the above:
> > >> 1. Are there existing examples of code that does #3 above? I looked at
> > >> the Spark submodule and it seems pretty bare-bones. What would be
> > >> great to see is an example of a RelNode tree being traversed to create
> > >> a plan for asynchronous execution via something like Spark or Pig.
> > >> 2. An important query optimization that is planned initially is to be
> > >> able to push down simple filters to Phoenix (the plan is to use
> > >> Phoenix-Spark <http://phoenix.apache.org/phoenix_spark.html>
> > >> integration for reading data). Any examples of such push-downs to
> > >> specific data sources in a federated query scenario would be much
> > >> appreciated.
> > >>
> > >> Thank you! Looking forward to working with the Calcite community.
> > >>
> > >> -------------
> > >> Eli Levine
> > >> Software Engineering Architect -- Salesforce.com
> > >>
> >
> >
>
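
For readers following the thread, the two ideas in the original question
(pushing a simple filter down to a source that can evaluate it, then walking
the optimized tree to emit execution steps) can be illustrated with a toy
model. This is only a sketch in plain Python and uses neither the Calcite nor
the Spark API: the node classes Scan, Filter and Project, the
PUSHDOWN_SOURCES set, and the instruction strings are all hypothetical names
invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    source: str                       # illustrative label, e.g. "phoenix" or "csv"
    table: str
    pushed_filters: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Filter:
    input: "Node"
    predicate: str                    # kept as an opaque string for brevity


@dataclass(frozen=True)
class Project:
    input: "Node"
    columns: Tuple[str, ...]


Node = Union[Scan, Filter, Project]

# Assumption for this sketch: sources able to evaluate simple filters themselves.
PUSHDOWN_SOURCES = {"phoenix"}


def push_filters(node: Node) -> Node:
    """Rewrite the tree bottom-up, folding a Filter into the Scan directly
    below it when that scan's source accepts pushed-down predicates."""
    if isinstance(node, Scan):
        return node
    child = push_filters(node.input)
    if isinstance(node, Filter):
        if isinstance(child, Scan) and child.source in PUSHDOWN_SOURCES:
            return replace(
                child, pushed_filters=child.pushed_filters + (node.predicate,)
            )
        return Filter(child, node.predicate)
    return Project(child, node.columns)


def to_instructions(node: Node) -> list:
    """Post-order traversal: emit the input's steps first, then this node's."""
    if isinstance(node, Scan):
        where = " AND ".join(node.pushed_filters)
        suffix = f" WHERE {where}" if where else ""
        return [f"read {node.source}:{node.table}{suffix}"]
    steps = to_instructions(node.input)
    if isinstance(node, Filter):
        steps.append(f"filter {node.predicate}")
    else:
        steps.append(f"select {', '.join(node.columns)}")
    return steps


# Example plan: project two columns from a filtered scan of a Phoenix table.
plan = Project(Filter(Scan("phoenix", "EVENTS"), "TS > 100"), ("ID", "TS"))
print(to_instructions(push_filters(plan)))
```

In this toy model the filter over the "phoenix" scan is absorbed into the
read step, while the same filter over a source not listed in
PUSHDOWN_SOURCES stays as a separate step for the execution engine. In
Calcite itself, the analogous pieces would be planner rules and a RelNode
visitor, as discussed above.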
