If you want to go down the Pig adapter path, I will help on the Pig side.

Thanks,
Daniel

On Wed, Nov 2, 2016 at 3:21 PM, Eli Levine <[email protected]> wrote:

> Will follow your suggested model when I start development. Thanks for
> offering to potentially include that work in Calcite, Julian.
>
> Eli
>
>
> On Tue, Nov 1, 2016 at 11:50 AM, Julian Hyde <[email protected]> wrote:
>
>> If it helps make your “hope” a bit more likely to happen, you should
>> consider doing your Spark or Pig adapters in the Calcite code base, that
>> is, as a fork of the Calcite repo on GitHub from which you periodically
>> submit pull requests.  I would welcome that development model. For big,
>> important features like this, I am comfortable including alpha or beta
>> quality code in the Calcite release.
>>
>> If you do the work as part of the Calcite project, almost certainly other
>> developers will want to help out. You’ll do less work yourself, and end up
>> with a more robust result.
>>
>> I am Cc:ing Daniel Dai. He and I have talked about a Pig adapter for
>> Calcite in the past. If you decide to go that route, Daniel may be able
>> to help out.
>>
>> Julian
>>
>> > On Nov 1, 2016, at 11:35 AM, Eli Levine <[email protected]> wrote:
>> >
>> > Thank you for the pointers, Julian and James! I have a requirement that
>> the
>> > main execution engine is a fault-tolerant one and at this point the main
>> > contenders are Pig and Spark. Drillix is great as a source of example
>> usages
>> > of Calcite, so it will definitely be useful.
>> >
>> > And yes, the hope is to contribute any Spark and/or Pig adapter code
>> that
>> > gets developed to Calcite.
>> >
>> > Eli
>> >
>> >
>> > On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:
>> >
>> >> Well, to correct James slightly, there is SOME support for Spark in
>> >> Calcite, but it’s fair to say that it hasn’t had much love. If you
>> would
>> >> like to get something working then Drillix (Drill + Phoenix + Calcite)
>> is
>> >> the way to go.
>> >>
>> >> That said, Spark is an excellent and hugely popular execution
>> environment,
>> >> so I would very much like to improve the Spark adapter. A few people on
>> >> this list have talked about that over the past couple of months. If you
>> >> would like to join that effort, it would be most welcome, but there’s
>> more
>> >> work to be done before you start getting results.
>> >>
>> >> Julian
>> >>
>> >>
>> >>> On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]>
>> >> wrote:
>> >>>
>> >>> Hi Eli,
>> >>> With the calcite branch of Phoenix you're part way there. I think a
>> good
>> >>> way to approach this would be to create a new set of operators that
>> >>> correspond to Spark operations and the corresponding rules that know
>> when
>> >>> to use them. These could then be costed with the other Phoenix
>> operators
>> >> at
>> >>> planning time. Spark would be especially well suited to storing
>> >>> intermediate results in more complex queries.
>> >>>
>> >>> Since Spark doesn't integrate natively with Calcite, I think using
>> Spark
>> >>> directly may not get you where you need to go. Similarly, the
>> >>> Phoenix-Spark integration is higher level: it is built on top of
>> >>> Phoenix and has no direct integration with Calcite.
>> >>>
>> >>> Another alternative to consider would be using Drillix (Drill +
>> Phoenix)
>> >>> which uses Calcite underneath[1].
>> >>>
>> >>> Thanks,
>> >>> James
>> >>>
>> >>> [1]
>> >>> https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+
>> >> Operational+%26+Analytical+SQL+at+Scale.pdf
>> >>>
>> >>> On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]>
>> wrote:
>> >>>
>> >>>> Greetings, Calcite devs. First of all, thank you for your work on
>> >> Calcite!
>> >>>>
>> >>>> I am working on a federated query engine that will use Spark (or
>> >> something
>> >>>> similar) as the main execution engine. Among other data sources the
>> >> query
>> >>>> engine will read from Apache Phoenix tables/views. The hope is to
>> >> utilize
>> >>>> Calcite as the query planner and optimizer component of this query
>> >> engine.
>> >>>>
>> >>>> At a high level, I am trying to build the following using Calcite:
>> >>>> 1. Generate a relational algebra expression tree using RelBuilder
>> based
>> >> on
>> >>>> user input. I plan to implement custom schema and table classes based
>> >> on my
>> >>>> metadata.
>> >>>> 2. Provide Calcite with query optimization rules.
>> >>>> 3. Traverse the optimized expression tree to generate a set of Spark
>> >>>> instructions.
>> >>>> 4. Execute query instructions via Spark.
>> >>>>
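[Editor's note: step 3 above (traversing the optimized tree to emit execution instructions) is essentially a post-order walk of the plan. The sketch below uses hypothetical stand-in classes, not Calcite's actual RelNode/RelVisitor API, since a real Calcite example needs the library on the classpath; the traversal pattern is the same.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a RelNode tree: a post-order walk turns the
// optimized plan into a linear instruction list, roughly what step 3 above
// would hand to Spark.
public class SparkPlanSketch {
    // Minimal node: an operator name plus its inputs.
    static class Node {
        final String op;
        final List<Node> inputs;
        Node(String op, Node... inputs) {
            this.op = op;
            this.inputs = List.of(inputs);
        }
    }

    // Post-order traversal: children first, so each operator's inputs
    // are already emitted by the time the operator itself is emitted.
    static void emit(Node node, List<String> out) {
        for (Node input : node.inputs) {
            emit(input, out);
        }
        out.add(node.op);
    }

    static List<String> compile(Node root) {
        List<String> instructions = new ArrayList<>();
        emit(root, instructions);
        return instructions;
    }

    static Node sampleTree() {
        // Project <- Filter <- Scan, mirroring a simple optimized plan.
        return new Node("project(name)",
                new Node("filter(deptno = 10)",
                        new Node("scan(EMP)")));
    }

    public static void main(String[] args) {
        System.out.println(compile(sampleTree()));
        // prints [scan(EMP), filter(deptno = 10), project(name)]
    }
}
```

In real Calcite code the equivalent would subclass RelVisitor (or use a RelShuttle) over the RelNode tree produced by the planner.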
>> >>>> A few questions regarding the above:
>> >>>> 1. Are there existing examples of code that does #3 above? I looked
>> at
>> >> the
>> >>>> Spark submodule and it seems pretty bare-bones. What would be great
>> to
>> >> see
>> >>>> is an example of a RelNode tree being traversed to create a plan for
>> >>>> asynchronous execution via something like Spark or Pig.
>> >>>> 2. An important query optimization that is planned initially is to be
>> >> able
>> >>>> to push down simple filters to Phoenix (the plan is to use
>> Phoenix-Spark
>> >>>> <http://phoenix.apache.org/phoenix_spark.html> integration for
>> reading
>> >>>> data). Any examples of such push-downs to specific data sources in a
>> >>>> federated query scenario would be much appreciated.
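[Editor's note: the filter push-down asked about in question 2 amounts to rewriting Filter-on-Scan so the predicate travels with the scan and is evaluated by Phoenix server-side. The sketch below uses hypothetical classes, not Calcite's actual planner-rule API; in Calcite this same rewrite is expressed as a rule matching a Filter over a TableScan.]

```java
import java.util.Optional;

// Hypothetical sketch of the push-down idea: when a Filter sits directly on
// a Phoenix scan, fold the predicate into the scan so Phoenix evaluates it
// server-side instead of Spark filtering rows after the fact.
public class PushDownSketch {
    static class PhoenixScan {
        final String table;
        final Optional<String> predicate; // pushed-down filter, if any
        PhoenixScan(String table, Optional<String> predicate) {
            this.table = table;
            this.predicate = predicate;
        }
        String toSql() {
            return "SELECT * FROM " + table
                    + predicate.map(p -> " WHERE " + p).orElse("");
        }
    }

    // The "rule": rewrite Filter(scan) into a scan carrying the predicate,
    // AND-merging with any predicate already pushed down.
    static PhoenixScan pushFilterIntoScan(String filterPredicate, PhoenixScan scan) {
        String merged = scan.predicate
                .map(p -> p + " AND " + filterPredicate)
                .orElse(filterPredicate);
        return new PhoenixScan(scan.table, Optional.of(merged));
    }

    public static void main(String[] args) {
        PhoenixScan scan = new PhoenixScan("ORDERS", Optional.empty());
        PhoenixScan pushed = pushFilterIntoScan("status = 'OPEN'", scan);
        System.out.println(pushed.toSql());
        // prints SELECT * FROM ORDERS WHERE status = 'OPEN'
    }
}
```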
>> >>>>
>> >>>> Thank you! Looking forward to working with the Calcite community.
>> >>>>
>> >>>> -------------
>> >>>> Eli Levine
>> >>>> Software Engineering Architect -- Salesforce.com
>> >>>>
>> >>
>> >>
>>
>>
>
