Will follow your suggested model when I start development. Thanks for offering to potentially include that work in Calcite, Julian.
Eli

On Tue, Nov 1, 2016 at 11:50 AM, Julian Hyde <[email protected]> wrote:
> If it helps make your "hope" a bit more likely to happen, you should
> consider doing your Spark or Pig adapters in the Calcite code base, that
> is, as a fork of the Calcite repo on GitHub from which you periodically
> submit pull requests. I would welcome that development model. For big,
> important features like this, I am comfortable including alpha- or
> beta-quality code in the Calcite release.
>
> If you do the work as part of the Calcite project, almost certainly other
> developers will want to help out. You'll do less work yourself and end up
> with a more robust result.
>
> I am cc'ing Daniel Dai. He and I have talked about a Pig adapter for
> Calcite in the past. If you decide to go that route, Daniel may be able
> to help out.
>
> Julian
>
>
> > On Nov 1, 2016, at 11:35 AM, Eli Levine <[email protected]> wrote:
> >
> > Thank you for the pointers, Julian and James! I have a requirement that
> > the main execution engine be fault-tolerant, and at this point the main
> > contenders are Pig and Spark. Drillix is great as a source of example
> > usages of Calcite, so it will definitely be useful.
> >
> > And yes, the hope is to contribute any Spark and/or Pig adapter code
> > that gets developed to Calcite.
> >
> > Eli
> >
> >
> > On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:
> >
> >> Well, to correct James slightly, there is SOME support for Spark in
> >> Calcite, but it's fair to say that it hasn't had much love. If you would
> >> like to get something working, then Drillix (Drill + Phoenix + Calcite)
> >> is the way to go.
> >>
> >> That said, Spark is an excellent and hugely popular execution
> >> environment, so I would very much like to improve the Spark adapter. A
> >> few people on this list have talked about that over the past couple of
> >> months. If you would like to join that effort, it would be most welcome,
> >> but there's more work to be done before you start getting results.
> >>
> >> Julian
> >>
> >>
> >>> On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> >>>
> >>> Hi Eli,
> >>> With the calcite branch of Phoenix you're part way there. I think a
> >>> good way to approach this would be to create a new set of operators
> >>> that correspond to Spark operations and the corresponding rules that
> >>> know when to use them. These could then be costed with the other
> >>> Phoenix operators at planning time. Spark would work especially well
> >>> to store intermediate results in more complex queries.
> >>>
> >>> Since Spark doesn't integrate natively with Calcite, I think using
> >>> Spark directly may not get you where you need to go. In the same way,
> >>> the Phoenix-Spark integration is higher level, built on top of
> >>> Phoenix, and has no direct integration with Calcite.
> >>>
> >>> Another alternative to consider would be using Drillix (Drill +
> >>> Phoenix), which uses Calcite underneath [1].
> >>>
> >>> Thanks,
> >>> James
> >>>
> >>> [1] https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> >>>
> >>> On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> >>>
> >>>> Greetings, Calcite devs. First of all, thank you for your work on
> >>>> Calcite!
> >>>>
> >>>> I am working on a federated query engine that will use Spark (or
> >>>> something similar) as the main execution engine. Among other data
> >>>> sources, the query engine will read from Apache Phoenix tables/views.
> >>>> The hope is to utilize Calcite as the query planner and optimizer
> >>>> component of this query engine.
> >>>>
> >>>> At a high level, I am trying to build the following using Calcite:
> >>>> 1. Generate a relational algebra expression tree using RelBuilder
> >>>> based on user input. I plan to implement custom schema and table
> >>>> classes based on my metadata.
> >>>> 2. Provide Calcite with query optimization rules.
> >>>> 3. Traverse the optimized expression tree to generate a set of Spark
> >>>> instructions.
> >>>> 4. Execute the query instructions via Spark.
> >>>>
> >>>> A few questions regarding the above:
> >>>> 1. Are there existing examples of code that does #3 above? I looked
> >>>> at the Spark submodule and it seems pretty bare-bones. What would be
> >>>> great to see is an example of a RelNode tree being traversed to
> >>>> create a plan for asynchronous execution via something like Spark or
> >>>> Pig.
> >>>> 2. An important query optimization planned for the initial version is
> >>>> the ability to push down simple filters to Phoenix (the plan is to
> >>>> use the Phoenix-Spark
> >>>> <http://phoenix.apache.org/phoenix_spark.html> integration for
> >>>> reading data). Any examples of such push-downs to specific data
> >>>> sources in a federated query scenario would be much appreciated.
> >>>>
> >>>> Thank you! Looking forward to working with the Calcite community.
> >>>>
> >>>> -------------
> >>>> Eli Levine
> >>>> Software Engineering Architect -- Salesforce.com
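[Editor's note] Question 1 in the thread above (traversing an optimized RelNode tree to emit Spark instructions) is at heart a post-order tree walk: each operator's inputs are translated before the operator itself, so the instruction list is built bottom-up from the leaf scans. The sketch below is a minimal, self-contained illustration of that pattern only; the `Node`/`Scan`/`Filter`/`Project` classes are hypothetical stand-ins, not Calcite's API (in real Calcite you would walk `org.apache.calcite.rel.RelNode` inputs, e.g. via a `RelVisitor` or `RelShuttle`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for a relational expression tree.
abstract class Node {
    final List<Node> inputs = new ArrayList<>();
    abstract String op();   // the "Spark instruction" this operator maps to
}

class Scan extends Node {
    final String table;
    Scan(String table) { this.table = table; }
    String op() { return "scan(" + table + ")"; }
}

class Filter extends Node {
    final String condition;
    Filter(Node input, String condition) { inputs.add(input); this.condition = condition; }
    String op() { return "filter(" + condition + ")"; }
}

class Project extends Node {
    final String fields;
    Project(Node input, String fields) { inputs.add(input); this.fields = fields; }
    String op() { return "project(" + fields + ")"; }
}

public class SparkPlanGenerator {
    // Post-order walk: emit each input's instructions before the parent's,
    // so the plan is built bottom-up from the leaf scans.
    static void emit(Node node, List<String> out) {
        for (Node input : node.inputs) {
            emit(input, out);
        }
        out.add(node.op());
    }

    public static List<String> plan(Node root) {
        List<String> out = new ArrayList<>();
        emit(root, out);
        return out;
    }

    public static void main(String[] args) {
        Node tree = new Project(new Filter(new Scan("ORDERS"), "amount > 100"), "id, amount");
        // Prints: [scan(ORDERS), filter(amount > 100), project(id, amount)]
        System.out.println(plan(tree));
    }
}
```

In a real adapter each `op()` would instead produce an RDD/DataFrame operation (or Pig statement), but the traversal shape is the same.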

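[Editor's note] Question 2 (pushing simple filters down to Phoenix) amounts to a rewrite rule: when a Filter sits directly over a table scan, fold its predicate into the scan so the data source evaluates it server-side. The following is a minimal self-contained sketch of that rewrite with hypothetical node classes; in Calcite proper this would be a `RelOptRule` matching a filter over a Phoenix scan, with the planner comparing costs of the two alternatives.

```java
// Hypothetical planner nodes; not Calcite classes.
abstract class PlanNode { }

class TableScan extends PlanNode {
    final String table;
    final String pushedPredicate;   // null until a filter is pushed down
    TableScan(String table, String pushedPredicate) {
        this.table = table;
        this.pushedPredicate = pushedPredicate;
    }
}

class FilterNode extends PlanNode {
    final PlanNode input;
    final String condition;
    FilterNode(PlanNode input, String condition) {
        this.input = input;
        this.condition = condition;
    }
}

public class FilterPushDown {
    // Rule: Filter(Scan(t)) => Scan(t, predicate); any other shape is
    // left alone (the filter stays in the engine above the source).
    public static PlanNode apply(PlanNode node) {
        if (node instanceof FilterNode) {
            FilterNode f = (FilterNode) node;
            PlanNode rewritten = apply(f.input);
            if (rewritten instanceof TableScan) {
                TableScan s = (TableScan) rewritten;
                if (s.pushedPredicate == null) {
                    return new TableScan(s.table, f.condition);
                }
            }
            return new FilterNode(rewritten, f.condition);
        }
        return node;
    }

    public static void main(String[] args) {
        PlanNode plan = new FilterNode(new TableScan("CONTACTS", null), "org_id = 42");
        TableScan pushed = (TableScan) apply(plan);
        // Prints: CONTACTS WHERE org_id = 42
        System.out.println(pushed.table + " WHERE " + pushed.pushedPredicate);
    }
}
```

The same shape extends to federated scenarios: each data source advertises which predicates it can absorb, and only those are folded into its scan while the rest remain in the engine.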