Thank you for the pointers, Julian and James! I have a requirement that the main execution engine be fault-tolerant, and at this point the main contenders are Pig and Spark. Drillix is great as a source of example usages of Calcite, so it will definitely be useful.
And yes, the hope is to contribute any Spark and/or Pig adapter code that gets developed to Calcite.

Eli

On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:

> Well, to correct James slightly, there is SOME support for Spark in
> Calcite, but it’s fair to say that it hasn’t had much love. If you would
> like to get something working then Drillix (Drill + Phoenix + Calcite) is
> the way to go.
>
> That said, Spark is an excellent and hugely popular execution environment,
> so I would very much like to improve the Spark adapter. A few people on
> this list have talked about that over the past couple of months. If you
> would like to join that effort, it would be most welcome, but there’s more
> work to be done before you start getting results.
>
> Julian
>
> > On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> >
> > Hi Eli,
> > With the calcite branch of Phoenix you're part way there. I think a good
> > way to approach this would be to create a new set of operators that
> > correspond to Spark operations and the corresponding rules that know
> > when to use them. These could then be costed with the other Phoenix
> > operators at planning time. Spark would work especially well to store
> > intermediate results in more complex queries.
> >
> > Since Spark doesn't integrate natively with Calcite, I think using Spark
> > directly may not get you where you need to go. In the same way, the
> > Phoenix-Spark integration is higher level, built on top of Phoenix and
> > has no direct integration with Calcite.
> >
> > Another alternative to consider would be using Drillix (Drill + Phoenix)
> > which uses Calcite underneath [1].
> >
> > Thanks,
> > James
> >
> > [1] https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> >
> > On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> >
> > > Greetings, Calcite devs.
> > > First of all, thank you for your work on Calcite!
> > >
> > > I am working on a federated query engine that will use Spark (or
> > > something similar) as the main execution engine. Among other data
> > > sources, the query engine will read from Apache Phoenix tables/views.
> > > The hope is to utilize Calcite as the query planner and optimizer
> > > component of this query engine.
> > >
> > > At a high level, I am trying to build the following using Calcite:
> > > 1. Generate a relational algebra expression tree using RelBuilder
> > > based on user input. I plan to implement custom schema and table
> > > classes based on my metadata.
> > > 2. Provide Calcite with query optimization rules.
> > > 3. Traverse the optimized expression tree to generate a set of Spark
> > > instructions.
> > > 4. Execute query instructions via Spark.
> > >
> > > A few questions regarding the above:
> > > 1. Are there existing examples of code that does #3 above? I looked at
> > > the Spark submodule and it seems pretty bare-bones. What would be
> > > great to see is an example of a RelNode tree being traversed to create
> > > a plan for asynchronous execution via something like Spark or Pig.
> > > 2. An important query optimization planned for the initial version is
> > > the ability to push down simple filters to Phoenix (the plan is to use
> > > the Phoenix-Spark <http://phoenix.apache.org/phoenix_spark.html>
> > > integration for reading data). Any examples of such push-downs to
> > > specific data sources in a federated query scenario would be much
> > > appreciated.
> > >
> > > Thank you! Looking forward to working with the Calcite community.
> > >
> > > -------------
> > > Eli Levine
> > > Software Engineering Architect -- Salesforce.com
> >
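[Editor's note: to make step 3 and question 2 of the quoted email concrete, here is a minimal self-contained sketch of both ideas: folding a Filter that sits directly on a Scan into the scan (a stand-in for pushing simple predicates down to Phoenix), then a post-order walk that linearizes the tree into pseudo-Spark instructions. The Node/Scan/Filter/Project classes and the instruction strings are invented for illustration only; a real implementation would operate on Calcite's RelNode tree instead.]

```java
import java.util.*;

public class PlanWalkSketch {
  /** Toy stand-in for a relational operator tree node (not Calcite's RelNode). */
  static class Node {
    final List<Node> inputs = new ArrayList<>();
  }
  static final class Scan extends Node {
    final String table;
    String pushedFilter;                     // filled in by the pushdown rewrite
    Scan(String table) { this.table = table; }
  }
  static final class Filter extends Node {
    final String condition;
    Filter(Node input, String condition) { inputs.add(input); this.condition = condition; }
  }
  static final class Project extends Node {
    final String fields;
    Project(Node input, String fields) { inputs.add(input); this.fields = fields; }
  }

  /** Rewrite: a Filter sitting directly on a Scan is folded into the Scan,
   *  mirroring a push-down of simple predicates to the data source. */
  static Node pushDownFilters(Node node) {
    for (int i = 0; i < node.inputs.size(); i++) {
      node.inputs.set(i, pushDownFilters(node.inputs.get(i)));
    }
    if (node instanceof Filter && node.inputs.get(0) instanceof Scan) {
      Scan scan = (Scan) node.inputs.get(0);
      scan.pushedFilter = ((Filter) node).condition;
      return scan;                           // the Filter disappears from the tree
    }
    return node;
  }

  /** Post-order walk turning the tree into a linear list of pseudo-Spark
   *  instructions: inputs are emitted before the operators that consume them. */
  static void emit(Node node, List<String> out) {
    for (Node input : node.inputs) {
      emit(input, out);
    }
    if (node instanceof Scan) {
      Scan s = (Scan) node;
      out.add("read(" + s.table
          + (s.pushedFilter == null ? "" : ", where=" + s.pushedFilter) + ")");
    } else if (node instanceof Filter) {
      out.add("filter(" + ((Filter) node).condition + ")");
    } else if (node instanceof Project) {
      out.add("select(" + ((Project) node).fields + ")");
    }
  }

  public static void main(String[] args) {
    Node plan = new Project(
        new Filter(new Scan("PHOENIX.ORDERS"), "amount > 100"), "id, amount");
    List<String> instructions = new ArrayList<>();
    emit(pushDownFilters(plan), instructions);
    System.out.println(instructions);
    // [read(PHOENIX.ORDERS, where=amount > 100), select(id, amount)]
  }
}
```

In Calcite itself the pushdown half would be a planner rule and the walk half would visit the optimized RelNode tree; the point of the sketch is only the shape of the two passes.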

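[Editor's note: James's suggestion in the quoted thread, new Spark-specific operators plus rules that know when to use them, costed against the other Phoenix operators at planning time, can be illustrated with a toy sketch. In Calcite proper this is done by subclassing RelNode and ConverterRule and letting the planner compare cumulative costs; the LogicalJoin/SparkJoin classes and cost numbers below are hypothetical stand-ins, not Calcite API.]

```java
public class SparkRuleSketch {
  /** Minimal stand-in for a relational operator with a cost estimate. */
  interface Rel {
    double cost();
  }

  /** Generic logical operator, as produced by initial translation. */
  static final class LogicalJoin implements Rel {
    public double cost() { return 100.0; }   // made-up cost for illustration
  }

  /** Spark-flavored counterpart, e.g. better at large intermediate results. */
  static final class SparkJoin implements Rel {
    public double cost() { return 40.0; }    // made-up cost for illustration
  }

  /** "Rule": fire on LogicalJoin, propose SparkJoin, keep the cheaper plan.
   *  A real planner does this comparison across whole plan alternatives. */
  static Rel apply(Rel rel) {
    if (rel instanceof LogicalJoin) {
      Rel candidate = new SparkJoin();
      return candidate.cost() < rel.cost() ? candidate : rel;
    }
    return rel;
  }

  public static void main(String[] args) {
    Rel best = apply(new LogicalJoin());
    System.out.println(best.getClass().getSimpleName());
    // SparkJoin
  }
}
```

The design point James makes is that once Spark operators carry their own costs, the planner, not hand-written glue code, decides when intermediate results flow through Spark versus Phoenix.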