Eli,

Can you define what you mean by "fault-tolerant"? Phoenix+HBase are fault tolerant through the retries that HBase does.

Thanks,
James

On Tue, Nov 1, 2016 at 11:35 AM, Eli Levine wrote:
> Thank you for the pointers, Julian and James! I have a requirement that the main execution engine is a fault-tolerant one and at this point the main contenders are Pig and Spark. Drillix is great as a source of example usages of Calcite, so it will definitely be useful.
>
> And yes, the hope is to contribute any Spark and/or Pig adapter code that gets developed to Calcite.
>
> Eli
>
> On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde wrote:
>
> > Well, to correct James slightly, there is SOME support for Spark in Calcite, but it’s fair to say that it hasn’t had much love. If you would like to get something working then Drillix (Drill + Phoenix + Calcite) is the way to go.
> >
> > That said, Spark is an excellent and hugely popular execution environment, so I would very much like to improve the Spark adapter. A few people on this list have talked about that over the past couple of months. If you would like to join that effort, it would be most welcome, but there’s more work to be done before you start getting results.
> >
> > Julian
> >
> > > On Oct 22, 2016, at 4:41 PM, James Taylor wrote:
> > >
> > > Hi Eli,
> > > With the calcite branch of Phoenix you're part way there. I think a good way to approach this would be to create a new set of operators that correspond to Spark operations and the corresponding rules that know when to use them. These could then be costed with the other Phoenix operators at planning time. Spark would work especially well to store intermediate results in more complex queries.
> > >
> > > Since Spark doesn't integrate natively with Calcite, I think using Spark directly may not get you where you need to go.
> > > In the same way, the Phoenix-Spark integration is higher level, built on top of Phoenix and has no direct integration with Calcite.
> > >
> > > Another alternative to consider would be using Drillix (Drill + Phoenix), which uses Calcite underneath[1].
> > >
> > > Thanks,
> > > James
> > >
> > > [1] https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> > >
> > > On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine wrote:
> > >
> > > > Greetings, Calcite devs. First of all, thank you for your work on Calcite!
> > > >
> > > > I am working on a federated query engine that will use Spark (or something similar) as the main execution engine. Among other data sources the query engine will read from Apache Phoenix tables/views. The hope is to utilize Calcite as the query planner and optimizer component of this query engine.
> > > >
> > > > At a high level, I am trying to build the following using Calcite:
> > > > 1. Generate a relational algebra expression tree using RelBuilder based on user input. I plan to implement custom schema and table classes based on my metadata.
> > > > 2. Provide Calcite with query optimization rules.
> > > > 3. Traverse the optimized expression tree to generate a set of Spark instructions.
> > > > 4. Execute query instructions via Spark.
> > > >
> > > > A few questions regarding the above:
> > > > 1. Are there existing examples of code that does #3 above? I looked at the Spark submodule and it seems pretty bare-bones. What would be great to see is an example of a RelNode tree being traversed to create a plan for asynchronous execution via something like Spark or Pig.
> > > > 2. An important query optimization that is planned initially is to be able to push down simple filters to Phoenix (the plan is to use Phoenix-Spark <http://phoenix.apache.org/phoenix_spark.html> integration for reading data). Any examples of such push-downs to specific data sources in a federated query scenario would be much appreciated.
> > > >
> > > > Thank you! Looking forward to working with the Calcite community.
> > > >
> > > > -------------
> > > > Eli Levine
> > > > Software Engineering Architect -- Salesforce.com
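Steps 2 and 3 of Eli's plan, together with the simple filter push-down from his second question, can be illustrated with a minimal toy sketch. It is written in Python only to stay self-contained, and several parts are assumptions. Scan, Filter, Project and Comparison are hypothetical stand-ins for Calcite's RelNode classes, not the Calcite API. push_filter_into_phoenix_scan imitates the effect of a planner rule with a single bottom-up rewrite instead of Calcite's rule-matching planners. The emitted "Spark steps" are plain strings, not real Spark or Phoenix-Spark calls.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import List, Tuple, Union

# HYPOTHETICAL toy plan nodes standing in for Calcite's RelNode classes
# (e.g. LogicalTableScan, LogicalFilter, LogicalProject). Not the Calcite API.


@dataclass(frozen=True)
class Comparison:
    column: str
    op: str          # e.g. "=", "<", "LIKE"
    value: object


@dataclass(frozen=True)
class Scan:
    table: str
    source: str      # e.g. "phoenix" or "csv"
    pushed_filters: Tuple[Comparison, ...] = ()


@dataclass(frozen=True)
class Filter:
    input: Rel
    condition: Comparison


@dataclass(frozen=True)
class Project:
    input: Rel
    columns: Tuple[str, ...]


Rel = Union[Scan, Filter, Project]

# Comparisons ASSUMED safe to hand to Phoenix (an assumption for this sketch).
SIMPLE_OPS = {"=", "<", ">", "<=", ">="}


def push_filter_into_phoenix_scan(rel: Rel) -> Rel:
    """Bottom-up rewrite: a Filter directly over a Phoenix Scan with a simple
    comparison becomes a Scan carrying that comparison as a pushed filter."""
    if isinstance(rel, Filter):
        child = push_filter_into_phoenix_scan(rel.input)
        if (isinstance(child, Scan) and child.source == "phoenix"
                and rel.condition.op in SIMPLE_OPS):
            return replace(child, pushed_filters=child.pushed_filters + (rel.condition,))
        return Filter(child, rel.condition)
    if isinstance(rel, Project):
        return Project(push_filter_into_phoenix_scan(rel.input), rel.columns)
    return rel  # a Scan has no inputs to rewrite


def _fmt(c: Comparison) -> str:
    return f"{c.column} {c.op} {c.value!r}"


def to_spark_steps(rel: Rel) -> List[str]:
    """Post-order traversal: emit a node's input first, so every step
    consumes the output of the steps listed before it."""
    if isinstance(rel, Scan):
        step = f"read {rel.source} table {rel.table}"
        if rel.pushed_filters:
            step += " WHERE " + " AND ".join(_fmt(c) for c in rel.pushed_filters)
        return [step]
    if isinstance(rel, Filter):
        return to_spark_steps(rel.input) + [f"filter {_fmt(rel.condition)}"]
    if isinstance(rel, Project):
        return to_spark_steps(rel.input) + ["select " + ", ".join(rel.columns)]
    raise TypeError(f"unsupported node: {rel!r}")


if __name__ == "__main__":
    # SELECT NAME FROM EMP WHERE DEPTNO = 10, with EMP stored in Phoenix.
    plan = Project(Filter(Scan("EMP", "phoenix"), Comparison("DEPTNO", "=", 10)), ("NAME",))
    for step in to_spark_steps(push_filter_into_phoenix_scan(plan)):
        print(step)
    # read phoenix table EMP WHERE DEPTNO = 10
    # select NAME
```

A filter over a non-Phoenix scan, or one using an operator outside SIMPLE_OPS such as LIKE, stays as a separate "filter" step. In a real Calcite setup the push-down would more likely be a planner rule costed alongside the other operators, as James suggests, or go through a table implementing Calcite's FilterableTable interface.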
