It's a fairly loose term. For us it generally means being able to recover from node failures without having to rerun the process from the beginning. M/R and Spark fall broadly into that category.
Thanks,
Eli

On Tue, Nov 1, 2016 at 11:46 AM, James Taylor <[email protected]> wrote:
> Eli,
> Can you define what you mean by "fault-tolerant"? Phoenix+HBase are fault-tolerant through the retries that HBase does.
> Thanks,
> James
>
> On Tue, Nov 1, 2016 at 11:35 AM, Eli Levine <[email protected]> wrote:
>
> > Thank you for the pointers, Julian and James! I have a requirement that the main execution engine be fault-tolerant, and at this point the main contenders are Pig and Spark. Drillix is great as a source of example usages of Calcite, so it will definitely be useful.
> >
> > And yes, the hope is to contribute any Spark and/or Pig adapter code that gets developed to Calcite.
> >
> > Eli
> >
> > On Sat, Oct 22, 2016 at 9:56 PM, Julian Hyde <[email protected]> wrote:
> >
> > > Well, to correct James slightly, there is SOME support for Spark in Calcite, but it's fair to say that it hasn't had much love. If you would like to get something working, then Drillix (Drill + Phoenix + Calcite) is the way to go.
> > >
> > > That said, Spark is an excellent and hugely popular execution environment, so I would very much like to improve the Spark adapter. A few people on this list have talked about that over the past couple of months. If you would like to join that effort, it would be most welcome, but there's more work to be done before you start getting results.
> > >
> > > Julian
> > >
> > > > On Oct 22, 2016, at 4:41 PM, James Taylor <[email protected]> wrote:
> > > >
> > > > Hi Eli,
> > > > With the calcite branch of Phoenix you're part way there. I think a good way to approach this would be to create a new set of operators that correspond to Spark operations, plus the corresponding rules that know when to use them. These could then be costed alongside the other Phoenix operators at planning time. Spark would work especially well for storing intermediate results in more complex queries.
> > > >
> > > > Since Spark doesn't integrate natively with Calcite, I think using Spark directly may not get you where you need to go. In the same way, the Phoenix-Spark integration is higher-level, built on top of Phoenix, and has no direct integration with Calcite.
> > > >
> > > > Another alternative to consider would be Drillix (Drill + Phoenix), which uses Calcite underneath [1].
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > [1] https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf
> > > >
> > > > On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <[email protected]> wrote:
> > > >
> > > > > Greetings, Calcite devs. First of all, thank you for your work on Calcite!
> > > > >
> > > > > I am working on a federated query engine that will use Spark (or something similar) as the main execution engine. Among other data sources, the query engine will read from Apache Phoenix tables/views. The hope is to utilize Calcite as the query planner and optimizer component of this query engine.
> > > > >
> > > > > At a high level, I am trying to build the following using Calcite:
> > > > > 1. Generate a relational algebra expression tree using RelBuilder based on user input. I plan to implement custom schema and table classes based on my metadata.
> > > > > 2. Provide Calcite with query optimization rules.
> > > > > 3. Traverse the optimized expression tree to generate a set of Spark instructions.
> > > > > 4. Execute query instructions via Spark.
> > > > >
> > > > > A few questions regarding the above:
> > > > > 1. Are there existing examples of code that does #3 above? I looked at the Spark submodule and it seems pretty bare-bones. What would be great to see is an example of a RelNode tree being traversed to create a plan for asynchronous execution via something like Spark or Pig.
> > > > > 2. An important query optimization planned initially is to be able to push down simple filters to Phoenix (the plan is to use Phoenix-Spark <http://phoenix.apache.org/phoenix_spark.html> integration for reading data). Any examples of such push-downs to specific data sources in a federated query scenario would be much appreciated.
> > > > >
> > > > > Thank you! Looking forward to working with the Calcite community.
> > > > >
> > > > > -------------
> > > > > Eli Levine
> > > > > Software Engineering Architect -- Salesforce.com
