Greetings, Calcite devs. First of all, thank you for your work on Calcite! I am working on a federated query engine that will use Spark (or something similar) as its main execution engine. Among other data sources, the query engine will read from Apache Phoenix tables/views. The hope is to use Calcite as the query planning and optimization component of this engine.
At a high level, I am trying to build the following using Calcite:

1. Generate a relational algebra expression tree using RelBuilder based on user input. I plan to implement custom Schema and Table classes backed by my own metadata.
2. Provide Calcite with query optimization rules.
3. Traverse the optimized expression tree to generate a set of Spark instructions.
4. Execute those instructions via Spark.

A few questions regarding the above:

1. Are there existing examples of code that does #3 above? I looked at the Spark submodule and it seems pretty bare-bones. What would be great to see is an example of a RelNode tree being traversed to create a plan for asynchronous execution via something like Spark or Pig.
2. An important query optimization planned for the initial version is pushing simple filters down to Phoenix (the plan is to use the Phoenix-Spark integration <http://phoenix.apache.org/phoenix_spark.html> for reading data). Any examples of such push-downs to specific data sources in a federated query scenario would be much appreciated.

Thank you! Looking forward to working with the Calcite community.

-------------
Eli Levine
Software Engineering Architect
--
Salesforce.com
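To make #3 concrete, here is a minimal sketch of the kind of traversal I have in mind: a post-order walk over a plan tree that emits one execution step per operator, so each step appears after the steps producing its inputs. Note this is not the Calcite API; `PlanNode` and the string "instructions" are hypothetical stand-ins (real code would walk `org.apache.calcite.rel.RelNode`, e.g. with a visitor), used here only to show the shape of the translation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a relational operator tree; real code
// would traverse org.apache.calcite.rel.RelNode instead.
class PlanNode {
    final String op;              // e.g. "Scan", "Filter", "Project"
    final List<PlanNode> inputs;
    PlanNode(String op, PlanNode... inputs) {
        this.op = op;
        this.inputs = List.of(inputs);
    }
}

public class PlanWalker {
    // Post-order: emit steps for the inputs first, then for this node,
    // so the resulting list is already in a valid execution order.
    static void emit(PlanNode node, List<String> out) {
        for (PlanNode input : node.inputs) {
            emit(input, out);
        }
        out.add("spark step for " + node.op);
    }

    public static List<String> compile(PlanNode root) {
        List<String> out = new ArrayList<>();
        emit(root, out);
        return out;
    }

    public static void main(String[] args) {
        PlanNode plan = new PlanNode("Project",
            new PlanNode("Filter", new PlanNode("Scan")));
        System.out.println(compile(plan));
        // -> [spark step for Scan, spark step for Filter, spark step for Project]
    }
}
```

In a real implementation each emitted "step" would of course be a Spark operation (or a deferred RDD/DataFrame transformation) rather than a string, but the bottom-up ordering is the part I wanted to ask about.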
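And for #2, a toy illustration of the rewrite I would like Calcite to perform: when a Filter sits directly on a Scan, fold the predicate into the scan so the data source (Phoenix, in my case) can evaluate it server-side. Again, `Rel`, `Scan`, and `Filter` here are hypothetical classes, not Calcite's; in Calcite this would presumably be a planner rule matching that operator pattern.

```java
// Toy filter push-down rewrite; the node classes are hypothetical
// stand-ins, not the Calcite API.
abstract class Rel {}

class Scan extends Rel {
    final String table;
    final String predicate;   // null = nothing pushed to the source yet
    Scan(String table, String predicate) {
        this.table = table;
        this.predicate = predicate;
    }
}

class Filter extends Rel {
    final Rel input;
    final String predicate;
    Filter(Rel input, String predicate) {
        this.input = input;
        this.predicate = predicate;
    }
}

public class PushDown {
    // If a Filter sits directly on a Scan with no predicate yet,
    // replace the pair with a Scan that carries the predicate, so the
    // source (e.g. Phoenix) filters rows before they reach Spark.
    public static Rel apply(Rel node) {
        if (node instanceof Filter) {
            Filter f = (Filter) node;
            if (f.input instanceof Scan) {
                Scan s = (Scan) f.input;
                if (s.predicate == null) {
                    return new Scan(s.table, f.predicate);
                }
            }
        }
        return node;   // pattern did not match; leave the tree as-is
    }

    public static void main(String[] args) {
        Rel before = new Filter(new Scan("PHOENIX.T", null), "A > 10");
        Scan after = (Scan) apply(before);
        System.out.println(after.table + " WHERE " + after.predicate);
        // -> PHOENIX.T WHERE A > 10
    }
}
```

The interesting part for me is how to express this kind of source-specific rewrite as a proper Calcite rule in a federated setting, hence the question above.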
