You’re asking for a couple of things. First, a Spark adapter (able to generate 
code that uses Spark as an engine, and in particular uses Spark’s 
implementations of the relational operators: join, aggregate, etc.). Second, a 
Hive schema factory (able to read metadata from HCatalog).

The first exists but needs an update[1], and the second does not exist[2]. You 
need both for your purposes, but they could be written and used independently 
of each other. Contributions welcome, as usual.
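
For context on how the two pieces would fit together once both issues are 
resolved: Calcite schemas are typically registered through a JSON model file, 
and the Spark engine is enabled via the `spark` connection property. A rough 
sketch — note that the `HiveSchemaFactory` class and its `operand` keys below 
are hypothetical (they are what [2] would add), only the model-file shape and 
the `spark` property are real today:

```json
{
  "version": "1.0",
  "defaultSchema": "HIVE",
  "schemas": [
    {
      "name": "HIVE",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.hive.HiveSchemaFactory",
      "operand": {
        "metastoreUris": "thrift://metastore-host:9083"
      }
    }
  ]
}
```

You would then connect with something like 
`jdbc:calcite:model=/path/to/model.json;spark=true`, at which point Calcite 
would resolve table metadata through the schema factory and plan execution 
using the Spark adapter.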

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-1274

[2] https://issues.apache.org/jira/browse/CALCITE-1275

> On Jun 5, 2016, at 9:03 AM, Γιώργος Θεοδωράκης <[email protected]> 
> wrote:
> 
> Hello,
> 
> My name is George and I am an undergraduate computer science student. I am
> doing some research for my diploma thesis about query optimization in
> distributed systems. After reading some basics about the Calcite project, I
> thought I could use it as an SQL optimizer on top of Spark.
> I have a Hadoop cluster running on multiple machines, and I run SQL queries
> with SparkSQL on data stored in a data warehouse (Hive). My goal is to
> optimize certain queries by pushing rules and functions down to the nodes
> with a framework like Calcite. However, I haven't found any related
> documentation, and I am not sure if it is even possible to access Hive's
> metadata through Calcite and run the optimizations on Spark. Can
> you help me?
> 
> Thank you in advance.
