You’re asking for a couple of things. First, a Spark adapter (able to generate code that uses Spark as an engine, and in particular to use its implementations of the relational operators: join, aggregate, etc.). Second, a Hive schema factory (able to read metadata from HCatalog).
The first exists but needs an update [1], and the second does not exist [2]. You need both for your purposes, but they could be written and used independently of each other. Contributions welcome, as usual.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-1274
[2] https://issues.apache.org/jira/browse/CALCITE-1275

> On Jun 5, 2016, at 9:03 AM, Γιώργος Θεοδωράκης <[email protected]> wrote:
>
> Hello,
>
> My name is George and I am an undergraduate computer science student. I am
> doing some research for my diploma thesis about query optimization on
> distributed systems. After reading some basics about the Calcite project, I
> thought I could use it as an SQL optimizer on top of Spark.
> I have a Hadoop cluster running on multiple machines, and I run SQL queries
> with SparkSQL on data saved in a data warehouse (Hive). My goal is to
> optimize certain queries by pushing rules and functions down to the nodes
> with a framework like Calcite. However, I haven't found any related
> documentation, and I am not sure if it is even possible to access the
> metadata of Hive through Calcite and run the optimizations on Spark. Can
> you help me?
>
> Thank you in advance.
