Victor Giannakouris - Salalidis wrote:
> At first I would like to introduce myself to the developers list. My name
> is Victor and I am an undergraduate computer science student. Currently I
> am doing some research on query optimization.
Nice to meet you!

> I am quite new to the Calcite project and I am facing some issues. My
> major problem is: how can I use the Calcite Spark adapter
> programmatically? I cannot find a tutorial or documentation for this.

The Spark adapter is not quite like the other adapters. Most adapters
(e.g. MongoDB or JDBC) have their own metadata, so they can tell Calcite
which tables they contain by implementing the SchemaFactory SPI. If you use
one of those tables, Calcite will push down as much processing as it can to
execute in the source system.

Spark is an engine. It doesn't have its own data. Therefore Calcite uses it
to implement the operations after the data has left the data source. Drill
and Flink are also engines, as are Calcite's native "enumerable" physical
relational operators, so I would like to generalize the "engine" interface
so that you can plug in the engine of your choice, regardless of where you
are reading the data from.

The other thing to be said about Spark is that Calcite's adapter is based
on an old version of Spark (0.9.0-incubating).

> Second, how can I run queries on SparkSQL via sqlline?

SparkSQL doesn't need Calcite - it has its own parser and planner. (Not as
good as Calcite's, obviously!) If I understand SparkSQL's architecture
correctly, the only piece that is missing is a JDBC driver. Someone from
the Spark community could implement a JDBC driver, and Avatica would be a
good way to do that. They can use Avatica to implement the JDBC APIs and as
the RPC layer, and they don't need to pull in anything else from Calcite.

wangzhenhua wrote:
> I'm also interested in how Calcite can be integrated into SparkSQL so
> that Spark can use Calcite to optimize queries. Do we have such APIs? If
> not, is there a roadmap for this?

I don't think you could plug Calcite into SparkSQL very easily. But you
could use Calcite as a SQL front end and optimizer, and hand the optimized
plan to Spark for execution. That is what Calcite's Spark adapter/engine
does.
We should enhance it to read whatever metadata source SparkSQL uses (so
that we are seeing the same set of tables as SparkSQL), but no one is
working on that right now.

Julian
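
P.S. For the "programmatically" part of Victor's question, here is a
minimal sketch. Calcite's Spark engine is switched on with the `spark=true`
connection property; this assumes calcite-spark (and its Spark dependency)
is on the classpath, and uses an inline VALUES query so that no external
schema is needed:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class SparkAdapterDemo {
  public static void main(String[] args) throws Exception {
    Properties info = new Properties();
    // Ask Calcite to generate code for the Spark engine rather than
    // its default "enumerable" engine.
    info.setProperty("spark", "true");

    try (Connection connection =
             DriverManager.getConnection("jdbc:calcite:", info);
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(
             "select * from (values (1, 'a'), (2, 'b')) as t(x, y)")) {
      while (resultSet.next()) {
        System.out.println(resultSet.getInt(1)
            + " " + resultSet.getString(2));
      }
    }
  }
}
```

To query real tables rather than a VALUES clause, you would also point the
connection at a model file or register a Schema on the root schema.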
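
For completeness, this is the sort of JSON model file through which a
SchemaFactory-based adapter (MongoDB here) tells Calcite what tables exist;
you pass it via `jdbc:calcite:model=/path/to/model.json`. The host and
database values are placeholders:

```json
{
  "version": "1.0",
  "defaultSchema": "mongo",
  "schemas": [
    {
      "name": "mongo",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.mongodb.MongoSchemaFactory",
      "operand": {
        "host": "localhost:27017",
        "database": "test"
      }
    }
  ]
}
```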
