Hi Calcite developers,
I am currently working on a project about query optimization and am trying to set up Calcite to use Spark as the execution engine. However, I have several questions about the current Calcite Spark adapter.

I understand that the Spark adapter needs to get metadata about tables from other source systems. So what I am trying to do is use the schema factory provided by the JDBC adapter to get metadata from a MySQL server, and then use Spark as the execution engine. Currently, I am able to get a plan from the Volcano planner: the root of the plan is a SparkToEnumerableConverter, its input is a JdbcToSparkConverter, and below that is the rest of the query plan. However, the program crashes and exits in a later phase, when the plan is being implemented.

I am not sure whether this is the correct way to use the Spark adapter, and I am still unclear about how the Spark adapter works in detail. My understanding is that it should generate a Java program from the query plan using Spark's RDD API and submit that program to a Spark cluster. However, I found that only a very limited number of RDD operations are used in the Spark adapter's code. I am wondering whether the Spark adapter is capable of converting a non-trivial query (with joins and aggregations) into a Spark program.

Cheers,
Hao Tan
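P.S. For reference, this is roughly the model file I am using to wire the MySQL metadata in through the JDBC adapter's schema factory (the driver class, URL, schema name, and credentials below are placeholders for my actual setup), with Spark enabled via the `spark=true` connection property on the Calcite JDBC URL, e.g. `jdbc:calcite:model=/path/to/model.json;spark=true`:

```json
{
  "version": "1.0",
  "defaultSchema": "mysql",
  "schemas": [
    {
      "name": "mysql",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "operand": {
        "jdbcDriver": "com.mysql.cj.jdbc.Driver",
        "jdbcUrl": "jdbc:mysql://localhost:3306/mydb",
        "jdbcUser": "user",
        "jdbcPassword": "password"
      }
    }
  ]
}
```

Please let me know if this is not the intended way to combine the JDBC and Spark adapters.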
