Hi JiaTao,

That sounds interesting. A few questions:
1- When you go from RelNode to "Spark DataFrame plan", which of the following do you mean?
   * Spark SQL
   * Spark DataFrame Scala code
   * An in-memory Spark Catalyst plan
   * A human-readable string representation of the Spark plan (e.g., similar to DataFrame.explain)
   * Some serialization of the in-memory Spark plan (similar to the human-readable form, but more ser/de-friendly without necessarily being human-readable)

2- In the Presto and Spark conversions you mention below, you state that you start from a RelNode. Could you clarify where the RelNode originally comes from? What is the use case in each?

3- It would be awesome if you could contribute to coral-spark-plan [1]. Its current objective is to convert a human-readable Spark plan (the output of DataFrame.explain) to a RelNode. Right now it can do basic conversions (see the test cases [2]). This module can help with:
** Analyzing Spark jobs (we have used it to figure out which Spark jobs in our history server push complex predicates down, as complex predicates are not supported on DataSource V2 [3])
** Converting arbitrary Spark logic to other platforms (e.g., Spark Catalyst plans to Presto), since even Scala code ends up being represented in the plan string in a structured way
** Converting Spark Scala code back to SQL

[1] https://github.com/linkedin/coral/tree/master/coral-spark-plan
[2] https://github.com/linkedin/coral/blob/master/coral-spark-plan/src/test/java/com/linkedin/coral/sparkplan/SparkPlanToIRRelConverterTest.java
[3] https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkStrategy-DataSourceStrategy.html

Thanks,
Walaa.

On Sat, Dec 12, 2020 at 5:24 AM JiaTao Tao <[email protected]> wrote:

> Hi Walaa,
>
> Very happy to see this. Our team basically does the same thing, a unified
> SQL layer:
> 1. Spark: RelNode -> Spark DataFrame plan
> 2. Presto: RelNode -> SQL string
> 3. Clickhouse: RelNode -> serialized RelNode
> 4. Flink: TBD (with the DataStream API or the Table API)
>
> I did point 1 both at my previous company and my current company; maybe I
> can participate in this part: analyzing and translating Spark Catalyst
> plans.
>
> Regards!
>
> Aron Tao
>
> Walaa Eldin Moustafa <[email protected]> wrote on Sat, Dec 12, 2020
> at 5:34 AM:
>
> > Hi Calcite community,
> >
> > I wanted to share a recently published LinkedIn blog series article [1]
> > on how Calcite helps us build a smarter data lake using Coral [2]. Hope
> > you find it interesting. Also, if you want to discuss with our team and
> > the data lake + Calcite community, please feel free to join our Coral
> > Slack workspace [3].
> >
> > [1] https://engineering.linkedin.com/blog/2020/coral
> > [2] https://github.com/linkedin/coral
> > [3] https://join.slack.com/t/coral-sql/shared_invite/zt-j9jw5idg-mkt3fjA~wgoUEMXXZqMr6g
> >
> > Thanks,
> > Walaa.
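P.S. To make the predicate-pushdown use case more concrete, here is a minimal sketch of the idea behind analyzing a human-readable Spark physical plan (the output of DataFrame.explain). The function name, the sample plan text, and the parsing approach are all illustrative assumptions of mine, written in Python for brevity; the actual coral-spark-plan converter is in Java and builds a full RelNode rather than just scanning strings.

```python
import re

def pushed_filters(plan_text):
    """Return the predicates a Spark plan string reports as pushed to the scan.

    Looks for the "PushedFilters: [...]" fragment that Spark prints on
    FileScan lines in DataFrame.explain output. Note: the simple comma
    split below would break on filters that themselves contain commas;
    that is fine for a sketch, not for a real parser.
    """
    m = re.search(r"PushedFilters: \[([^\]]*)\]", plan_text)
    if not m or not m.group(1).strip():
        return []
    return [f.strip() for f in m.group(1).split(",")]

# Illustrative (hypothetical) explain output, not taken from a real job.
plan = (
    "*(1) Project [name#0, age#1]\n"
    "+- *(1) Filter (isnotnull(age#1) AND isnotnull(name#0))\n"
    "   +- FileScan parquet default.people[name#0,age#1] "
    "PushedFilters: [IsNotNull(age), IsNotNull(name)], "
    "ReadSchema: struct<name:string,age:int>"
)
print(pushed_filters(plan))  # -> ['IsNotNull(age)', 'IsNotNull(name)']
```

A history-server analysis like the one mentioned in point 3 can then compare the filters in the Filter operator against the pushed list to flag jobs whose complex predicates were not pushed down.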
