Hi Walaa,

1. Spark DataFrame Scala code.
2. In our scenario, our system is a SQL gateway: we are responsible for parsing, validating, converting, and optimizing the SQL with Calcite, so we get a RelNode after optimization.
Regards!

Aron Tao

Walaa Eldin Moustafa <[email protected]> wrote on Sunday, December 13, 2020 at 1:53 AM:

> Hi JiaTao,
>
> That sounds interesting. A few questions:
>
> 1- When you go from RelNode to "Spark DataFrame plan", did you mean you go to:
> * Spark SQL
> * Spark DataFrame Scala code
> * In-memory Spark Catalyst plan
> * Human-readable string representation of the Spark plan (e.g., similar to DataFrame.explain)
> * Some serialization of the in-memory Spark plan (similar to the human-readable form, but more ser/de friendly without necessarily being human-readable)?
>
> 2- In your Presto and Spark conversions mentioned below, you stated you start from a RelNode. Could you clarify where the RelNode originally comes from? What is the use case in both?
>
> 3- It would be awesome if you could contribute to coral-spark-plan [1]. Currently its objective is to convert a human-readable Spark plan (the output of DataFrame.explain) to a RelNode. Right now it can do basic conversions (see test cases [2]). This module can help with:
> ** Analyzing Spark jobs (we have used it to figure out which Spark jobs in our history server push complex predicates down, as complex predicates are not supported on DataSource V2 [3])
> ** Converting arbitrary Spark logic to other platforms (e.g., Spark Catalyst plan to Presto), since even Scala code ends up being represented in the plan string in a structured way
> ** Converting Spark Scala code back to SQL
>
> [1] https://github.com/linkedin/coral/tree/master/coral-spark-plan
> [2] https://github.com/linkedin/coral/blob/master/coral-spark-plan/src/test/java/com/linkedin/coral/sparkplan/SparkPlanToIRRelConverterTest.java
> [3] https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkStrategy-DataSourceStrategy.html
>
> Thanks,
> Walaa.
> On Sat, Dec 12, 2020 at 5:24 AM JiaTao Tao <[email protected]> wrote:
>
> > Hi Walaa,
> >
> > Very happy to see this; our team basically does the same thing, a unified SQL layer:
> > 1. Spark: RelNode -> Spark DataFrame plan
> > 2. Presto: RelNode -> SQL string
> > 3. Clickhouse: RelNode -> serialized RelNode
> > 4. Flink: TBD (with the DataStream API or the Table API)
> >
> > I did point 1 both at my previous company and at my current company, so maybe I can participate in this part: analyzing and translating Spark Catalyst plans.
> >
> > Regards!
> >
> > Aron Tao
> >
> > Walaa Eldin Moustafa <[email protected]> wrote on Saturday, December 12, 2020 at 5:34 AM:
> >
> > > Hi Calcite community,
> > >
> > > I wanted to share a recently published LinkedIn blog series article [1] on how Calcite helps us build a smarter data lake using Coral [2]. Hope you find it interesting. Also, if you want to discuss with our team and the data lake + Calcite community, please feel free to join our Coral Slack workspace [3].
> > >
> > > [1] https://engineering.linkedin.com/blog/2020/coral
> > > [2] https://github.com/linkedin/coral
> > > [3] https://join.slack.com/t/coral-sql/shared_invite/zt-j9jw5idg-mkt3fjA~wgoUEMXXZqMr6g
> > >
> > > Thanks,
> > > Walaa.
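The parse/validate/convert/optimize flow described at the top of the thread can be sketched with Calcite's Frameworks/Planner API. This is a minimal sketch, assuming Apache Calcite is on the classpath; the empty root schema and the trivial query are illustrative assumptions, not the poster's actual gateway code:

```java
// Minimal sketch of a Calcite-based SQL front end:
// parse -> validate -> convert to RelNode (the "optimize" step would
// then run planner rules over this RelNode).
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class GatewaySketch {
  public static void main(String[] args) throws Exception {
    // Empty root schema; a real gateway would register its catalog here.
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(Frameworks.createRootSchema(true))
        .build();
    Planner planner = Frameworks.getPlanner(config);

    SqlNode parsed = planner.parse("SELECT 1 + 2 AS x");  // parse
    SqlNode validated = planner.validate(parsed);         // validate
    RelNode rel = planner.rel(validated).rel;             // convert to RelNode

    // Print the logical plan; an optimizer would rewrite this tree.
    System.out.println(RelOptUtil.toString(rel));
  }
}
```

From here, the RelNode is the common intermediate representation that the per-engine back ends in the thread (Spark, Presto, Clickhouse) would translate.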
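The "Presto: RelNode -> SQL string" direction mentioned above has a close analogue in Calcite's own RelToSqlConverter. The sketch below builds a tiny RelNode with RelBuilder and emits it as SQL; the ANSI dialect and the VALUES plan are assumptions for illustration (Calcite also ships engine-specific dialects that could be plugged in the same way):

```java
// Sketch of RelNode -> SQL-string emission with Calcite's RelToSqlConverter.
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rel2sql.RelToSqlConverter;
import org.apache.calcite.sql.SqlDialect;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.AnsiSqlDialect;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

public class RelToSqlSketch {
  public static void main(String[] args) {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(Frameworks.createRootSchema(true))
        .build();
    RelBuilder builder = RelBuilder.create(config);

    // A small plan: VALUES (1), (2) with a single column "x".
    RelNode rel = builder.values(new String[] {"x"}, 1, 2).build();

    // Unparse the RelNode back to SQL text in the chosen dialect.
    SqlDialect dialect = AnsiSqlDialect.DEFAULT;
    SqlNode sqlNode = new RelToSqlConverter(dialect)
        .visitRoot(rel)
        .asStatement();
    System.out.println(sqlNode.toSqlString(dialect).getSql());
  }
}
```

Swapping the dialect object changes quoting, casing, and function unparsing, which is what makes a per-engine "RelNode -> SQL string" back end mostly a dialect choice.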
