Hi Walaa,

1. Spark DataFrame Scala code.
2. In our scenario, our system is a SQL gateway: we are responsible for parsing, validating, converting, and optimizing the SQL with Calcite, so we get a RelNode after optimization.
Regards!

Aron Tao

Walaa Eldin Moustafa <[email protected]> wrote on Sunday, December 13, 2020 at 1:53 AM:

> Hi JiaTao,
>
> That sounds interesting. A few questions:
>
> 1- When you go from RelNode to "Spark DataFrame plan", did you mean you go to:
> * Spark SQL
> * Spark DataFrame Scala code
> * In-memory Spark Catalyst plan
> * Human-readable string representation of the Spark plan (e.g., similar to DataFrame.explain)
> * Some serialization of the in-memory Spark plan (similar to the human-readable form, but more ser/de friendly without necessarily being human-readable)?
>
> 2- In your Presto and Spark conversions mentioned below, you stated you start from a RelNode. Could you clarify where the RelNode originally comes from? What is the use case in both?
>
> 3- It would be awesome if you could contribute to coral-spark-plan [1]. Currently its objective is to convert a human-readable Spark plan (the output of DataFrame.explain) to a RelNode. Right now it can do basic conversions (see test cases [2]). This module can help with:
> ** Analyzing Spark jobs (we have used it to figure out which Spark jobs in our history server push complex predicates down, as complex predicates are not supported on DataSource V2 [3])
> ** Converting arbitrary Spark logic to other platforms (e.g., Spark Catalyst plan to Presto), since even Scala code ends up being represented in the plan string in a structured way
> ** Converting Spark Scala code back to SQL
>
> [1] https://github.com/linkedin/coral/tree/master/coral-spark-plan
> [2] https://github.com/linkedin/coral/blob/master/coral-spark-plan/src/test/java/com/linkedin/coral/sparkplan/SparkPlanToIRRelConverterTest.java
> [3] https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkStrategy-DataSourceStrategy.html
>
> Thanks,
> Walaa.
> On Sat, Dec 12, 2020 at 5:24 AM JiaTao Tao <[email protected]> wrote:
>
> > Hi Walaa,
> >
> > Very happy to see this; our team basically does the same thing, a unified SQL layer:
> > 1. Spark: RelNode -> Spark DataFrame plan
> > 2. Presto: RelNode -> SQL string
> > 3. Clickhouse: RelNode -> serialized RelNode
> > 4. Flink: TBD (with the DataStream API or the Table API)
> >
> > I did point 1 both at my previous company and at my current company, so maybe I can participate in this part: analyzing and translating Spark Catalyst plans.
> >
> > Regards!
> >
> > Aron Tao
> >
> > Walaa Eldin Moustafa <[email protected]> wrote on Saturday, December 12, 2020 at 5:34 AM:
> >
> > > Hi Calcite community,
> > >
> > > I wanted to share a recently published LinkedIn blog series article [1] on how Calcite helps us build a smarter data lake using Coral [2]. Hope you find it interesting. Also, if you want to discuss with our team and the data lake + Calcite community, please feel free to join our Coral Slack workspace [3].
> > >
> > > [1] https://engineering.linkedin.com/blog/2020/coral
> > > [2] https://github.com/linkedin/coral
> > > [3] https://join.slack.com/t/coral-sql/shared_invite/zt-j9jw5idg-mkt3fjA~wgoUEMXXZqMr6g
> > >
> > > Thanks,
> > > Walaa.
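The parse/validate/convert/optimize flow described at the top of the thread can be sketched with Calcite's Frameworks/Planner API. This is a minimal sketch, assuming Apache Calcite is on the classpath; the empty root schema and the trivial query are illustrative assumptions, not the poster's actual gateway code:

```java
// Minimal sketch of a Calcite-based SQL front end:
// parse -> validate -> convert to RelNode (the "optimize" step would
// then run planner rules over this RelNode).
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class GatewaySketch {
  public static void main(String[] args) throws Exception {
    // Empty root schema; a real gateway would register its catalog here.
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(Frameworks.createRootSchema(true))
        .build();
    Planner planner = Frameworks.getPlanner(config);

    SqlNode parsed = planner.parse("SELECT 1 + 2 AS x");  // parse
    SqlNode validated = planner.validate(parsed);         // validate
    RelNode rel = planner.rel(validated).rel;             // convert to RelNode

    // Print the logical plan; an optimizer would rewrite this tree.
    System.out.println(RelOptUtil.toString(rel));
  }
}
```

From here, the RelNode is the common intermediate representation that the per-engine back ends in the thread (Spark, Presto, Clickhouse) would translate.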
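The "Presto: RelNode -> SQL string" direction mentioned above has a close analogue in Calcite's own RelToSqlConverter. The sketch below builds a tiny RelNode with RelBuilder and emits it as SQL; the ANSI dialect and the VALUES plan are assumptions for illustration (Calcite also ships engine-specific dialects that could be plugged in the same way):

```java
// Sketch of RelNode -> SQL-string emission with Calcite's RelToSqlConverter.
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rel2sql.RelToSqlConverter;
import org.apache.calcite.sql.SqlDialect;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.AnsiSqlDialect;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

public class RelToSqlSketch {
  public static void main(String[] args) {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(Frameworks.createRootSchema(true))
        .build();
    RelBuilder builder = RelBuilder.create(config);

    // A small plan: VALUES (1), (2) with a single column "x".
    RelNode rel = builder.values(new String[] {"x"}, 1, 2).build();

    // Unparse the RelNode back to SQL text in the chosen dialect.
    SqlDialect dialect = AnsiSqlDialect.DEFAULT;
    SqlNode sqlNode = new RelToSqlConverter(dialect)
        .visitRoot(rel)
        .asStatement();
    System.out.println(sqlNode.toSqlString(dialect).getSql());
  }
}
```

Swapping the dialect object changes quoting, casing, and function unparsing, which is what makes a per-engine "RelNode -> SQL string" back end mostly a dialect choice.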
