Hi,
I did backport this to 2.2. First results of tests (a join of about 60 tables):
Vanilla Spark: 50 sec
With SPARK-20392: 38 sec
With SPARK-20392 and spark.sql.selfJoinAutoResolveAmbiguity=false: 29 sec
Vanilla Spark with spark.sql.selfJoinAutoResolveAmbiguity=false: 34 sec
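
For anyone who wants to reproduce this, the options can be set roughly like this (a minimal Scala sketch; the app name is just a placeholder):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: build a session with self-join auto-resolution disabled,
    // as in the 29 sec / 34 sec runs above. The app name is a placeholder.
    val spark = SparkSession.builder()
      .appName("join-benchmark")
      .config("spark.sql.selfJoinAutoResolveAmbiguity", "false")
      .getOrCreate()

    // spark.sql.constraintPropagation.enabled can also be flipped at runtime:
    spark.conf.set("spark.sql.constraintPropagation.enabled", "false")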
I didn't measure any difference when changing spark.sql.constraintPropagation.enabled or any other spark.sql option, so I will leave your patch on top of 2.2.
Thank you.
M.

2017-07-25 1:39 GMT+02:00 Liang-Chi Hsieh <vii...@gmail.com>:
>
> Hi Maciej,
>
> For backporting https://issues.apache.org/jira/browse/SPARK-20392, you can
> see the suggestion from the committers on the PR. I don't think we expect it
> to be merged into 2.2.
>
>
> Maciej Bryński wrote
> > Hi Everyone,
> > I'm trying to speed up my Spark streaming application and I have the
> > following problem: I'm using a lot of joins in my app, and a full Catalyst
> > analysis is triggered during every join.
> >
> > I found 2 options to speed things up.
> >
> > 1) The spark.sql.selfJoinAutoResolveAmbiguity option.
> > But looking at the code:
> > https://github.com/apache/spark/blob/8cd9cdf17a7a4ad6f2eecd7c4b388ca363c20982/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L918
> > Shouldn't lines 925-927 come before lines 920-922?
> >
> > 2) https://issues.apache.org/jira/browse/SPARK-20392
> > Is it safe to use it on top of 2.2.0?
> >
> > Regards,
> > --
> > Maciek Bryński
>
> -----
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/

--
Maciek Bryński