Hi,
I did backport this to 2.2. First results of tests (a join of about 60 tables):
Vanilla Spark: 50 sec
With SPARK-20392: 38 sec
With SPARK-20392 and spark.sql.selfJoinAutoResolveAmbiguity=false: 29 sec
Vanilla Spark with spark.sql.selfJoinAutoResolveAmbiguity=false: 34 sec
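
For anyone who wants to reproduce this, the options can be set roughly like this (a minimal Scala sketch; the app name is just a placeholder):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: build a session with self-join auto-resolution disabled,
    // as in the 29 sec / 34 sec runs above. The app name is a placeholder.
    val spark = SparkSession.builder()
      .appName("join-benchmark")
      .config("spark.sql.selfJoinAutoResolveAmbiguity", "false")
      .getOrCreate()

    // spark.sql.constraintPropagation.enabled can also be flipped at runtime:
    spark.conf.set("spark.sql.constraintPropagation.enabled", "false")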
I didn't measure any difference when changing spark.sql.constraintPropagation.enabled or any other spark.sql option, so I will leave your patch on top of 2.2.
Thank you.
M.

2017-07-25 1:39 GMT+02:00 Liang-Chi Hsieh <vii...@gmail.com>:
>
> Hi Maciej,
>
> For backporting https://issues.apache.org/jira/browse/SPARK-20392, you can
> see the suggestion from the committers on the PR. I don't think we expect it
> to be merged into 2.2.
>
>
> Maciej Bryński wrote
> > Hi Everyone,
> > I'm trying to speed up my Spark streaming application and I have the
> > following problem: I'm using a lot of joins in my app, and a full Catalyst
> > analysis is triggered during every join.
> >
> > I found 2 options to speed things up.
> >
> > 1) The spark.sql.selfJoinAutoResolveAmbiguity option.
> > But looking at the code:
> > https://github.com/apache/spark/blob/8cd9cdf17a7a4ad6f2eecd7c4b388ca363c20982/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L918
> > Shouldn't lines 925-927 come before lines 920-922?
> >
> > 2) https://issues.apache.org/jira/browse/SPARK-20392
> > Is it safe to use it on top of 2.2.0?
> >
> > Regards,
> > --
> > Maciek Bryński
>
> -----
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/

--
Maciek Bryński