Hi everyone, I'm trying to speed up my Spark Streaming application and have run into the following problem. My app uses a lot of joins, and a full Catalyst analysis is triggered on every join.
I found two options to speed this up:

1) The spark.sql.selfJoinAutoResolveAmbiguity option. But looking at the code:
   https://github.com/apache/spark/blob/8cd9cdf17a7a4ad6f2eecd7c4b388ca363c20982/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L918
   shouldn't lines 925-927 come before lines 920-922?

2) https://issues.apache.org/jira/browse/SPARK-20392
   Is it safe to use it on top of 2.2.0?

Regards,
--
Maciek Bryński