Mathieu DESPRIEE created SPARK-23220:
----------------------------------------

             Summary: broadcast hint not applied in a streaming left anti join
                 Key: SPARK-23220
                 URL: https://issues.apache.org/jira/browse/SPARK-23220
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.2.1
            Reporter: Mathieu DESPRIEE
         Attachments: Screenshot from 2018-01-25 17-32-45.png

We have a Structured Streaming app doing a left anti join between a stream and a static DataFrame. The static DataFrame is quite small (a few hundred rows), but the default query plan is a sort-merge join.

Sometimes we need to re-process historical data, so we feed the same app with a FileSource pointing to our S3 storage with all archives. In that situation, the first mini-batch is quite heavy (several hundred thousand input files), and the time spent in the sort-merge join is unacceptable.

I tried to switch to a broadcast join, but Spark still applies a sort-merge join.

{noformat}
import org.apache.spark.sql.functions.broadcast
ds.join(broadcast(hostnames), Seq("hostname"), "leftanti")
{noformat}

This looks like a bug. Is there another way to force the broadcast?
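
For illustration only, the result expected from the broadcast variant can be sketched in plain Scala, without Spark. A left anti join keeps the rows of the large side whose key has no match in the small side. When the small side fits in memory, that amounts to building a hash set of its keys once and filtering each incoming row against it, so the large side is never sorted. The names `Event`, `leftAnti`, and the sample hostnames are hypothetical and do not come from the application described above. This is not how Spark implements the join internally.

```scala
// Plain-Scala sketch (no Spark) of what a broadcast left anti join computes.
// Event, leftAnti and the sample hostnames are hypothetical, for illustration only.
object LeftAntiSketch {
  final case class Event(hostname: String, payload: String)

  // Keep only the events whose hostname is absent from the small side.
  // The small side is a hash set built once; each event costs one lookup,
  // and the large side is streamed through without being sorted.
  def leftAnti(events: Iterator[Event], smallSide: Set[String]): Iterator[Event] =
    events.filterNot(e => smallSide.contains(e.hostname))

  def main(args: Array[String]): Unit = {
    // Stand-in for the small static DataFrame (a few hundred rows in the report).
    val hostnames = Set("a.example.com", "b.example.com")

    // Stand-in for one mini-batch of the stream.
    val events = Iterator(
      Event("a.example.com", "1"),
      Event("c.example.com", "2"),
      Event("b.example.com", "3"),
      Event("d.example.com", "4")
    )

    // Prints the events whose hostname is not in the small side.
    leftAnti(events, hostnames).foreach(println)
  }
}
```

In Spark itself, calling `explain()` on the joined Dataset prints the physical plan, which names the join operator actually chosen (for example `SortMergeJoin` or `BroadcastHashJoin`).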