dramaticlly opened a new issue, #5224: URL: https://github.com/apache/iceberg/issues/5224
Hey Iceberg Community: we recently migrated from Iceberg 0.13 with Spark 3.1 to Spark 3.2 and noticed that some existing SQL DELETE jobs now produce much more shuffle data than they did on Spark 3.1. When we ran EXPLAIN on the statement and inspected the logical plan, we found that https://github.com/apache/iceberg/blob/master/spark/v3.1/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DynamicFileFilter.scala is missing from the Spark 3.2 extensions, and we would like some help understanding why.

It looks like the dynamic file filter was introduced in https://github.com/apache/iceberg/pull/3415/ on 10/31/2021, while initial Spark 3.2 support was merged in https://github.com/apache/iceberg/pull/3335/files on 10/22/2021. So I want to check whether the order of those two merges has any implication here.

Delete SQL:

```sql
DELETE FROM $table1
WHERE $table1.date <= '20211228'
  AND $table1.date >= '20220627'
  AND upper($table1.$column1) IN (SELECT * FROM $table2)
```

(Spark logical plan screenshot not included.)

Iceberg Version: 0.13.0 (merge-on-read not enabled yet)
Spark Version: 3.2.0 (too much shuffle data) vs. 3.1.1 (works as expected with DynamicFileFilter)

Appreciate your help! CC @szehon-ho @rdblue @aokolnychyi @wypoon
