zohar-plutoflume opened a new issue, #7998: URL: https://github.com/apache/iceberg/issues/7998
### Query engine

Spark 3.3.1, Iceberg 1.1 (EMR 6.10)

### Question

Hi, I wanted to point out a behaviour change introduced in Spark 3.3 by https://issues.apache.org/jira/browse/SPARK-38148. When we use a MERGE command that includes a static partition condition, it is translated into a join query that does not use dynamic filtering, which has made our jobs run much slower.

For example, this MERGE INTO command:

    f"""
    MERGE INTO {output_table.catalog_table_ref} {TARGET}
    USING {TMP_VIEW} {SOURCE}
    ON {join_str} AND {TARGET}.triggered is true
    WHEN MATCHED THEN UPDATE SET {update_col_string}
    """

If we used Spark joins directly, we could simply filter the target table first. Because we use the MERGE API, we have to pass the Iceberg target table itself, so the static partition condition has to go into the ON clause. One option is to move it to `WHEN MATCHED AND {TARGET}.triggered is true`, but I don't think that would push `triggered = true` down to the target table.

Any suggestions on how to make the merge use dynamic filtering, as it did before the Spark 3.3.1 upgrade, would be much appreciated. I am not sure whether this is a Spark issue, where a conf is needed to decide whether to drop dynamic partition pruning, or an Iceberg issue, since it concerns the MERGE INTO API.
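To make the two placements of the static predicate concrete, here is a minimal sketch that only builds the two statement variants as strings (predicate in the ON clause versus in `WHEN MATCHED AND`). The helper name `build_merge`, its parameters, and the table, view, and column names in the usage lines are hypothetical and not taken from our job. The sketch does not run Spark and makes no claim about which variant is pushed down or pruned.

```python
def build_merge(target_ref, source_view, join_str, update_cols,
                static_filter, in_on_clause=True,
                target="t", source="s"):
    """Build a MERGE INTO statement as a string.

    Hypothetical helper for illustration only.
    in_on_clause=True  -> static filter is appended to the ON condition.
    in_on_clause=False -> static filter is moved to WHEN MATCHED AND ...
    """
    if in_on_clause:
        on_clause = f"{join_str} AND {static_filter}"
        matched = "WHEN MATCHED"
    else:
        on_clause = join_str
        matched = f"WHEN MATCHED AND {static_filter}"
    return (
        f"MERGE INTO {target_ref} {target}\n"
        f"USING {source_view} {source}\n"
        f"ON {on_clause}\n"
        f"{matched} THEN UPDATE SET {update_cols}"
    )


# Hypothetical names, used only to show the two shapes side by side.
on_variant = build_merge(
    "catalog.db.events", "tmp_view",
    "t.id = s.id", "t.value = s.value",
    "t.triggered is true", in_on_clause=True,
)
matched_variant = build_merge(
    "catalog.db.events", "tmp_view",
    "t.id = s.id", "t.value = s.value",
    "t.triggered is true", in_on_clause=False,
)

if __name__ == "__main__":
    print(on_variant)
    print()
    print(matched_variant)
```

Either string could then be passed to `spark.sql(...)`; comparing the physical plans of the two (for example with `EXPLAIN`) would show whether the filter on `triggered` reaches the target scan in each case.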
