[GitHub] [iceberg] zohar-plutoflume opened a new issue, #7998: performance degradation after migrating to spark 3.3.1 when using iceberg merge into

via GitHub Thu, 06 Jul 2023 04:26:13 -0700


zohar-plutoflume opened a new issue, #7998:
URL: https://github.com/apache/iceberg/issues/7998


   ### Query engine
   
   spark 3.3.1 
   iceberg 1.1
   (emr 6.10)
   
   ### Question
   
   Hi, I wanted to point out a change that was introduced in Spark 3.3, 
https://issues.apache.org/jira/browse/SPARK-38148 . The issue is that when we 
use a merge command that includes a static partition condition, the merge 
translates to a join query that does not use dynamic filtering, which caused 
our jobs to run much slower.
   For example, this merge into command:
   f"""
           MERGE INTO {output_table.catalog_table_ref} {TARGET}
           USING {TMP_VIEW} {SOURCE}
           ON {join_str}
           AND {TARGET}.triggered is true
           WHEN MATCHED
           THEN UPDATE SET {update_col_string}
           """
   If we used Spark joins directly, we could simply filter the target table 
first. Because we use the merge API, we have to pass the Iceberg target table 
itself, so the static partition condition has to go into the ON clause. 
   
   One option is to put it in WHEN MATCHED AND {TARGET}.triggered is true, 
but I think that will not push the triggered = true filter down to the target 
table.
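
   To make the two placements concrete, here is a minimal sketch that only 
builds the two statement variants as strings so they can be compared side by 
side. The helper name `build_merge_sql`, its parameters, and the sample table, 
view, and column names are hypothetical and not part of our job. It does not 
execute anything and does not show which variant gets pushed down; that still 
has to be checked against the query plan.

```python
def build_merge_sql(target_ref, source_view, join_str, update_col_string,
                    static_filter=None, filter_in_on=True,
                    target_alias="t", source_alias="s"):
    """Return a MERGE INTO statement as a string; nothing is executed.

    Hypothetical helper for illustration only. `static_filter` is the
    static partition condition on the target (for example
    "t.triggered is true").
    """
    on_clause = join_str
    matched_clause = "WHEN MATCHED"
    if static_filter:
        if filter_in_on:
            # Variant 1: static condition as part of the ON clause.
            on_clause = f"{join_str} AND {static_filter}"
        else:
            # Variant 2: static condition moved to the WHEN MATCHED clause.
            matched_clause = f"WHEN MATCHED AND {static_filter}"
    return "\n".join([
        f"MERGE INTO {target_ref} {target_alias}",
        f"USING {source_view} {source_alias}",
        f"ON {on_clause}",
        matched_clause,
        f"THEN UPDATE SET {update_col_string}",
    ])


# Sample inputs are made up for the example.
common = dict(
    target_ref="my_catalog.db.events",
    source_view="tmp_view",
    join_str="t.id = s.id",
    update_col_string="t.value = s.value",
    static_filter="t.triggered is true",
)
on_variant = build_merge_sql(**common, filter_in_on=True)
matched_variant = build_merge_sql(**common, filter_in_on=False)
print(on_variant)
print(matched_variant)
```

   Each string could then be passed to spark.sql(...) in a session that has 
the Iceberg SQL extensions enabled, and the resulting plans compared.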
   
   Any suggestions on how to make the merge use dynamic filtering, as it did 
before the Spark 3.3.1 upgrade, would be much appreciated.
   
   I'm wondering whether this is a Spark issue, where they would need to 
introduce a conf that decides whether to drop the dynamic partition pruning, 
or an Iceberg issue, because it concerns the merge into API.


