[GitHub] [iceberg] dramaticlly opened a new issue, #6741: Support write distribution mode as a Spark SqlConf option in Iceberg

via GitHub Fri, 03 Feb 2023 14:55:08 -0800


dramaticlly opened a new issue, #6741:
URL: https://github.com/apache/iceberg/issues/6741


   ### Feature Request / Improvement
   
   Today, row-level deletes and updates on Iceberg tables have to be done via 
Spark SQL. 
   
   Currently Iceberg provides table properties to configure the write 
distribution mode (none, hash and range), but we would love to see the ability 
to configure this at the per-Spark-job level.
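   
   For context, a minimal sketch of the table-level configuration that exists 
today, assuming a table named `tbl1` as in the example in this issue. 
`write.distribution-mode` and `write.delete.distribution-mode` are Iceberg 
table properties; because they live on the table, they apply to every job 
that writes to it.
   ```sql
   -- Table-wide setting: affects every writer of tbl1, not just one job.
   ALTER TABLE tbl1 SET TBLPROPERTIES (
     'write.distribution-mode' = 'none'
   );

   -- Narrower variant that only targets row-level DELETE operations,
   -- but is still shared by all jobs that delete from tbl1.
   ALTER TABLE tbl1 SET TBLPROPERTIES (
     'write.delete.distribution-mode' = 'none'
   );
   ```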
   
   Reasoning
   1. Iceberg now plans to change the default write distribution mode from 
none to range: https://github.com/apache/iceberg/issues/6679
   2. `none` as the write distribution mode is used to minimize the shuffle 
for data that is already partition-aligned. It usually only needs to be set 
for GDPR-like deletion jobs and is not necessarily needed for other jobs, so 
setting it in table properties for all writes/deletes/updates does not seem 
like the best idea.
   
   An example SQL use case:
   ```sql
   DELETE FROM tbl1
   WHERE date <= '20230101'
   AND external_id IN (SELECT id FROM tb2) 
   ```
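   
   With the requested feature, the same job might look roughly like the 
sketch below. The conf key `spark.sql.iceberg.distribution-mode` is a 
hypothetical name used only for illustration; the actual key and its 
precedence relative to table properties would be decided in the 
implementation.
   ```sql
   -- HYPOTHETICAL session-level conf (name is an assumption, not an
   -- existing option at the time of this issue). It would apply only
   -- to this Spark session/job and leave the table properties untouched.
   SET spark.sql.iceberg.distribution-mode = none;

   DELETE FROM tbl1
   WHERE date <= '20230101'
   AND external_id IN (SELECT id FROM tb2);
   ```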
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


