aokolnychyi commented on PR #7430:
URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1528008416

   > Do you mean you are fine with the current change but against making it 
correlate to "spark.sql.files.maxPartitionBytes"?
   
   Yeah, I don't mind adding an Iceberg SQL property if it benefits you and 
other folks also support it, but I would like to think through the 
alternatives first to make sure we are not overlooking a better approach. I 
don't think it is a good idea to support any properties meant for Spark's 
built-in sources, though. Split planning is a bit different. If we support one 
config, will we have to support others?
   
   > this will lead to driver memory consumption increased and cause job driver 
OOM. To fix that, we will have to increase driver memory manually for that job.
   
   Is there a way to set this value correctly during the migration, or does 
the right split size differ across workloads?
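
For context, here is a minimal sketch of why the split size matters for driver pressure. It assumes a simplified model in which each data file is cut into ceil(file size / split size) read tasks; real split planning also combines small files into one task, which this ignores. The function name and the file sizes are hypothetical and only serve the illustration.

```python
import math

MB = 1024 * 1024


def estimate_split_count(file_sizes, split_size):
    """Roughly estimate the number of read tasks for a scan.

    Assumption: every file is cut independently into
    ceil(size / split_size) pieces. Combining of small files
    into a single task is deliberately not modeled here.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return sum(math.ceil(size / split_size) for size in file_sizes)


# Hypothetical table: 1,000 data files of 512 MB each.
file_sizes = [512 * MB] * 1000

# A smaller split size produces more tasks for the driver to track.
tasks_small_splits = estimate_split_count(file_sizes, 128 * MB)
tasks_large_splits = estimate_split_count(file_sizes, 512 * MB)
print(tasks_small_splits, tasks_large_splits)
```

Under this model, a workload migrated with a smaller effective split size ends up with several times more tasks for the same data, which is where the extra driver memory would come from.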
   
   > To fix that, we have to add explicit coalesce to match the behavior.
   
   I assume there is no shuffle in that read-write job, so AQE cannot 
coalesce or split tasks during writes?
   
   Let's hear what others think. I am OK with adding this property to unblock 
you, but it would be great to explore automatic split configuration. I created 
#7465 for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

