[GitHub] [iceberg] puchengy commented on pull request #7430: Allow sparksql to override target split size with session property

via GitHub Fri, 28 Apr 2023 13:07:54 -0700


puchengy commented on PR #7430:
URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1528036670


   @aokolnychyi Thanks for your understanding.
   
   > Split planning is a bit different. If we support one config, will we have to support others?

   Unfortunately, that is the case. But we don't have to proactively pull in other configs if no one needs them.
   
   > Is there a way to set this value correctly during the migration or is the split size different for different workloads?

   Yes, there is a way, but automating it intelligently will need more work (which is why I am exploring this possibility). Also, higher driver memory means more resource usage, which adds another layer of complexity to user education (why it is OK to bump the memory, why the cost will not be high, etc.).
   
   I haven't seen a case where the split size differs between workloads, but I would not be surprised if there is one, since on our platform customers are allowed to set any configs they like.
   
   > I assume there is no shuffle in that read-write job so that AQE cannot coalesce/split tasks during writes?

   Yes, you are correct.
   
   

