[GitHub] [iceberg] aokolnychyi commented on pull request #7430: Allow sparksql to override target split size with session property

via GitHub Thu, 27 Apr 2023 16:43:58 -0700


aokolnychyi commented on PR #7430:
URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1526779646


   Can we identify the exact scenarios in which the default split size performs 
poorly and check whether we can solve the underlying problem? For instance, if 
the scheduler is FIFO, can we use the default cluster parallelism and the size 
of the data to be processed to come up with an optimal split size? We first 
find matching files and then plan splits, so the split size can be dynamic. We 
just need a good way to estimate it correctly.
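
   To make the idea concrete, here is a rough sketch of one possible heuristic, 
not anything Iceberg or Spark implements today. It divides the size of the 
matching files by the available parallelism and clamps the result. The function 
name, the 16 MiB floor, and the choice of a 128 MiB cap (picked to match the 
default of `read.split.target-size`) are all assumptions for illustration.

```python
MIB = 1024 * 1024


def estimate_split_size(
    total_bytes: int,
    parallelism: int,
    min_split_bytes: int = 16 * MIB,   # assumed floor, avoids tiny tasks
    max_split_bytes: int = 128 * MIB,  # assumed cap, the usual default target
) -> int:
    """Hypothetical heuristic: aim for roughly one split per task slot.

    total_bytes is the size of the files that matched the scan filter and
    parallelism is the number of task slots the job can use at once.
    """
    if total_bytes <= 0 or parallelism <= 0:
        # Nothing useful to estimate from, so fall back to the cap.
        return max_split_bytes

    # Ceiling division: bytes each slot would get with exactly one split.
    per_slot = (total_bytes + parallelism - 1) // parallelism

    # Clamp so small scans do not create tiny splits and large scans do not
    # create splits bigger than the usual target.
    return max(min_split_bytes, min(max_split_bytes, per_slot))
```

   With 64 task slots, a 2 GiB scan would get 32 MiB splits so every slot has 
work, while a 100 GiB scan would stay at the 128 MiB cap. Whether a single 
wave of tasks is the right target probably depends on the scheduler and on how 
skewed the files are.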
   
   I am not going to oppose a SQL config, but I don't think we should rely on an 
internal SQL property for built-in file sources.
   
   Thoughts, @puchengy @RussellSpitzer @szehon-ho @singhpk234 @rdblue?


