amogh-jahagirdar commented on issue #7071:
URL: https://github.com/apache/iceberg/issues/7071#issuecomment-1465034853

   Thanks for bringing this discussion back, @singhpk234. I think it makes 
sense for the table format itself to help determine optimal split sizes, 
because the table format has the statistics needed to pick good values for 
a given table.
   
   I think we just need to define what the user experience and the engine 
integration experience should look like, considering `read.split.target-size` 
is already defined with a default of 128 MB. One simple approach that comes to 
mind is to define another table property, `read.split.auto-size`, which 
defaults to false. If it's set to true, the engine should ignore 
`read.split.target-size` and take responsibility for determining the optimal 
split size from the table stats. The Iceberg library could provide a utility 
that recommends a size based on table stats; an engine could delegate to that 
utility, or override it if it has something better suited to the engine. Over 
time, as we gain confidence in the auto-size behavior across engines, we can 
make it default to true. Just my initial thoughts, open to others' ideas here 
as well.
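
   To make the proposed resolution order concrete, here is a rough sketch in 
Python. It is only an illustration under assumptions: `read.split.auto-size` 
is the property proposed above and does not exist yet, and 
`recommended_split_size`, `resolve_split_size`, the clamp bounds, and the 
"total bytes divided by parallelism" heuristic are all hypothetical stand-ins 
for whatever the library utility would actually compute from table stats.

```python
from typing import Callable, Mapping, Optional

SPLIT_TARGET_SIZE = "read.split.target-size"
SPLIT_TARGET_SIZE_DEFAULT = 128 * 1024 * 1024  # 128 MB default mentioned above
SPLIT_AUTO_SIZE = "read.split.auto-size"       # proposed property (assumption)

SizeFn = Callable[[int, int], int]


def recommended_split_size(total_bytes: int, parallelism: int,
                           min_size: int = 16 * 1024 * 1024,
                           max_size: int = 512 * 1024 * 1024) -> int:
    """Hypothetical library heuristic: spread the data evenly across the
    available tasks, then clamp the result to [min_size, max_size]."""
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    per_task = -(-total_bytes // parallelism)  # ceiling division
    return max(min_size, min(max_size, per_task))


def resolve_split_size(props: Mapping[str, str], total_bytes: int,
                       parallelism: int,
                       engine_override: Optional[SizeFn] = None) -> int:
    """Pick the split size an engine would use for a scan."""
    auto = props.get(SPLIT_AUTO_SIZE, "false").lower() == "true"
    if not auto:
        # Auto-sizing is off: honor the existing target-size property.
        return int(props.get(SPLIT_TARGET_SIZE, SPLIT_TARGET_SIZE_DEFAULT))
    if engine_override is not None:
        # Auto-sizing is on and the engine has its own logic.
        return engine_override(total_bytes, parallelism)
    # Auto-sizing is on: delegate to the library recommendation.
    return recommended_split_size(total_bytes, parallelism)
```

   With auto-size enabled, any configured `read.split.target-size` is ignored, 
which matches the behavior described above; with it disabled (the default), 
existing tables keep their current behavior.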


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

