openinx commented on pull request #3073:
URL: https://github.com/apache/iceberg/pull/3073#issuecomment-914997015


@WinkerDu I definitely agree that the bin-pack algorithm should be improved
for v2 tables to consider the total size of both insert and delete files. I
think the `iterms-per-bin` setting proposed by your team is trying to resolve
the imbalance, but I'm concerned it will be hard to choose the right
`iterms-per-bin` value for a given table in a real production environment,
because `iterms-per-bin` still only controls the number of data files per bin.
We don't yet have a suitable way to estimate the cost of joining a data file
with its delete records, so I think we need a more accurate approach to decide
how scan tasks should be distributed across tasks.
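To make the trade-off concrete, here is a minimal, hypothetical Java sketch.
`FileTask`, `packByCount` and `packByWeight` are illustrative names only, not
Iceberg APIs. It contrasts a fixed files-per-bin rule with next-fit packing on
an assumed weight of "data file bytes + delete file bytes". That weight is
itself only a naive placeholder for the more accurate cost model discussed
above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from Iceberg. */
public class WeightedBinPackSketch {
    static final long MB = 1024L * 1024L;

    /** A data file plus the sizes (bytes) of the delete files that apply to it. */
    record FileTask(String path, long dataFileSize, long[] deleteFileSizes) {
        /** Assumed cost model: data bytes plus all delete bytes. */
        long weight() {
            long total = dataFileSize;
            for (long d : deleteFileSizes) {
                total += d;
            }
            return total;
        }
    }

    /** Count-based packing: at most itemsPerBin tasks per bin, regardless of size. */
    static List<List<FileTask>> packByCount(List<FileTask> tasks, int itemsPerBin) {
        List<List<FileTask>> bins = new ArrayList<>();
        for (int i = 0; i < tasks.size(); i += itemsPerBin) {
            bins.add(new ArrayList<>(tasks.subList(i, Math.min(i + itemsPerBin, tasks.size()))));
        }
        return bins;
    }

    /** Next-fit packing: close the current bin when the next task would exceed targetWeight. */
    static List<List<FileTask>> packByWeight(List<FileTask> tasks, long targetWeight) {
        List<List<FileTask>> bins = new ArrayList<>();
        List<FileTask> current = new ArrayList<>();
        long currentWeight = 0;
        for (FileTask task : tasks) {
            long w = task.weight();
            if (!current.isEmpty() && currentWeight + w > targetWeight) {
                bins.add(current);
                current = new ArrayList<>();
                currentWeight = 0;
            }
            current.add(task);
            currentWeight += w;
        }
        if (!current.isEmpty()) {
            bins.add(current);
        }
        return bins;
    }

    /** Total weight of each bin, in MB. */
    static List<Long> weightsInMb(List<List<FileTask>> bins) {
        List<Long> result = new ArrayList<>();
        for (List<FileTask> bin : bins) {
            long total = 0;
            for (FileTask t : bin) {
                total += t.weight();
            }
            result.add(total / MB);
        }
        return result;
    }

    public static void main(String[] args) {
        // Four equally sized data files; only c.parquet has delete files attached.
        List<FileTask> tasks = List.of(
            new FileTask("a.parquet", 128 * MB, new long[] {}),
            new FileTask("b.parquet", 128 * MB, new long[] {}),
            new FileTask("c.parquet", 128 * MB, new long[] {200 * MB, 56 * MB}),
            new FileTask("d.parquet", 128 * MB, new long[] {}));
        System.out.println("count-based bin weights (MB): " + weightsInMb(packByCount(tasks, 2)));
        System.out.println("weight-based bin weights (MB): " + weightsInMb(packByWeight(tasks, 256 * MB)));
    }
}
```

Running `main` prints `[256, 512]` for the count-based bins and
`[256, 384, 128]` for the weight-based bins: two files per bin still yields one
bin doing twice the work, while the size-aware rule isolates the delete-heavy
file. Whether raw bytes are the right proxy for the merge cost is exactly the
open question.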


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


