[GitHub] [iceberg] kbendick commented on pull request #4596: Use bounded queue to avoid consuming too much memory

GitBox Thu, 12 May 2022 23:49:07 -0700


kbendick commented on PR #4596:
URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1125713285


   Is there a reason the files have to be 1 MB? That’s very small. At that 
size you should also consider storing the data as Avro. That said, I think if 
you compacted the files you probably wouldn’t hit these issues during scan 
planning.
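
   To make the compaction point concrete, here is a minimal Python sketch of 
how grouping small files toward a target size cuts the number of files a scan 
has to plan. It is only an illustration, not Iceberg's implementation: the 
`plan_compaction` helper, the 128 MB target, and the count of 10,000 files are 
all assumptions made up for the example. In practice you would use a real 
maintenance action, such as the rewrite data files action Iceberg provides for 
Spark, rather than anything like this.

```python
MB = 1024 * 1024


def plan_compaction(file_sizes, target_size):
    """Greedily group file sizes (in bytes) so that each group's total stays
    at or under target_size. A single file that is already larger than
    target_size ends up alone in its own group.

    Hypothetical helper for illustration only; not an Iceberg API.
    """
    groups = []
    current, current_size = [], 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current group if adding this file would overshoot.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Assumed workload: 10,000 files of 1 MB each, compacted toward 128 MB.
small_files = [1 * MB] * 10_000
groups = plan_compaction(small_files, target_size=128 * MB)

print(f"{len(small_files)} files -> {len(groups)} files after compaction")
# prints "10000 files -> 79 files after compaction"
```

   Under those assumptions, scan planning would track 79 data files instead of 
10,000, which is why compaction is usually the first thing to try before 
tuning the planner itself.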
   
   For ingest, I know we sometimes can’t avoid files that small. But 
scanning that many small files is counterintuitive in most cases. Ideally, 
files that small (possibly coming from a third party or a very heavily sharded 
system) would then be ingested into a table that is better tailored to 
your query needs.
   
   Do you run table maintenance actions on your table @uncleGen?



