kbendick commented on PR #4596: URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1125713285
Is there a reason the files have to be 1 MB? That's very, very small. At that size you should also consider using Avro for storage. But I think if you compacted the files, you probably wouldn't have such issues during scan planning. For ingest, I know we sometimes can't avoid files that small. But scanning that many small files is counterproductive in most cases. Ideally, files that small (possibly coming from a third party or a very heavily sharded system) would then be ingested into a table that is better tailored to your query needs. Do you run table maintenance actions on your table, @uncleGen?
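To make the compaction point concrete, here is a minimal sketch of greedy bin-packing: consecutive small files are grouped until adding the next one would exceed a target output size, so scan planning would then deal with far fewer files. The helper name `plan_bin_pack`, the 512 MB target, and the 10,000-file count are hypothetical and chosen only for illustration. Iceberg's own compaction action (`rewriteDataFiles`) does its own planning, and this sketch is not that implementation.

```python
from typing import List

MB = 1024 * 1024


def plan_bin_pack(file_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Greedily group input file sizes into bins of at most target_bytes.

    Hypothetical helper for illustration only; not Iceberg's planning logic.
    A single input larger than target_bytes ends up alone in its own bin.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes:
        # Close the current output file if this input would push it past the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Assumed workload: 10,000 files of 1 MB each, compacted toward 512 MB outputs.
    small_files = [1 * MB] * 10_000
    groups = plan_bin_pack(small_files, target_bytes=512 * MB)
    print(len(small_files), "->", len(groups))
```

Run as a script, it prints `10000 -> 20`: 19 full 512 MB groups plus one partial group. That is the reduction in file count that compaction aims for before scan planning.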
