abstractdog commented on PR #5613: URL: https://github.com/apache/hive/pull/5613#issuecomment-2618075683
> since it's an iceberg table, should we compact it (could have a huge amount of small files over time)?

Theoretically, yes, it can end up with many small files eventually; that's what I tried to balance with the different flushing strategies.

My final decision was not to take care of this in the scope of this jira (a decision that became more valid once we disabled this service by default), because there were challenges I didn't want to cope with now. The considerations were:

1. The platform (any kind of downstream data platform that includes Hive) might want to take care of compacting this Iceberg table automatically, or the user can do it. Thank god it's an Iceberg table, so any compaction implementation we already have can be used here.
2. The platform I'm working on is like this (I think you know it; this explanation is for the wider audience): HS2 is the permanent component (where this service runs), and the expensive compute nodes are ephemeral/autoscaled. Whether to run compaction, which spins up compute nodes, looked like a decision I didn't want to make this time. Not to mention that on that particular platform, a separate component is said to take care of compacting tables (in the future or already, I don't know).
