abstractdog commented on PR #5613:
URL: https://github.com/apache/hive/pull/5613#issuecomment-2618075683

   > since it's an iceberg table, should we compact it (could have a huge 
amount of small files over time)?
   
   Theoretically, yes: it can accumulate many small files over time, and that's 
what I tried to balance with the different flushing strategies.
   My final decision was not to handle compaction in the scope of this Jira 
(a choice that became easier to justify once we disabled this service by 
default), because it raised challenges I didn't want to tackle now. The 
considerations were:
   
   1. The platform (any downstream data platform that includes Hive) might 
want to compact this Iceberg table automatically, or the user can do it. 
Thankfully it's an Iceberg table, so any compaction implementation we already 
have can be used here.
   
   2. The platform I'm working on looks like this (I think you know it; this 
explanation is for the wider audience): HS2 is the permanent component (where 
this service runs), and the expensive compute nodes are ephemeral/autoscaled. 
Whether to run compaction, which would spin up compute nodes, looked like a 
decision I didn't want to make this time. Not to mention that on that 
particular platform, a separate component is said to take care of compacting 
tables (in the future or already, IDK).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

