gianm commented on PR #14810:
URL: https://github.com/apache/druid/pull/14810#issuecomment-1680963514

   Adding a bit to the prior comment: personally I think the second approach, 
where we don't use `compact` tasks at all, is best. IMO, the ideal way to do it 
is something like this:
   
   First, incorporate enough metadata into the metadata store, such that 
compaction doesn't need to fetch segments from deep storage to figure out what 
to do. There are a couple of approaches we could take here, such as a 
catalog-based approach (where metadata is explicitly specified) or a 
metadata-stashing approach where we save segment row signatures in the metadata 
store when segments are published. This latter idea would be useful for a bunch 
of other reasons as well.
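To make the metadata-stashing idea concrete, here is a minimal Python sketch. It assumes the metadata store keeps one record per published segment, holding the segment's interval and row signature. `SegmentRecord`, `MetadataStore`, and `needs_compaction` are hypothetical names, not existing Druid classes, and an in-memory dict stands in for the real metadata store. The sketch shows the planner choosing intervals to compact purely from stored metadata, without fetching any segment from deep storage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentRecord:
    """Hypothetical metadata-store row for one published segment."""
    segment_id: str
    interval: tuple[str, str]  # ISO-8601 (start, end)
    row_signature: tuple[tuple[str, str], ...]  # (column name, column type) pairs


class MetadataStore:
    """In-memory stand-in for the metadata store (hypothetical, not Druid's API)."""

    def __init__(self) -> None:
        self._segments: dict[str, list[SegmentRecord]] = {}

    def publish(self, table: str, record: SegmentRecord) -> None:
        # The row signature is stashed together with the segment at publish time.
        self._segments.setdefault(table, []).append(record)

    def segments(self, table: str) -> list[SegmentRecord]:
        return list(self._segments.get(table, []))


def needs_compaction(
    store: MetadataStore,
    table: str,
    target_signature: tuple[tuple[str, str], ...],
    max_segments_per_interval: int = 1,
) -> list[tuple[str, str]]:
    """Return intervals worth compacting, decided from metadata alone."""
    by_interval: dict[tuple[str, str], list[SegmentRecord]] = {}
    for seg in store.segments(table):
        by_interval.setdefault(seg.interval, []).append(seg)

    to_compact = []
    for interval in sorted(by_interval):
        segs = by_interval[interval]
        # Too many segments in one interval, or a segment whose schema differs
        # from the target, both make the interval a compaction candidate.
        too_fragmented = len(segs) > max_segments_per_interval
        schema_drift = any(s.row_signature != target_signature for s in segs)
        if too_fragmented or schema_drift:
            to_compact.append(interval)
    return to_compact
```

The two criteria shown here (fragmentation and schema drift) are only examples. A real planner would compare against whatever compaction spec the table has.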
   
   Second, introduce a `VACUUM [table]` or `COMPACT [table]` command in SQL. It 
should take an optional interval and use the metadata from the first step to 
determine what to do.
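Here is a minimal sketch of how the proposed command could be parsed. It assumes a grammar of `VACUUM|COMPACT <table> [INTERVAL '<start>/<end>']`. The grammar itself, `CompactCommand`, and `parse_compact_command` are hypothetical and only illustrate the "table plus optional interval" shape. The actual syntax would be settled during design.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for the proposed command:
#   (VACUUM | COMPACT) <table> [INTERVAL '<start>/<end>'] [;]
_COMMAND = re.compile(
    r"^\s*(VACUUM|COMPACT)\s+(\w+)(?:\s+INTERVAL\s+'([^'/]+)/([^'/]+)')?\s*;?\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class CompactCommand:
    verb: str
    table: str
    interval: Optional[tuple[str, str]]  # None means "the whole table"


def parse_compact_command(sql: str) -> CompactCommand:
    """Parse a hypothetical VACUUM/COMPACT statement into its parts."""
    m = _COMMAND.match(sql)
    if m is None:
        raise ValueError(f"not a VACUUM/COMPACT command: {sql!r}")
    verb, table, start, end = m.groups()
    interval = (start, end) if start is not None else None
    return CompactCommand(verb.upper(), table, interval)
```

For example, `parse_compact_command("COMPACT wiki")` yields a command with no interval. The statement would then be planned from the stashed metadata rather than by reading segments.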
   
   Third, have auto-compaction at the Coordinator issue one of these SQL 
commands.
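Under this proposal, one pass of Coordinator auto-compaction could reduce to rendering and submitting SQL text. This sketch assumes the same hypothetical `COMPACT <table> [INTERVAL '<start>/<end>']` syntax. `build_compact_sql`, `run_auto_compaction`, and the injected `submit_sql` callable are hypothetical stand-ins, not existing Druid APIs.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable, Optional


def build_compact_sql(table: str, interval: Optional[tuple[str, str]] = None) -> str:
    """Render the proposed (hypothetical) COMPACT statement for one table."""
    # Only accept simple identifiers so the table name cannot inject SQL.
    if not re.fullmatch(r"\w+", table):
        raise ValueError(f"unsupported table name: {table!r}")
    if interval is None:
        return f"COMPACT {table}"
    start, end = interval
    # Quotes and slashes delimit the interval literal, so reject them in bounds.
    if any(c in bound for bound in (start, end) for c in "'/"):
        raise ValueError(f"invalid interval: {interval!r}")
    return f"COMPACT {table} INTERVAL '{start}/{end}'"


def run_auto_compaction(
    configs: Iterable[tuple[str, Optional[tuple[str, str]]]],
    submit_sql: Callable[[str], str],
) -> list[str]:
    """One pass of a hypothetical Coordinator duty: issue one COMPACT per config.

    `submit_sql` stands in for whatever SQL endpoint the Coordinator would call;
    it is assumed to return an identifier for the submitted statement.
    """
    return [submit_sql(build_compact_sql(table, interval)) for table, interval in configs]
```

Because `submit_sql` is injected, the Coordinator side of the sketch stays a thin loop. Deciding what to compact is left to the command itself.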

