gianm commented on PR #14810: URL: https://github.com/apache/druid/pull/14810#issuecomment-1680963514
Adding a bit to the prior comment: personally I think the second approach, where we don't use `compact` tasks at all, is best. IMO, the ideal way to do it is something like this: First, incorporate enough metadata into the metadata store, such that compaction doesn't need to fetch segments from deep storage to figure out what to do. There are a couple of approaches we could take here, including a catalog-based approaches (where metadata is explicitly specified) or a metadata-stashing approach where we save segment row signatures in the metadata store when segments are published. This latter idea would be useful for a bunch of other reasons. Second, introduce a `VACUUM [table]` or `COMPACT [table]` command in SQL. It should take an optional interval and it should leverage the metadata from the first step in order to determine what to do. Third, have auto-compaction at the Coordinator issue one of these SQL commands. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
