GWphua opened a new pull request, #19016: URL: https://github.com/apache/druid/pull/19016
Fixes #9712

### Description

This PR adds support for minor compaction: the ability to compact a specific subset of segments within a time chunk rather than all segments in the interval. Previously, submitting a compaction task with `SpecificSegmentsSpec` (a list of segment IDs) would cause Druid to lock, read, and rewrite all segments in the umbrella interval, defeating the purpose of targeting specific segments.

The core problem spans multiple layers of the compaction pipeline:

1. Locking: `CompactionTask#findSegmentsToLock` and all sub-task `findSegmentsToLock()` methods retrieve every segment in the umbrella interval via `RetrieveUsedSegmentsAction`, so the task acquires locks far broader than necessary.
2. Input resolution: `NativeCompactionRunner#createIoConfig` always passes `null` for `segmentIds` to `DruidInputSource`, so the input source reads the full interval regardless of the input spec.
3. Timeline lookup: `retrieveRelevantTimelineHolders()` uses `SegmentTimeline.lookup()`, which requires `ONLY_COMPLETE` partitions; a filtered subset of segments appears incomplete and is silently excluded.
4. Validation: with `TIME_CHUNK` lock granularity, `CompactionTask.SegmentProvider#checkSegments` delegates to `SpecificSegmentsSpec.validateSegments()`, which requires an exact match between the spec's segments and all segments in the interval. This guarantees failure whenever the spec names any proper subset of the interval's segments.

### Changes and Explanations

#### dropExisting conflict guard

A constructor-level validation in `CompactionTask` now rejects the combination of `SpecificSegmentsSpec` with `dropExisting = true`, since `dropExisting` semantics replace all segments in the interval, directly contradicting the intent of minor compaction. (See the first sketch after this section.)

<hr>

#### Segment filtering in lock acquisition

`CompactionTask.findSegmentsToLock()` now filters the result of `RetrieveUsedSegmentsAction` down to the segment IDs present in `SpecificSegmentsSpec` (see the second sketch after this section). The same filtering is applied in `IndexTask`, `ParallelIndexSupervisorTask`, and `SinglePhaseSubTask` via `CTX_KEY_SPECIFIC_SEGMENTS_TO_COMPACT`, propagated from `NativeCompactionRunner#createContextForSubtask()`. This follows the existing pattern of passing compaction metadata through `CTX_KEY_APPENDERATOR_TRACKING_TASK_ID`.

<hr>

#### `CompactionTask.SegmentProvider` caches segments for TIME_CHUNK validation in `checkSegments()`

The intuition behind this approach (see the third sketch after this section):

1. `SegmentProvider#findSegments` is called first, followed by `SegmentProvider#checkSegments`.
2. When `findSegments` is called, we do not yet know which lock granularity will be used.
3. TIME_CHUNK lock granularity requires all segments in the interval for validation, while segment lock granularity requires only the input segments.
4. So `findSegments` saves all segments in the interval as `allSegmentsInInterval`, and `checkSegments` later reads this field when it encounters TIME_CHUNK lock granularity.

Honestly, I am not too satisfied with this approach, since developers must now preserve the temporal ordering between `findSegments` and `checkSegments`. Would love to hear about any alternatives!

<hr>

#### Segment-ID-based input for DruidInputSource

`NativeCompactionRunner#createIoConfig` now detects `SpecificSegmentsSpec` and resolves the segment ID strings into `WindowedSegmentId` objects, passing them to `DruidInputSource` instead of the interval (see the fourth sketch after this section). `DruidInputSource` already supports this code path, but it was never wired up from the compaction side.
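A minimal, self-contained sketch of the dropExisting guard described above. The types and error message here are illustrative stand-ins, not the actual `CompactionTask` constructor code:

```java
import java.util.List;

/** Simplified stand-ins for the real Druid types named in this PR. */
interface CompactionInputSpec {}

record SpecificSegmentsSpec(List<String> segments) implements CompactionInputSpec {}

class CompactionTaskGuard
{
  /** Constructor-level check: reject dropExisting=true for specific-segment compaction. */
  static void validate(CompactionInputSpec inputSpec, boolean dropExisting)
  {
    // dropExisting replaces every segment in the umbrella interval, which
    // contradicts compacting only a chosen subset of those segments.
    if (inputSpec instanceof SpecificSegmentsSpec && dropExisting) {
      throw new IllegalArgumentException(
          "Cannot set dropExisting=true when compacting specific segments"
      );
    }
  }
}
```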
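A sketch of the lock-acquisition filtering, with a simplified stand-in for `DataSegment`; the real change filters the segments returned by `RetrieveUsedSegmentsAction` inside `findSegmentsToLock()`:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Stand-in for org.apache.druid.timeline.DataSegment; only the ID matters here. */
record DataSegment(String id) {}

class LockFilter
{
  /**
   * Narrows the segments retrieved for the umbrella interval down to the IDs
   * named by the spec, so the task locks only what it will actually rewrite.
   */
  static List<DataSegment> filterToSpec(List<DataSegment> usedSegments, Set<String> specSegmentIds)
  {
    return usedSegments.stream()
                       .filter(segment -> specSegmentIds.contains(segment.id()))
                       .collect(Collectors.toList());
  }
}
```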
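A sketch of the caching relationship between `findSegments` and `checkSegments`; names and signatures are simplified stand-ins for `CompactionTask.SegmentProvider`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

enum LockGranularity { TIME_CHUNK, SEGMENT }

class SegmentProviderSketch
{
  private final List<String> specSegmentIds;

  // Cached by findSegments() so a later checkSegments() call under
  // TIME_CHUNK granularity can validate against the full interval.
  private List<String> allSegmentsInInterval;

  SegmentProviderSketch(List<String> specSegmentIds)
  {
    this.specSegmentIds = specSegmentIds;
  }

  /** Called first; the lock granularity is still unknown at this point. */
  List<String> findSegments(List<String> segmentsInInterval)
  {
    this.allSegmentsInInterval = new ArrayList<>(segmentsInInterval);
    // Hand back only the requested subset as compaction input.
    return segmentsInInterval.stream()
                             .filter(specSegmentIds::contains)
                             .collect(Collectors.toList());
  }

  /** Called second; TIME_CHUNK validation needs the cached full set. */
  List<String> segmentsToValidate(LockGranularity granularity, List<String> inputSegments)
  {
    if (allSegmentsInInterval == null) {
      throw new IllegalStateException("findSegments() must be called before checkSegments()");
    }
    return granularity == LockGranularity.TIME_CHUNK ? allSegmentsInInterval : inputSegments;
  }
}
```

This makes the temporal coupling explicit: the `IllegalStateException` at least fails loudly if a future caller reorders the two methods.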
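A sketch of the segment-ID resolution, assuming simplified stand-ins for `Interval`, `WindowedSegmentId`, and the used-segment lookup; the real wiring happens in `NativeCompactionRunner#createIoConfig`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Stand-ins: an interval plus the ID-to-window pairing DruidInputSource consumes. */
record Interval(long startMillis, long endMillis) {}
record WindowedSegmentId(String segmentId, List<Interval> intervals) {}
record UsedSegment(String id, Interval interval) {}

class IoConfigResolution
{
  /**
   * Resolves the spec's segment ID strings into WindowedSegmentId objects so
   * the input source reads exactly those segments rather than the interval.
   */
  static List<WindowedSegmentId> resolve(Set<String> specSegmentIds, List<UsedSegment> usedSegments)
  {
    List<WindowedSegmentId> resolved = new ArrayList<>();
    for (UsedSegment segment : usedSegments) {
      if (specSegmentIds.contains(segment.id())) {
        resolved.add(new WindowedSegmentId(segment.id(), List.of(segment.interval())));
      }
    }
    return resolved;
  }
}
```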
<hr>

#### Timeline lookup with incomplete partitions

`retrieveRelevantTimelineHolders()` now calls `lookupWithIncompletePartitions()` (i.e. `Partitions.INCOMPLETE_OK`) when the input spec is `SpecificSegmentsSpec`. Without this, a filtered segment set that doesn't cover all partitions in the interval produces an empty timeline result, and the compaction silently does nothing. (See the first sketch at the end of this description.)

<hr>

#### MSQ engine rejection

MSQ compaction is fundamentally incompatible with minor compaction: it forces `dropExisting = true`, uses REPLACE ingestion mode (which acquires TIME_CHUNK locks covering the full interval), and queries via `MultipleIntervalSegmentSpec`. A validation check is added in `MSQCompactionRunner.validateCompactionTask()` to reject `SpecificSegmentsSpec` with an explicit error message rather than failing in an opaque way downstream. (See the second sketch at the end of this description.)

#### Release note

Compaction tasks using `SpecificSegmentsSpec` (a segment ID list) now correctly compact only the specified segments instead of all segments in the umbrella interval. This feature is not supported by the MSQ engine.

<hr>

##### Key changed/added classes in this PR

* `CompactionTask`
* `NativeCompactionRunner`
* `IndexTask`
* `ParallelIndexSupervisorTask`
* `SinglePhaseSubTask`
* `MSQCompactionRunner`
* `CompactionTaskTest` / `TaskLockHelperTest`

<hr>

This PR has:

- [x] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [x] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
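A sketch of the timeline-lookup change described in the "Timeline lookup with incomplete partitions" section, using a minimal stand-in interface for `SegmentTimeline`'s two lookup flavors:

```java
import java.util.List;

/** Minimal stand-in for the two lookup flavors on SegmentTimeline. */
interface Timeline<T>
{
  List<T> lookup(String interval);                          // ONLY_COMPLETE semantics
  List<T> lookupWithIncompletePartitions(String interval);  // INCOMPLETE_OK semantics
}

class TimelineLookupChoice
{
  /**
   * A filtered subset rarely fills every partition of a time chunk, so the
   * ONLY_COMPLETE lookup would silently yield nothing. Tolerate incomplete
   * partitions when the input spec names specific segments.
   */
  static <T> List<T> relevantHolders(Timeline<T> timeline, String interval, boolean specificSegments)
  {
    return specificSegments
           ? timeline.lookupWithIncompletePartitions(interval)
           : timeline.lookup(interval);
  }
}
```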
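A sketch of the MSQ-side rejection described in the "MSQ engine rejection" section; the stand-in types and message are illustrative, while the real check lives in `MSQCompactionRunner.validateCompactionTask()`:

```java
import java.util.List;

/** Stand-ins mirroring the spec types named in the description. */
interface CompactionInputSpec {}

record SpecificSegmentsSpec(List<String> segments) implements CompactionInputSpec {}

class MsqValidation
{
  /**
   * MSQ compaction forces dropExisting=true and REPLACE-mode TIME_CHUNK locks
   * over the full interval, so it cannot honor a specific-segment spec.
   * Fail fast with a clear message instead of an opaque downstream error.
   */
  static void validateInputSpec(CompactionInputSpec inputSpec)
  {
    if (inputSpec instanceof SpecificSegmentsSpec) {
      throw new IllegalArgumentException(
          "MSQ engine does not support compaction of specific segments; "
          + "use the native engine or an interval-based input spec"
      );
    }
  }
}
```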
