maytasm opened a new pull request #10843:
URL: https://github.com/apache/druid/pull/10843
Support segmentGranularity for auto-compaction
### Description
The auto-compaction configuration should support a segmentGranularity option,
as the manual compaction task already does.
- Stores granularitySpec in CompactionState. This lets auto-compaction
determine whether compaction is needed when segmentGranularity in the
auto-compaction config changes.
- Adds granularitySpec to DataSourceCompactionConfig. This allows
segmentGranularity to be set in the auto-compaction config. Currently, only the
segmentGranularity field of granularitySpec is supported by auto-compaction;
setting other values (e.g. queryGranularity) will fail.
- In CompactSegments, passes granularitySpec to
indexingServiceClient.compactSegments. This carries the segmentGranularity
value inside granularitySpec into the ingestionSpec of the compact task. Note
that this also adds a granularitySpec field to CompactionTask and deprecates
the segmentGranularity field in CompactionTask, so that queryGranularity and
rollup can be supported in the compaction task in the future. As a result,
CompactionTask now uses the segmentGranularity inside granularitySpec instead
of the deprecated top-level segmentGranularity field.
- needsCompaction in NewestSegmentFirstIterator now takes a changed
granularitySpec into account. Segments have to be compacted again when
segmentGranularity changes.
- When segmentGranularity changes, any active compaction tasks that have a
different segmentGranularity are also cancelled, and those intervals are
re-compacted with the new segmentGranularity.
- NewestSegmentFirstIterator may now return multiple segments that fall into
the same bucket of the new segmentGranularity (if segmentGranularity is given).
For example, if the original segment granularity is day,
NewestSegmentFirstIterator currently returns a single day for each
iteration. But if the new segmentGranularity in the auto-compaction config is
larger, such as week, we cannot compact a single day with segment
granularity = week (that is, we cannot send a compact task whose interval is a
single day). Hence, NewestSegmentFirstIterator has to take the new
segmentGranularity (week) into account, combine the 7 days, and return a
period of seven days.
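
The last two ideas in the list above (re-compacting when segmentGranularity changes, and widening a day to its enclosing week bucket) can be illustrated with a minimal standalone sketch. This is not the code in this PR: the class `SegmentGranularitySketch` and its methods are hypothetical, it uses `java.time` rather than Druid's own granularity classes, and it assumes week buckets start on Monday.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical illustration only; none of these names exist in Druid.
final class SegmentGranularitySketch {

    // Start (inclusive) of the week bucket containing the given day.
    // Assumption: week buckets begin on Monday.
    static LocalDate weekBucketStart(LocalDate day) {
        return day.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    // End (exclusive) of the week bucket containing the given day.
    static LocalDate weekBucketEnd(LocalDate day) {
        return weekBucketStart(day).plusWeeks(1);
    }

    // Simplified stand-in for the granularity part of a needsCompaction check:
    // if the config names a segmentGranularity and it differs from the one
    // recorded when the segment was last compacted, compact again.
    static boolean granularityChanged(String lastCompacted, String configured) {
        if (configured == null) {
            // No segmentGranularity in the config: nothing to compare.
            return false;
        }
        return !configured.equalsIgnoreCase(lastCompacted);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2021, 2, 3);
        // A single-day candidate is widened to the full week before compacting.
        System.out.println(weekBucketStart(day) + "/" + weekBucketEnd(day));
        System.out.println(granularityChanged("DAY", "WEEK"));
    }
}
```

The point of the sketch is only that the iterator's candidate interval must cover the whole target bucket, so that a compact task is never submitted with an interval smaller than the configured segmentGranularity.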
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
##### Key changed/added classes in this PR
* CompactSegments
* NewestSegmentFirstIterator
* DataSourceCompactionConfig
* CompactionState