maytasm opened a new pull request #10843:
URL: https://github.com/apache/druid/pull/10843


   Support segmentGranularity for auto-compaction
   
   ### Description
   
   The auto-compaction configuration should support a segmentGranularity 
option, as the manual compaction task already does.
   - Stores granularitySpec in CompactionState. This lets auto-compaction 
determine whether compaction is needed when the segmentGranularity in the 
auto-compaction config changes.
   - Adds granularitySpec to DataSourceCompactionConfig. This allows 
segmentGranularity to be set in the auto-compaction config. Currently, only 
the segmentGranularity of granularitySpec is supported by auto-compaction; 
setting other values (e.g. queryGranularity) will fail.
   - In CompactSegments, passes granularitySpec to 
indexingServiceClient.compactSegments. This carries the segmentGranularity 
value inside granularitySpec into the ingestionSpec of the compact task. Note 
that this PR also adds a granularitySpec field to CompactionTask and deprecates 
the segmentGranularity field in CompactionTask, so that queryGranularity and 
rollup can be supported in the compaction task in the future. As a result, 
CompactionTask now uses the segmentGranularity inside granularitySpec instead 
of the deprecated top-level segmentGranularity field.
   - needsCompaction in NewestSegmentFirstIterator must take a changed 
granularitySpec into account: segments have to be compacted again when 
segmentGranularity changes.
   - When segmentGranularity changes, we also cancel any active compaction 
tasks that have a different segmentGranularity and re-compact those intervals 
with the new segmentGranularity.
   - NewestSegmentFirstIterator may now return multiple segments that fall 
into the same bucket of the new segmentGranularity (if one is given). For 
example, if the original segment granularity is day, 
NewestSegmentFirstIterator currently returns a single day per iteration. But 
if the new segmentGranularity in the auto-compaction config is coarser, such 
as week, we cannot send a compact task with segmentGranularity = week and an 
interval of a single day. Hence, NewestSegmentFirstIterator must take the new 
segmentGranularity (week) into account and combine the seven days into a 
single seven-day interval.
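
The day-to-week example and the granularity check above can be sketched roughly as follows. This is an illustrative Python sketch only, not the Java code in this PR: the function names `week_bucket`, `group_days_by_week`, and `needs_compaction` are hypothetical, weeks are assumed to start on Monday (ISO-8601), and the check covers only the segmentGranularity part of the decision.

```python
from datetime import date, timedelta


def week_bucket(day):
    """Return the half-open [start, end) week interval containing `day`.

    Assumption: weeks start on Monday (ISO-8601).
    """
    start = day - timedelta(days=day.weekday())
    return start, start + timedelta(days=7)


def group_days_by_week(days):
    """Group daily segment dates by enclosing week bucket, newest first.

    Hypothetical stand-in for the iterator combining day segments into
    one week-sized interval before a compact task is submitted.
    """
    buckets = {}
    for d in sorted(days, reverse=True):
        buckets.setdefault(week_bucket(d), []).append(d)
    return buckets


def needs_compaction(last_granularity, configured_granularity):
    """Granularity-only check (hypothetical, simplified).

    `last_granularity` is the segmentGranularity recorded when the segment
    was last compacted (None if never compacted); `configured_granularity`
    is the value in the auto-compaction config (None if not set).
    """
    if configured_granularity is None:
        # No segmentGranularity configured, so it cannot trigger compaction.
        return False
    return last_granularity != configured_granularity


if __name__ == "__main__":
    days = [date(2021, 2, 1) + timedelta(days=i) for i in range(9)]
    for (start, end), members in group_days_by_week(days).items():
        print(start, end, len(members))
    print(needs_compaction("DAY", "WEEK"))
```

In this sketch a single day such as 2021-02-03 maps to the interval 2021-02-01/2021-02-08, which is the kind of full-bucket interval a week-granularity compact task would need.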
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   ##### Key changed/added classes in this PR
    * CompactSegments
    * NewestSegmentFirstIterator
    * DataSourceCompactionConfig
    * CompactionState
   

