maytasm opened a new pull request #12125:
URL: https://github.com/apache/druid/pull/12125


   Support adding metrics in Auto Compaction
   
   ### Description
   
   This PR adds `metricsSpec` to auto compaction. Note that `metricsSpec` field 
already exists for manual compaction task.
   By setting `metricsSpec` in auto compaction, the user will be able to add 
new metric columns to all compacted data for the datasource.
   
   This PR intentionally omits any documentation as this feature ('metricsSpec` 
for Auto compaction) is still incomplete. In the current state, it is not very 
useful as auto compaction will not be able to deal with segments that already 
has the metric (in the metricsSpec) or if the source dimension is dropped when 
compaction runs for the same time chunk again. Basically, the only case that 
this feature (in current state) would be useful is if once the metrics are 
added, the data for the compacted time chunk does not change (no new segment 
appended). A follow up PR will solve this problem and add guardrails for cases 
that will run into unexpected results.  
   
   Note that this PR also change one undocumented behavior of compaction task. 
Before this PR, any metricsSpec given to the compaction task will be converted 
into the combining factories. This was done to reduce human error. For example, 
suppose the user put a count metric in the compaction task. We assumed that 
what the user really want is probably counting the number of original rows 
before rollup instead of counting the rows after rollup and hence the combining 
factory for count is longSum, so we automatically convert the count aggregator 
to use the longSum instead. However, this prevent any new metrics being added 
to the compacted data as the combining factory will only combine the same 
metric and cannot create new metric. This PR removes this assumption and uses 
the metricsSpec provided as-is. This functionality is not documented and no one 
is probably using it since you can achieve the same result by not providing the 
metricsSpec. 
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to