maytasm opened a new pull request #12125:
URL: https://github.com/apache/druid/pull/12125
Support adding metrics in Auto Compaction
### Description
This PR adds `metricsSpec` to auto compaction. Note that `metricsSpec` field
already exists for manual compaction task.
By setting `metricsSpec` in auto compaction, the user will be able to add
new metric columns to all compacted data for the datasource.
This PR intentionally omits any documentation as this feature ('metricsSpec`
for Auto compaction) is still incomplete. In the current state, it is not very
useful as auto compaction will not be able to deal with segments that already
has the metric (in the metricsSpec) or if the source dimension is dropped when
compaction runs for the same time chunk again. Basically, the only case that
this feature (in current state) would be useful is if once the metrics are
added, the data for the compacted time chunk does not change (no new segment
appended). A follow up PR will solve this problem and add guardrails for cases
that will run into unexpected results.
Note that this PR also change one undocumented behavior of compaction task.
Before this PR, any metricsSpec given to the compaction task will be converted
into the combining factories. This was done to reduce human error. For example,
suppose the user put a count metric in the compaction task. We assumed that
what the user really want is probably counting the number of original rows
before rollup instead of counting the rows after rollup and hence the combining
factory for count is longSum, so we automatically convert the count aggregator
to use the longSum instead. However, this prevent any new metrics being added
to the compacted data as the combining factory will only combine the same
metric and cannot create new metric. This PR removes this assumption and uses
the metricsSpec provided as-is. This functionality is not documented and no one
is probably using it since you can achieve the same result by not providing the
metricsSpec.
This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [x] added integration tests.
- [x] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]