LakshSingla opened a new pull request, #15670:
URL: https://github.com/apache/druid/pull/15670
### Description
This PR:
* Enables MSQ tests on queries using EARLIEST/LATEST/EARLIEST_BY/LATEST_BY
aggregators
* Removes the limitations in the docs, since numeric first/last can now be
used with MSQ and at ingestion time
* Disallows using EARLIEST_BY and LATEST_BY with rolled-up pairLongObjects.
This prevents the caller from supplying a timeExpr that the native engine
would silently ignore, which might be unexpected behavior. The correct way to
further aggregate such columns is to use EARLIEST/LATEST, where the caller
understands that the time column is implicitly taken from the rolled-up
metric. Consider the following example:
```sql
-- Insert data into a table using the following query. 'finalize' should be
-- false in the query context to enable rollup.
INSERT INTO foo
SELECT dim1, EARLIEST_BY(m1, timestampCol1) AS m1
FROM EXTERN(...)
GROUP BY dim1

-- Roll up the pre-aggregated metric with a different timestamp column.
-- In such a case, the native aggregator would ignore the value from
-- timestampCol2 and use the timestamp that was aggregated during ingestion.
-- To prevent such errors, the call is disallowed and a user-friendly error
-- is thrown.
SELECT EARLIEST_BY(m1, timestampCol2) FROM foo

-- Roll up using the timestamp that was captured at ingestion time.
SELECT EARLIEST(m1) FROM foo
```
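The reasoning behind the restriction can be illustrated with a small toy model. This is a minimal sketch in Python, not Druid code: `PairLongObject`, `earliest_by`, `earliest`, and `earliest_by_checked` are hypothetical names chosen for illustration, and the error message is invented. It assumes a rolled-up metric is stored as a (timestamp, value) pair, so any further aggregation can only compare the stored timestamps.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PairLongObject:
    """Toy stand-in for a rolled-up first/last metric: (timestamp, value)."""
    timestamp: int
    value: object


def earliest_by(rows: Iterable[Tuple[int, object]]) -> PairLongObject:
    """Ingestion-time EARLIEST_BY over raw (time, value) rows."""
    t, v = min(rows, key=lambda r: r[0])
    return PairLongObject(t, v)


def earliest(pairs: Iterable[PairLongObject]) -> PairLongObject:
    """Query-time EARLIEST over rolled-up pairs: only the stored timestamp is used."""
    return min(pairs, key=lambda p: p.timestamp)


def earliest_by_checked(column, time_expr) -> PairLongObject:
    """Hypothetical check: reject an explicit time expression on pair columns."""
    column = list(column)
    if any(isinstance(c, PairLongObject) for c in column):
        raise ValueError(
            "EARLIEST_BY is not supported on rolled-up pair columns; use EARLIEST"
        )
    return earliest_by(zip(time_expr, column))


# Two rolled-up rows, each produced by EARLIEST_BY(m1, timestampCol1) at ingestion.
seg_a = earliest_by([(30, "x"), (10, "y")])  # PairLongObject(timestamp=10, value='y')
seg_b = earliest_by([(5, "z"), (20, "w")])   # PairLongObject(timestamp=5, value='z')

# Further rollup compares the timestamps already stored inside each pair.
result = earliest([seg_a, seg_b])            # PairLongObject(timestamp=5, value='z')
```

In this model, a second time column passed at query time has nowhere to go: the pair already carries its timestamp. Rejecting the call outright, as `earliest_by_checked` does, makes that explicit instead of dropping the argument silently.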
First/Last aggregators call `.toString()` on complex metrics (those that
aren't of type pairLongLong, pairLongString...) and on array types. This is
also odd, but it hasn't been changed because it has been supported for a long
time and is implicitly documented.
Disallowing `EARLIEST_BY(aggregatedMetric, timestampCol2)` will require
users to change their queries. However, the equivalent call is
`EARLIEST(aggregatedMetric)`, which is much clearer because it doesn't
silently ignore a column that the user typed explicitly.
#### Release note
EARLIEST_BY and LATEST_BY cannot be used on complex columns created at
ingestion time (with rollup) by the first/last aggregators.
<hr>
##### Key changed/added classes in this PR
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]