clintropolis commented on code in PR #14792:
URL: https://github.com/apache/druid/pull/14792#discussion_r1294271813
##########
integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_merge_index_queries.json:
##########
@@ -34,8 +34,8 @@
"timestamp" : "2013-08-31T00:00:00.000Z",
"event" : {
"continent":"Asia",
- "earliest_user":"masterYi",
- "latest_user":"stringer"
+ "earliest_user":null,
+ "latest_user":null
Review Comment:
it turns out this is actually a bug in the string first/last aggregators,
which are incorrectly checking timeSelector.isNull prior to doing the 'fold'
check. The timeSelector isn't really used when merging pairs from the selector
because the time is embedded in the pair of selector in that case, so
timeSelector here can spit out nulls and incorrectly aggregate nothing when it
should be aggregating pairs.
I'll do a short term fix, but longer term we should really probably split
out the agg implementations that handle the raw values ("build") from the aggs
that handle pairs ("merge") to avoid these mistakes and simplify the code.
There are also some flaws in the vector aggregator for string last, which is
not checking for nulls at all on the time column, which while avoiding this
bug, does have other bugs in cases where the time column could legitimately be
null, such as from a virtual column.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]