clintropolis commented on code in PR #14792:
URL: https://github.com/apache/druid/pull/14792#discussion_r1294271813


##########
integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_merge_index_queries.json:
##########
@@ -34,8 +34,8 @@
       "timestamp" : "2013-08-31T00:00:00.000Z",
       "event" : {
         "continent":"Asia",
-        "earliest_user":"masterYi",
-        "latest_user":"stringer"
+        "earliest_user":null,
+        "latest_user":null

Review Comment:
   it turns out this is actually a bug in the string first/last aggregators, 
which are incorrectly checking timeSelector.isNull prior to doing the 'fold' 
check. The timeSelector isn't really used when merging pairs from the selector 
because the time is embedded in the pair of selector in that case, so 
timeSelector here can spit out nulls and incorrectly aggregate nothing when it 
should be aggregating pairs.
   
   I'll do a short term fix, but longer term we should really probably split 
out the agg implementations that handle the raw values ("build") from the aggs 
that handle pairs ("merge") to avoid these mistakes and simplify the code.
   
   There are also some flaws in the vector aggregator for string last, which is 
not checking for nulls at all on the time column, which while avoiding this 
bug, does have other bugs in cases where the time column could legitimately be 
null, such as from a virtual column.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to