clintropolis commented on issue #8970:
URL: https://github.com/apache/druid/issues/8970#issuecomment-829762963


   OK, it appears the original issue described in this ticket is no longer 
an issue. I added test cases on the current master branch, and they all pass 
when backed by both in-memory realtime segments and memory-mapped historical 
segments.
   
   I can, however, reproduce your example of using an 'if' statement on a 
string column in a sum aggregator with realtime segments. This is a slightly 
different (but very similar) issue, and it is a bug in the current version.
   
   The cause of this bug is specific to realtime data. Any string column in 
Druid could become a multi-valued string column if a new row is added during 
ingestion processing, so callers cannot be certain that your 'errorType' 
column is not a multi-valued string dimension. To be safe, the value selector 
rewrites the expression to something like
   
   ```
   map((errorType) -> if (errorType == 'DNS', 1, 0), errorType)
   ```
   which ends up producing `[1]` or `[0]` instead of `1` or `0`. This is fine 
when the expression is used as a selector, because currently all multi-valued 
outputs are coerced back to string values when grouped on in a group by or 
topN. Aggregators, however, do not expect multi-valued input and do not know 
what to do with it, so such values are treated as 0. I should be able to have 
a fix for this sometime soon, and will be sure to get it in by the next 
release (0.22).
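
   To make the effect concrete, here is a toy Python model of the behavior 
described above. It is only a sketch and not Druid's actual code: the names 
`scalar_expr`, `mapped_expr`, and `naive_sum` are made up for illustration, 
and the aggregator is assumed to count anything that is not a plain number 
as 0.

```python
# Toy model of the behavior described above; NOT Druid's real implementation.
# All function names here are hypothetical.

def scalar_expr(error_type):
    # Models: if (errorType == 'DNS', 1, 0)
    return 1 if error_type == "DNS" else 0


def mapped_expr(error_type):
    # Models: map((errorType) -> if (errorType == 'DNS', 1, 0), errorType)
    # The input is treated as possibly multi-valued, so the result is a list.
    values = error_type if isinstance(error_type, list) else [error_type]
    return [scalar_expr(v) for v in values]


def naive_sum(inputs):
    # Assumed aggregator: understands scalar numbers only, everything else is 0.
    total = 0
    for value in inputs:
        total += value if isinstance(value, (int, float)) else 0
    return total


rows = ["DNS", "TIMEOUT", "DNS"]
scalar_total = naive_sum(scalar_expr(r) for r in rows)
mapped_total = naive_sum(mapped_expr(r) for r in rows)
print(scalar_total, mapped_total)
```

   With the scalar expression the sum counts the two 'DNS' rows, while the 
mapped form hands the aggregator single-element lists, which it ignores.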
   
   Thanks for the report! (and apologies for the bug)




