clintropolis commented on issue #8970: URL: https://github.com/apache/druid/issues/8970#issuecomment-829762963
OK, so it appears the original issue described in this ticket no longer reproduces: I added test cases on the current master branch, and they all pass when backed by both in-memory realtime segments and memory-mapped historical segments.

I can, however, reproduce your example of using an `if` expression on a string column in a sum aggregator with realtime segments. That is a slightly different (but very similar) issue, and it is a bug in the current version.

The cause is that, for realtime data, any string column in Druid could potentially become a multi-valued string column if a new row is added during ingestion processing, so callers cannot be certain that your `errorType` column is not a multi-valued string dimension. To be safe, the value selector rewrites the expression to something like

```
map((errorType) -> if (errorType == 'DNS', 1, 0), errorType)
```

which ends up producing `[1]` or `[0]` instead of `1` or `0`. This is fine when the expression is used as a selector, because currently all multi-valued outputs are coerced back to string values when aggregated on in a group by or topN. Aggregators that are not expecting multi-valued input, however, do not know what to do with these values, so they are treated as 0.

I should be able to have a fix for this sometime soon, and will be sure to get it in by the next release (0.22).

Thanks for the report! (and apologies for the bug)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
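The failure mode described in the comment (a scalar expression wrapped in `map`, whose array result a scalar aggregator then treats as 0) can be illustrated with a small Python toy model. This is only a sketch, not Druid code: `scalar_expr`, `mapped_expr`, and `naive_sum` are hypothetical names, and the "anything that is not a plain number counts as 0" rule is an assumption modelled on the behaviour the comment reports.

```python
from typing import Iterable, List, Union


def scalar_expr(error_type: str) -> int:
    """Toy stand-in for: if (errorType == 'DNS', 1, 0)."""
    return 1 if error_type == "DNS" else 0


def mapped_expr(error_type: Union[str, List[str]]) -> List[int]:
    """Toy stand-in for: map((errorType) -> if (...), errorType).

    The input is treated as possibly multi-valued, so the result is
    always a list, even when the row holds a single string.
    """
    values = error_type if isinstance(error_type, list) else [error_type]
    return [scalar_expr(v) for v in values]


def naive_sum(inputs: Iterable[Union[int, List[int]]]) -> int:
    """Hypothetical scalar aggregator: non-numeric input counts as 0."""
    total = 0
    for value in inputs:
        total += value if isinstance(value, int) else 0
    return total


rows = ["DNS", "TIMEOUT", "DNS"]

# Scalar expression: each row contributes 1 or 0, as the query intends.
expected = naive_sum(scalar_expr(r) for r in rows)

# Rewritten expression: each row contributes [1] or [0], which the
# aggregator in this toy model does not understand and counts as 0.
observed = naive_sum(mapped_expr(r) for r in rows)
```

In this model `expected` is 2 while `observed` is 0, which mirrors the symptom in the report: the sum silently comes out as zero rather than failing loudly.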
