clintropolis commented on pull request #9638: URL: https://github.com/apache/druid/pull/9638#issuecomment-680417441
> Hmm, in your patch would anything currently return one of the new ValueTypes? If so, we could check that we have some tests that use a subquery that now reports one of the new ValueTypes in a signature, and make sure that the outer query still runs properly. A handful of post-aggregators are currently the only thing that can return double array types. While adding synthetic tests for what happens in various scenarios when these post-aggregators appear as part of a subquery (the subquery row signature/inline datasource contains an array type) uncovered a strange difference in behavior. Prior to this patch, grouping on these array columns would result in them being treated as `STRING` typed columns, and then they would follow multi-value string grouping behavior after first being converted to a String list with `Rows.objectToStrings`. I don't think this was particularly good or intuitive behavior, of first converting numbers to strings, and then using the auto un-nesting grouping behavior to turn the results into a string column, so I consider grouping on these arrays being an exception (until we add explicit handling for grouping on arrays on array equality to be consistent with SQL arrays) is an improvement. The tests added to `ClientQuerySegmentWalkerTest` illustrate the prior behavior with the array types in the `RowSignature ` as `null`, and then tests which expect exceptions for the new behavior where the array types will be correctly typed in the `RowSignature`. I looked over the `AggregatorFactory` implementation handling of input column `ValueType` and couldn't spot anywhere that having the new array types (or a complex type that was previously `null`) would cause a difference in behavior, but there is perhaps a minor risk there in case I missed anything. Exhaustively testing every aggregator implementation seems like maybe a lot of work, though it also sounds kind of nice. Maybe a follow-up PR would be ok, and it might make sense to hold off until after array support is a bit further enhanced/integrated with the engines? Its worth noting that I think this is currently only possible with _native_ JSON subqueries. All of the post-aggregator implementations which can produce arrays are typed as `SqlTypeName.OTHER` which in all of my experiments caused the query to fail to plan if it was used in a grouping from a subquery. (This could probably be improved by typing these as SQL arrays once we support array grouping). ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
