[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system

GitBox Tue, 25 Aug 2020 19:11:11 -0700


clintropolis commented on pull request #9638:
URL: https://github.com/apache/druid/pull/9638#issuecomment-680417441



   > Hmm, in your patch would anything currently return one of the new 
ValueTypes? If so, we could check that we have some tests that use a subquery 
that now reports one of the new ValueTypes in a signature, and make sure that 
the outer query still runs properly.
   
   A handful of post-aggregators are currently the only thing that can return 
double array types.
   
   While adding synthetic tests for what happens in various scenarios when 
these post-aggregators appear as part of a subquery (the subquery row 
signature/inline datasource contains an array type) uncovered a strange 
difference in behavior. Prior to this patch, grouping on these array columns 
would result in them being treated as `STRING` typed columns, and then they 
would follow multi-value string grouping behavior after first being converted 
to a String list with `Rows.objectToStrings`. I don't think this was 
particularly good or intuitive behavior, of first converting numbers to 
strings, and then using the auto un-nesting grouping behavior to turn the 
results into a string column, so I consider grouping on these arrays being an 
exception (until we add explicit handling for grouping on arrays on array 
equality to be consistent with SQL arrays) is an improvement. The tests added 
to `ClientQuerySegmentWalkerTest` illustrate the prior behavior with the array 
types in the `RowSignature
 ` as `null`, and then tests which expect exceptions for the new behavior where 
the array types will be correctly typed in the `RowSignature`.
   
   I looked over the `AggregatorFactory` implementation handling of input 
column `ValueType` and couldn't spot anywhere that having the new array types 
(or a complex type that was previously `null`) would cause a difference in 
behavior, but there is perhaps a minor risk there in case I missed anything. 
Exhaustively testing every aggregator implementation seems like maybe a lot of 
work, though it also sounds kind of nice. Maybe a follow-up PR would be ok, and 
it might make sense to hold off until after array support is a bit further 
enhanced/integrated with the engines?
   
   Its worth noting that I think this is currently only possible with _native_ 
JSON subqueries. All of the post-aggregator implementations which can produce 
arrays are typed as `SqlTypeName.OTHER` which in all of my experiments caused 
the query to fail to plan if it was used in a grouping from a subquery. (This 
could probably be improved by typing these as SQL arrays once we support array 
grouping).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system

Reply via email to