clintropolis commented on issue #7588: multi-value string column support for 
expressions
URL: https://github.com/apache/incubator-druid/pull/7588#issuecomment-502293675
 
 
   I would like to do ingestion side support through transforms as a follow-up 
PR, I think maybe it could benefit from a slight refactoring and this PR is 
already big enough.
   
   This PR should have no effect on multi-value dimensions used in transform 
spec, because it appears to already behave differently than the expression 
selectors in that instead of array values being translated to `null` for the 
expression they are instead fed into `String.valueOf` so they appear as a 
scalar string. As expected, using the array functions from this PR at ingestion 
time can have unexpected results because of the scalar string behavior and also 
lack of array result to list of string coercion on the expression outputs, but 
because of the string behavior it appears that this is also true of any 
expression on multi-value ingestion time inputs.
   
   I think my main outstanding question for this PR is whether or not this new 
expression selector behavior should be optional for now, because there is a 
potential performance impact for expression selectors with bindings from row 
based selectors since they will have to use 
`RowBasedExpressionColumnValueSelector` with it's per row check of input 
binding type. I do hope to eliminate nearly all of these cases in future work 
by enhancing the type information throughout the query pipeline, but i'm not 
certain how far in the future that actually is and I haven't had the chance to 
conduct thorough measurements on performance difference, which is why I brought 
this up. Also I could see a use for turning off this behavior if certain there 
are no multi-value dimensions. On the other hand though, we have a lot of 
options, I'm not sure we need more. Maybe some of the improvements done in this 
PR, like argument validation for functions being done at parse time instead of 
for each eval, make up enough for the `RowBasedExpressionColumnValueSelector` 
phase that it about evens out. Thoughts?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to