ektravel commented on code in PR #12549:
URL: https://github.com/apache/druid/pull/12549#discussion_r1099182443

##########
docs/querying/sql-data-types.md:
##########
@@ -80,8 +82,42 @@ the `UNNEST` functionality available in some other SQL dialects. Refer to the do
 > they are handled in Druid SQL and in native queries. For example,
 > expressions involving multi-value dimensions may be
 > incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND
 > multi_val_dim = 'b'` will be optimized to
 > `false`, even though it is possible for a single row to have both "a" and
 > "b" as values for `multi_val_dim`. The
-> SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior
-> in native queries.
+> SQL behavior of multi-value dimensions may change in a future release to more closely align with their behavior
+> in native queries, but the [multi-value string functions](./sql-multivalue-string-functions.md) should be able to provide
+> nearly all possible native functionality.
+
+## Arrays
+Multi-value dimensions may also be converted to standard SQL arrays, either by explicitly converting them with `MV_TO_ARRAY`,
+or implicitly when used within the [array functions](./sql-array-functions.md). `ARRAY` types behave as standard SQL arrays, where
+grouping on them will group on the entire array of values instead of the implicit `UNNEST` that occurs when grouping on
+multi-value dimensions directly or when used with the multi-value functions. Arrays may also be constructed from multiple
+columns using the array functions.
+
+## Multi-value strings behavior
+The behavior of Druid [multi-value string dimensions](multi-value-dimensions.md) varies depending on the context of their usage.
+
+When used as `VARCHAR` functions, which are not "aware" that their inputs which claim to be `VARCHAR` might actually have multiple
+values such as `CONCAT`, Druid will map the function across all values in the row. If the row is null or empty, the function will
+recieve `NULL` as its input, otherwise it will be applied to every row value and continue its life as a multi-value VARCHAR.
+
+When used with the explicit [multi-value string functions](./sql-multivalue-string-functions.md), the column is acknowledged to be multi-valued,
+and during processing the values are operated on as if they were `ARRAY` typed, so any operations which produce null and empty rows are
+distinguished as separate values (unlike implicit mapping behavior), but retain their `VARCHAR` type after the computation is complete.
+Note that Druid multi-value columns do _not_ distinguish between empty and null rows, so an empty row will never appear natively as input

Review Comment:
```suggestion
Note that Druid multi-value columns do not distinguish between empty and null rows. An empty row never appears natively as an input to a multi-value function, but a multi-value function that manipulates the array form of the value may produce an empty array, which is handled separately while processing.
```
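
For readers following the hunk under review, the grouping difference described in its "Arrays" paragraph can be sketched in Druid SQL. This is only an illustration, not part of the PR: the datasource `example_table` and the multi-value string column `tags` are hypothetical names, and `MV_TO_ARRAY` is the function named in the diff. The comments restate the behavior as the documentation text describes it and have not been checked against a running cluster.

```sql
-- Hypothetical datasource `example_table` with a multi-value string column `tags`.
-- Run each query separately.

-- Query 1: grouping on the multi-value dimension directly.
-- Per the doc text, an implicit UNNEST occurs, so a row holding the values
-- 'a' and 'b' contributes to both the 'a' group and the 'b' group.
SELECT tags, COUNT(*) AS cnt
FROM example_table
GROUP BY 1

-- Query 2: grouping on the explicit array form.
-- Per the doc text, the entire array is the grouping key, so the same row
-- contributes to a single group keyed by the whole array.
SELECT MV_TO_ARRAY(tags) AS tags_array, COUNT(*) AS cnt
FROM example_table
GROUP BY 1
```

Comparing the row counts of the two results on a small datasource is one way to see whether a given query is getting the unnested or the whole-array grouping.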
