clintropolis opened a new pull request, #14236:
URL: https://github.com/apache/druid/pull/14236

   ### Description
   This PR adds a new `ARRAY_TO_MV` function which provides the opposite 
behavior of `MV_TO_ARRAY`, allowing any type of `ARRAY` to be coerced into a 
plain multi-value `VARCHAR`. The intention of this function is to assist in 
migrating tables that currently use native multi-value `STRING` columns to 
`ARRAY<STRING>`. 
   
   `ARRAY_TO_MV` allows cluster operators to begin ingesting true native 
`ARRAY` type columns while wrapping them with this function as a crutch, so 
that they exhibit classic multi-value dimension behavior (implicit unnest when 
grouping, etc.) and applications using Druid can continue with their current 
behavior. As the apps are updated to query these columns with `ARRAY` 
semantics, existing MVD columns can instead be switched to use `MV_TO_ARRAY`, 
which has been modified to accept `ARRAY` types and pass them through as 
`ARRAY<STRING>`. Note that I didn't change the `MV_TO_ARRAY` behavior of 
casting to `ARRAY<STRING>`, so only MVDs and the `ARRAY<STRING>` columns they 
have become should be wrapped with this function. Other types of `ARRAY` used 
with `MV_TO_ARRAY` will be coerced to `ARRAY<STRING>`, which likely isn't 
desirable behavior.
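   
   To make the difference in grouping behavior concrete, here is a toy model 
in plain Python. It is not Druid code and does not call any Druid API; the 
`rows` data, the `tags` column name, and all three helper functions are 
hypothetical, and details such as how empty arrays or nulls group are 
deliberately ignored. It only sketches "whole array as one key" versus 
"implicit unnest", plus the string coercion mentioned above.

```python
from collections import Counter

# Hypothetical rows; "tags" stands in for an ARRAY<STRING> column.
rows = [
    {"tags": ["a", "b"]},
    {"tags": ["a", "b"]},
    {"tags": ["b"]},
]


def group_as_array(rows):
    """ARRAY semantics: each whole array is a single grouping key."""
    return Counter(tuple(r["tags"]) for r in rows)


def group_as_mv(rows):
    """Multi-value semantics: a row is counted once per element (implicit unnest)."""
    return Counter(v for r in rows for v in r["tags"])


def coerce_to_string_array(values):
    """Models coercing a non-string array to an array of strings."""
    return [None if v is None else str(v) for v in values]
```

In this model, grouping the same rows yields two keys under array semantics 
(`("a", "b")` and `("b",)`) but one key per distinct element under multi-value 
semantics, which is the behavior existing applications would be relying on.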
   
   Right now `ARRAY_TO_MV` exists purely as an expression virtual column. As a 
follow-up I will be adding a dedicated virtual column for both `ARRAY_TO_MV` 
and `MV_TO_ARRAY` so that coercing in either direction can be nearly as 
performant as native column usage, including support for indexes for fast 
filtering. Ensuring that migration does not come with a penalty is very 
important longer term.
   
   Deferring release notes specifically for this in favor of it being part of 
a larger section on MVD -> array migration in the 27 release.
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

