clintropolis opened a new pull request, #14236:
URL: https://github.com/apache/druid/pull/14236
### Description
This PR adds a new `ARRAY_TO_MV` function, which provides the opposite
behavior of `MV_TO_ARRAY`: it coerces any type of `ARRAY` into a plain
`VARCHAR`. The function is intended to assist in migrating tables that
currently use native multi-value `STRING` columns to `ARRAY<STRING>`.
`ARRAY_TO_MV` allows cluster operators to begin ingesting true native
`ARRAY` type columns while wrapping them with this function as a crutch, so
that they exhibit classic multi-value dimension behavior (implicit unnest
when grouping, etc.) and applications using Druid can continue with their
current behavior. As the apps are updated to query these columns with
`ARRAY` semantics, existing MVD columns can instead be switched to use
`MV_TO_ARRAY`, which has been modified to accept `ARRAY` types and pass them
through as `ARRAY<STRING>`. Note that I didn't change the `MV_TO_ARRAY`
behavior of casting to `ARRAY<STRING>`, so only MVDs and the `ARRAY<STRING>`
type they have become should be wrapped with this function. Other types of
`ARRAY` used with `MV_TO_ARRAY` will be coerced to `ARRAY<STRING>`, which
likely isn't desirable behavior.
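To illustrate the difference in grouping semantics described above, here is a minimal Python sketch. It is a toy model, not Druid code: the `rows` data, the `tags` column, and the helper names `group_as_mv` and `group_as_array` are all hypothetical, and edge cases such as empty arrays or nulls are not modeled.

```python
from collections import Counter

# Hypothetical rows, each carrying an array of string tags.
rows = [
    {"tags": ["a", "b"]},
    {"tags": ["b"]},
    {"tags": ["a", "b"]},
]


def group_as_mv(rows, column):
    """Toy model of multi-value grouping: every element of the array
    becomes its own grouping key (the implicit unnest)."""
    counts = Counter()
    for row in rows:
        for value in row[column]:
            counts[value] += 1
    return dict(counts)


def group_as_array(rows, column):
    """Toy model of ARRAY grouping: the whole array is one grouping key."""
    counts = Counter(tuple(row[column]) for row in rows)
    return dict(counts)


mv_counts = group_as_mv(rows, "tags")        # per-element counts
array_counts = group_as_array(rows, "tags")  # per-array counts
print(mv_counts)
print(array_counts)
```

In this model, a query that wraps an array column with `ARRAY_TO_MV` would see results shaped like `mv_counts`, while one that uses the column with `ARRAY` semantics would see results shaped like `array_counts`.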
Right now `ARRAY_TO_MV` exists purely as an expression virtual column. As a
follow-up, I will add a dedicated virtual column for both `ARRAY_TO_MV` and
`MV_TO_ARRAY` so that coercing in either direction can be nearly as
performant as native column usage, including support for indexes for fast
filtering. Making migration come without a performance penalty is very
important longer term.
I'm deferring release notes specifically for this in favor of making it part
of a larger section on MVD -> array migration in the 27 release.
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]