clintropolis commented on a change in pull request #10505:
URL: https://github.com/apache/druid/pull/10505#discussion_r648836460
##########
File path: docs/development/extensions-core/avro.md
##########
@@ -39,13 +39,24 @@ Make sure to
[include](../../development/extensions.md#loading-extensions) `drui
Druid supports most Avro types natively, there are however some exceptions
which are detailed here.
-`union` types which aren't of the form `[null, otherType]` aren't supported at
this time.
+#### Unions
+Druid has two modes for supporting `union` types. The original legacy mode can
only support unions of the form `[null, otherType]`.
+The newer mode can be enabled by setting `extractUnions` on the Avro parser in
which case unions will be expanded according to the following rules:
Review comment:
Hmm, I looked a lot closer at this than I did on a previous pass, and I
think the old docs were sort of wrong. Using the example `someMultiMemberUnion`
type in this PR, I can still use a flatten spec to extract the values of any
type with the existing code, apparently even including for record types, where
the extraction path is `$.someMultiMemberUnion.subString` (instead of
`$.someMultiMemberUnion.UnionSubRecord.subString` as in the mode added in this
PR).
As such, with my better understanding I think it makes sense to instead call
this new property `extractUnionsByType` or something similar, and clarify that
this new mode _requires_ using a flatten spec to extract the values, but with
the benefit that you can selectively extract values of only a certain type so
that they can be mapped to separate Druid columns or whatever. I also don't
think it necessarily makes sense to refer to the other mode as legacy, since I
guess it still has a use if the union is composed mainly of primitive types and
all are able to be coerced into a common Druid type, or if it is a simple union
type of the legacy form, which the new mode does not effect (since the isUnion
code checks for more than 1 non-null type).
Sorry I didn't look closer into this previously and for the review churn, my
bad.
Also because the new mode needs to be link to the flatten spec docs, maybe
it makes sense to just move the unions description entirely into the complex
types section, which also seems to mirror the Avro specification docs
https://avro.apache.org/docs/current/spec.html#schema_complex
##########
File path: docs/development/extensions-core/avro.md
##########
@@ -39,13 +39,24 @@ Make sure to
[include](../../development/extensions.md#loading-extensions) `drui
Druid supports most Avro types natively, there are however some exceptions
which are detailed here.
-`union` types which aren't of the form `[null, otherType]` aren't supported at
this time.
+#### Unions
+Druid has two modes for supporting `union` types. The original legacy mode can
only support unions of the form `[null, otherType]`.
+The newer mode can be enabled by setting `extractUnions` on the Avro parser in
which case unions will be expanded according to the following rules:
+* Primitive types and unnamed complex types are keyed their type name. i.e
`int`, `string`
+* Complex named types are keyed by their names, this includes `record`,
`fixed` and `enum`.
+* The Avro null type is elided as it's value can only ever be null
Review comment:
super nit: i think it's -> its
##########
File path:
extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/GenericAvroJsonProvider.java
##########
@@ -195,4 +210,46 @@ public Object unwrap(final Object o)
{
return o;
}
+
+ private boolean isExplodableUnion(final Schema.Field field)
Review comment:
nit: still using explode terminology
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]