Jose Gonzalez created SPARK-43333:
-------------------------------------
Summary: Name union type members after types
Key: SPARK-43333
URL: https://issues.apache.org/jira/browse/SPARK-43333
Project: Spark
Issue Type: New Feature
Components: Structured Streaming
Affects Versions: 3.3.2
Reporter: Jose Gonzalez
Spark converts an Avro union type into a record type in which each member of
the union corresponds to one field of the record. The current behaviour is to
name these fields positionally ("member0", "member1", etc.), one per member of
the union. We propose adding an option to use the member type name instead.
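As a rough illustration of the two naming schemes, here is a minimal Python
sketch that does not use Spark. The function union_field_names and the example
union are hypothetical, and the exact format of the type-based name (here, the
bare type name) is an assumption, since the description does not specify it.

```python
from typing import Dict, List


def union_field_names(members: List[str], use_type_names: bool = False) -> Dict[str, str]:
    """Map each union member type to the record field name it would receive."""
    if use_type_names:
        # Proposed option (assumed format): name the field after the member type.
        return {member: member for member in members}
    # Current behaviour as described above: positional "member<i>" names.
    return {member: f"member{i}" for i, member in enumerate(members)}


members = ["int", "string"]  # hypothetical union, for illustration only
print(union_field_names(members))                       # {'int': 'member0', 'string': 'member1'}
print(union_field_names(members, use_type_names=True))  # {'int': 'int', 'string': 'string'}
```

With positional names the field for a given type depends on where that type
sits in the union; with type-based names it depends only on the type itself.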
The purpose of this is twofold:
# To allow adding types to, or removing types from, the union without
changing the field names of the other member types. With positional names, if
the added or removed type is not ordered last, the later members shift, and
existing queries referencing "member2" may need to be rewritten to
reference "member1" or "member3".
# To make queries more readable: referencing the type name is clearer than
referencing "member0".
For example, our system produces an Avro schema from a Java type structure
in which subtyping maps to union types whose members are ordered
lexicographically. Adding a subtype can therefore easily force every
reference to "member2" to be updated to "member3".
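The shift described in this example can be sketched in Python as follows. The
function positional_renames and the subtype names (Circle, Square, and so on)
are hypothetical and chosen only for illustration; the sketch assumes members
are sorted lexicographically and named "member<i>" by position.

```python
from typing import Dict, List


def positional_renames(old_subtypes: List[str], new_subtypes: List[str]) -> Dict[str, str]:
    """Return {old field name: new field name} for members whose position shifts."""
    old_sorted = sorted(old_subtypes)  # union members are ordered lexicographically
    new_index = {name: i for i, name in enumerate(sorted(new_subtypes))}
    renames = {}
    for i, name in enumerate(old_sorted):
        # A surviving member needs a query rewrite only if its index changed.
        if name in new_index and new_index[name] != i:
            renames[f"member{i}"] = f"member{new_index[name]}"
    return renames


old = ["Circle", "Square", "Triangle"]
new = old + ["Rectangle"]  # sorts between "Circle" and "Square"
print(positional_renames(old, new))  # {'member1': 'member2', 'member2': 'member3'}
```

Adding a single subtype that sorts early renames every later field, whereas
names derived from the member types would leave existing references untouched.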