Jose Gonzalez created SPARK-43333:
-------------------------------------

             Summary: Name union type members after types
                 Key: SPARK-43333
                 URL: https://issues.apache.org/jira/browse/SPARK-43333
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.3.2
            Reporter: Jose Gonzalez


Spark converts Avro union types into record types, where each member of the
union type corresponds to a field in the record type. The current behaviour is
to name the record fields "member0", "member1", etc., one for each member of
the union type. We propose adding an option to use the member type name instead.
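
A minimal sketch of the two naming schemes, to make the difference concrete.
This is plain Python and not Spark's implementation: the helper names
(positional_field_names, type_field_names) and the sample union are
hypothetical, and the exact field names a real option would produce are an
assumption.

```python
# Hypothetical helpers for illustration only; not part of Spark's API.

def positional_field_names(union_members):
    """Current behaviour: name each field by the member's position."""
    return [f"member{i}" for i, _ in enumerate(union_members)]


def type_field_names(union_members):
    """Proposed option: name each field after the member's type name."""
    return list(union_members)


# Assumed sample union with three member types.
union = ["int", "string", "Cat"]

print(positional_field_names(union))
print(type_field_names(union))
```

With positional naming, a query has to know that "Cat" is the third member of
the union; with type-based naming, it can refer to the field by the type name.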

The purpose of this is twofold:
 # To allow adding types to, or removing types from, the union without
affecting the record field names of other member types. If the added or removed
type is not ordered last, then existing queries referencing "member2" may need
to be rewritten to reference "member1" or "member3".
 # Referencing the type name in the query is more readable than referencing 
"member0".

For example, our system produces an Avro schema from a Java type structure
where subtyping maps to union types whose members are ordered 
lexicographically. Adding a subtype can therefore easily result in all 
references to "member2" needing to be updated to "member3".
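
The scenario above can be sketched in a few lines of plain Python. The subtype
names (Bird, Cat, Cow, Dog) and the helper positional_names are hypothetical;
the only behaviour taken from the description is that union members are
ordered lexicographically and named by position.

```python
# Hypothetical illustration; subtype names are made up for the example.

def positional_names(subtypes):
    """Map each subtype to its positional field name after sorting."""
    return {name: f"member{i}" for i, name in enumerate(sorted(subtypes))}


before = positional_names(["Bird", "Cat", "Dog"])
after = positional_names(["Bird", "Cat", "Dog", "Cow"])  # add subtype "Cow"

# Existing subtypes whose positional field name changed.
shifted = {t: (before[t], after[t]) for t in before if before[t] != after[t]}
print(shifted)  # {'Dog': ('member2', 'member3')}
```

Because "Cow" sorts before "Dog", every query that referenced "member2" to
read Dog values would have to be changed to "member3", whereas a field named
after the type would be unaffected.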



