gsanon opened a new issue, #12778: URL: https://github.com/apache/hudi/issues/12778
**Describe the problem you faced**

Currently it is not possible to write Hudi data if the source DataFrame contains a field with a union type.

Let's say you have an Avro schema with a union-type field:

```json
{
  "name": "field",
  "type": [
    "null",
    "int",
    "string",
    "boolean"
  ],
  "default": null
}
```

If you create a Parquet file from this, the field is transformed into a struct with a `memberX` entry for each non-null type of the union:

`field<struct<member0:int, member1:string, member2:boolean>>`

Then, when writing the data with Hudi 0.x, the process fails because Hudi takes only the first type of the union and then tries to write the struct into that type. In our case you get something like:

`java.lang.IllegalArgumentException: StructType(StructField(member0,IntegerType,true),StructField(member1,StringType,true),StructField(member2,BooleanType,true)) and IntegerType are incompatible:`

This behavior was a bit [opaque](https://github.com/apache/hudi/blob/release-0.14.1/hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java#L199-L204) in 0.x, but in 1.0.0 it has been made pretty clear [here](https://github.com/apache/hudi/blob/14c292c626cd8d18b5997a90cfbb865befb5f6d2/hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java#L436-L439).

So each time we encounter Parquet files containing union-type fields, we need to pre-process the data as an inelegant workaround (renaming the `memberX` fields to avoid the union type detection).

Given that both Parquet and Avro allow this union type, we could expect Hudi to handle it one way or another (for example, by representing the member struct as it is?).
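
For illustration, here is a minimal sketch of the renaming workaround mentioned above, written in plain Python over a simplified schema representation. It is not our actual pre-processing code and uses no Hudi or Spark API: the helper `rename_union_members`, the `branch` prefix, and the dict-based schema format are all assumptions made for the example. It only shows the idea of rewriting every `memberN` field name so that the struct no longer looks like a converted union.

```python
import re

# Matches the positional names given to union branches (member0, member1, ...).
MEMBER_RE = re.compile(r"^member(\d+)$")


def rename_union_members(schema, prefix="branch"):
    """Return a copy of a simplified schema with memberN fields renamed.

    Hypothetical schema format (an assumption for this sketch): a struct is
    a dict mapping field name -> type, where a type is either a string
    (primitive) or another dict (nested struct). Arrays and maps are not
    handled here.
    """
    renamed = {}
    for name, field_type in schema.items():
        match = MEMBER_RE.match(name)
        # Keep the branch index so the rename stays reversible.
        new_name = f"{prefix}{match.group(1)}" if match else name
        if isinstance(field_type, dict):
            field_type = rename_union_members(field_type, prefix)
        renamed[new_name] = field_type
    return renamed


if __name__ == "__main__":
    schema = {
        "id": "long",
        "field": {"member0": "int", "member1": "string", "member2": "boolean"},
    }
    print(rename_union_members(schema))
```

On a real DataFrame the same renaming has to be applied to the nested struct fields before the write, which is what makes the workaround awkward to maintain.
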
Wdyt?
