bvaradar commented on issue #2162: URL: https://github.com/apache/hudi/issues/2162#issuecomment-708261108
I think this is due to the way Spark deduces the Avro schema from the Transformer's Row output (similar to https://github.com/apache/hudi/issues/2149#issuecomment-707624922).

You can try changing the SchemaProvider so that the target schema is recreated using spark-avro, as in the snippet that follows. This would make it consistent with the DataFrame generated by the Transformer.

    Schema newSchema = AvroConversionUtils.convertStructTypeToAvroSchema(
        AvroConversionUtils.convertAvroSchemaToStructType(schema),
        RowBasedSchemaProvider.HOODIE_RECORD_STRUCT_NAME,
        RowBasedSchemaProvider.HOODIE_RECORD_NAMESPACE);

If you are using the master branch, we have added support for plugging in a SchemaPostProcessor (using config: hoodie.deltastreamer.schemaprovider.schema_post_processor=org.apache.hudi.utilities.schema.SparkAvroPostProcessor), where you can implement the processSchema() method to do the above transformation.

```
package org.apache.hudi.utilities.schema;

import org.apache.hudi.AvroConversionUtils;
import org.apache.hudi.common.config.TypedProperties;

import org.apache.avro.Schema;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkAvroPostProcessor extends SchemaPostProcessor {

  protected SparkAvroPostProcessor(TypedProperties props, JavaSparkContext jssc) {
    super(props, jssc);
  }

  @Override
  public Schema processSchema(Schema schema) {
    // Round-trip Avro -> Spark StructType -> Avro so the target schema
    // matches the one Spark derives for the Transformer output.
    return AvroConversionUtils.convertStructTypeToAvroSchema(
        AvroConversionUtils.convertAvroSchemaToStructType(schema),
        RowBasedSchemaProvider.HOODIE_RECORD_STRUCT_NAME,
        RowBasedSchemaProvider.HOODIE_RECORD_NAMESPACE);
  }
}
```

If this works, I will open a PR (Jira: https://issues.apache.org/jira/browse/HUDI-1343).
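For reference, here is a minimal sketch of setting the post-processor config programmatically. The config key and class name are the ones quoted in this comment; the `PostProcessorConfigSketch` class and its `buildProps()` helper are hypothetical names used only for illustration, and plain `java.util.Properties` stands in for whatever properties object your job actually builds. In practice the same key/value pair would more likely go into the properties file the job is launched with.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: shows only the key/value pair to set.
class PostProcessorConfigSketch {

  // Config key and value as quoted in the comment above.
  static final String POST_PROCESSOR_KEY =
      "hoodie.deltastreamer.schemaprovider.schema_post_processor";
  static final String POST_PROCESSOR_CLASS =
      "org.apache.hudi.utilities.schema.SparkAvroPostProcessor";

  // Builds a Properties object carrying the post-processor setting.
  static Properties buildProps() {
    Properties props = new Properties();
    props.setProperty(POST_PROCESSOR_KEY, POST_PROCESSOR_CLASS);
    return props;
  }

  public static void main(String[] args) {
    Properties props = buildProps();
    // Print the pair in properties-file form (key=value).
    System.out.println(POST_PROCESSOR_KEY + "=" + props.getProperty(POST_PROCESSOR_KEY));
  }
}
```

The printed line is in `key=value` form, so it can be pasted as-is into a properties file.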
