bvaradar commented on a change in pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#discussion_r516082122
##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala
##########
@@ -364,4 +366,40 @@ object AvroConversionHelper {
}
}
}
+
+ /**
+ * Remove namespace from fixed field.
+ * org.apache.spark.sql.avro.SchemaConverters.toAvroType method adds namespace to fixed avro field
+ * https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L177
+ * So, we need to remove that namespace so that a reader schema without the namespace does not throw an error like this one
+ * org.apache.avro.AvroTypeException: Found hoodie.source.hoodie_source.height.fixed, expecting fixed
+ *
+ * @param schema Schema from which namespace needs to be removed for fixed fields
+ * @return input schema with namespace removed for fixed fields, if any
+ def removeNamespaceFromFixedFields(schema: Schema): Schema ={
Review comment:
@n3nash : This might require a holistic look at how schema evolution is
handled.
As a last option before I let @n3nash decide how best to take in this
change, @sathyaprakashg : Since this is not a backwards compatible change in
the true sense (the underlying type is the same), can you try adding an
additional step where we do a variant of
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java#L73
In HoodieAvroDataBlock:
1. Use a genericReader with only the old schema. This avoids schema evolution
handling.
2. Create a genericWriter and write the record back to bytes, this time
with the new (updated) schema.
3. Then use a genericReader (as in step 1) to read those bytes, but with the
updated schema.
Can you see if this works around the issue? If it does, then this needs to
be a configuration-controlled feature when reading records from log records.
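
The three steps above can be sketched with the plain Avro generic API. This is
only an illustration of the round trip, not the code in SparkMergeHelper or
HoodieAvroDataBlock: the class name SchemaRewriteSketch and its methods are
hypothetical, and it assumes the old and new schemas have the same field
layout and differ only in the name or namespace of a fixed type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Hypothetical helper, not part of Hudi.
final class SchemaRewriteSketch {

  // Decode with a single schema used as both writer and reader schema,
  // so Avro performs no schema resolution (steps 1 and 3).
  static GenericRecord decode(byte[] bytes, Schema schema) throws IOException {
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
  }

  // Encode a record with the given schema (step 2 when called with the new schema).
  static byte[] encode(GenericRecord record, Schema schema) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
    encoder.flush();
    return out.toByteArray();
  }

  // Steps 1 to 3: read with old schema, write with new schema, read with new schema.
  static GenericRecord rewrite(byte[] oldBytes, Schema oldSchema, Schema newSchema)
      throws IOException {
    GenericRecord oldRecord = decode(oldBytes, oldSchema);
    byte[] newBytes = encode(oldRecord, newSchema);
    return decode(newBytes, newSchema);
  }
}
```

Because the old and new schemas are never handed to Avro as a writer/reader
pair, name resolution between them is never attempted, which is the point of
the suggestion.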
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/BaseAvroPayload.java
##########
@@ -39,13 +40,19 @@
*/
protected final Comparable orderingVal;
+ /**
+ * Schema used to convert avro to bytes.
+ */
+ protected final Schema writerSchema;
Review comment:
You can introduce another base class, BaseAvroPayloadWithSchema, which
extends BaseAvroPayload and stores the schema. This will be the base class
for any new implementation that needs to store the schema as part of the payload.
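
A rough sketch of the shape this could take is below. To stay self-contained
it uses a simplified stand-in for BaseAvroPayload (raw bytes plus the ordering
value) rather than the real Hudi class, and it holds the schema as a JSON
string; both are assumptions made for illustration, as are the constructor
signatures and the getter name.

```java
import java.io.Serializable;

// Simplified stand-in for Hudi's BaseAvroPayload (assumption: reduced to
// raw record bytes and an ordering value so the sketch needs no Avro dependency).
class BaseAvroPayload implements Serializable {
  protected final byte[] recordBytes;
  protected final Comparable<?> orderingVal;

  BaseAvroPayload(byte[] recordBytes, Comparable<?> orderingVal) {
    this.recordBytes = recordBytes;
    this.orderingVal = orderingVal;
  }
}

// Suggested intermediate base class: only payloads that need the writer
// schema extend this one, so BaseAvroPayload itself stays unchanged.
class BaseAvroPayloadWithSchema extends BaseAvroPayload {
  // Schema used to convert avro to bytes, kept here as JSON text (assumption).
  protected final String writerSchemaJson;

  BaseAvroPayloadWithSchema(byte[] recordBytes, Comparable<?> orderingVal,
                            String writerSchemaJson) {
    super(recordBytes, orderingVal);
    this.writerSchemaJson = writerSchemaJson;
  }

  String getWriterSchemaJson() {
    return writerSchemaJson;
  }
}
```

Existing payload implementations would keep extending BaseAvroPayload and pay
no extra serialization cost for a schema they do not use.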