bvaradar commented on a change in pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#discussion_r513742129



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala
##########
@@ -364,4 +366,40 @@ object AvroConversionHelper {
         }
     }
   }
+
+  /**
+   * Removes the namespace from fixed fields.
+   * The org.apache.spark.sql.avro.SchemaConverters.toAvroType method adds a namespace to fixed Avro fields:
+   * https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L177
+   * So we need to remove that namespace, so that a reader schema without a namespace does not throw an error like this one:
+   * org.apache.avro.AvroTypeException: Found hoodie.source.hoodie_source.height.fixed, expecting fixed
+   *
+   * @param schema Schema from which the namespace needs to be removed for fixed fields
+   * @return input schema with the namespace removed for fixed fields, if any
+   */
+  def removeNamespaceFromFixedFields(schema: Schema): Schema = {

Review comment:
       @sathyaprakashg : Thanks for the detailed write-up, and sorry for missing this part. Regarding the 3rd flow (transformation without a user-provided schema), the exception points to https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L157, but per your observation, HoodieAvroUtils.bytesToAvro() works without issue. Could you check whether HoodieAvroUtils.bytesToAvro() can be used in HoodieAvroDataBlock? Does that solve the issue with respect to schema evolution handling?
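For reference, the namespace stripping that `removeNamespaceFromFixedFields` is meant to perform could look roughly like the sketch below. This is a hypothetical illustration, not the code in this PR: `FixedNamespaceStripper` is an illustrative name, and the sketch assumes Avro 1.8+ (for `Field.defaultVal()` and `Schema.getLogicalType`) and a non-recursive schema. It rebuilds records, unions, arrays and maps, and recreates each FIXED type without a namespace while keeping its logical type (for example, the decimal that Spark maps to fixed). Custom properties and aliases are not copied.

```scala
import org.apache.avro.Schema
import scala.collection.JavaConverters._

// Hypothetical helper; a sketch of the idea, not Hudi's implementation.
object FixedNamespaceStripper {
  // Rebuilds the schema, recreating every FIXED type with a null namespace.
  // Assumption: the schema is not recursive (a recursive record would recurse forever here).
  // Assumption: custom props and aliases are not needed, so they are not copied.
  def removeNamespaceFromFixedFields(schema: Schema): Schema = schema.getType match {
    case Schema.Type.RECORD =>
      // Fields cannot be shared between records, so create fresh Field instances.
      val fields = schema.getFields.asScala.map { f =>
        new Schema.Field(f.name(), removeNamespaceFromFixedFields(f.schema()), f.doc(), f.defaultVal())
      }
      Schema.createRecord(schema.getName, schema.getDoc, schema.getNamespace, schema.isError, fields.asJava)
    case Schema.Type.UNION =>
      Schema.createUnion(schema.getTypes.asScala.map(removeNamespaceFromFixedFields).asJava)
    case Schema.Type.ARRAY =>
      Schema.createArray(removeNamespaceFromFixedFields(schema.getElementType))
    case Schema.Type.MAP =>
      Schema.createMap(removeNamespaceFromFixedFields(schema.getValueType))
    case Schema.Type.FIXED =>
      // Same short name and size, but no namespace.
      val fixed = Schema.createFixed(schema.getName, schema.getDoc, null, schema.getFixedSize)
      // Preserve a logical type such as decimal(precision, scale), if present.
      Option(schema.getLogicalType).foreach(_.addToSchema(fixed))
      fixed
    case _ => schema
  }
}
```

One caveat of this approach: judging from the error message above, each fixed type is named `fixed` inside a per-field namespace. After stripping, two decimal columns with different precision or scale would both be named `fixed`, and Avro would likely reject the resulting schema ("Can't redefine") when it is printed or parsed.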

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/BaseAvroPayload.java
##########
@@ -39,13 +40,19 @@
    */
   protected final Comparable orderingVal;
 
+  /**
+   * Schema used to convert avro to bytes.
+   */
+  protected final Schema writerSchema;

Review comment:
       Introducing the schema at the payload level would increase the memory footprint (and the I/O in shuffle stages). You may want to introduce another class (similar to BaseAvroPayload) that also tracks the schema at the record level. cc @n3nash
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
