rahil-c commented on code in PR #5786:
URL: https://github.com/apache/hudi/pull/5786#discussion_r918250071


##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java:
##########
@@ -189,7 +190,13 @@ public static Writable avroToArrayWritable(Object value, Schema schema) {
         Writable[] recordValues = new Writable[schema.getFields().size()];
         int recordValueIndex = 0;
         for (Schema.Field field : schema.getFields()) {
-          recordValues[recordValueIndex++] = avroToArrayWritable(record.get(field.name()), field.schema());
+          Object fieldValue = null;
+          try {
+            fieldValue = record.get(field.name());
+          } catch (AvroRuntimeException e) {
+            LOG.debug("Field:" + field.name() + "not found in Schema:" + schema.toString());

Review Comment:
   This change is actually an internal patch we have carried, originally from @zhedoubushishi.
   
   From my understanding, we want to catch this exception rather than let it propagate. In Avro 1.8.2, looking up a field that was not present simply returned null, so this code path never surfaced issues. Now that we are targeting `<avro.version>1.10.2</avro.version>`, a missing field throws an exception, which would break several tests; so for now we log the message and continue.
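   
   To illustrate the pattern (a minimal, self-contained sketch — `StrictRecord`, `safeGet`, and `AvroFieldFallbackSketch` are hypothetical stand-ins, not Hudi or Avro classes; the real code catches `AvroRuntimeException` from `GenericRecord.get`):
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Hypothetical stand-in for an Avro 1.10+ GenericRecord, which throws
   // when a field is absent (Avro 1.8.2 returned null instead).
   class StrictRecord {
       private final Map<String, Object> values = new HashMap<>();
       void put(String name, Object value) { values.put(name, value); }
       Object get(String name) {
           if (!values.containsKey(name)) {
               throw new RuntimeException("Not a valid schema field: " + name);
           }
           return values.get(name);
       }
   }
   
   public class AvroFieldFallbackSketch {
       // Catch-and-continue lookup, mirroring the patched loop in
       // HoodieRealtimeRecordReaderUtils: a missing field becomes null
       // instead of aborting the whole record conversion.
       static Object safeGet(StrictRecord record, String field) {
           Object fieldValue = null;
           try {
               fieldValue = record.get(field);
           } catch (RuntimeException e) {
               // Field not present in this (projected) record; keep null.
           }
           return fieldValue;
       }
   
       public static void main(String[] args) {
           StrictRecord record = new StrictRecord();
           record.put("name", "hudi");
           System.out.println(safeGet(record, "name"));                // hudi
           System.out.println(safeGet(record, "_hoodie_commit_time")); // null
       }
   }
   ```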
   
   A more detailed explanation from wenning's original change:
   ```
   This is because when converting Avro records to ArrayWritable, Hudi calls avroToArrayWritable(Object value, Schema schema), which tries to extract all the fields in the schema from the Avro record. However, in some queries not all the fields are required in the record, e.g. when usesCustomPayload is enabled (which it is for bootstrap tables by default), only projection columns are loaded: https://github.com/apache/hudi/blob/release-0.10.1/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java#L145-L148. This means some fields are not present in the record.
   
   Before Avro 1.10 this was fine, because if record.get(field.name()) failed it would just return null. However, as of Avro 1.10 it throws a runtime exception, AvroRuntimeException, and breaks the job. So in this CR we catch that runtime exception.
   ```
   
   To make sure this change didn't cause issues, the following test was also added:
   ```
     @Test
     public void testAvroToArrayWritable() throws IOException {
       Schema schema = SchemaTestUtil.getEvolvedSchema();
       GenericRecord record = SchemaTestUtil.generateAvroRecordFromJson(schema, 1, "100", "100", false);
       ArrayWritable aWritable = (ArrayWritable) HoodieRealtimeRecordReaderUtils.avroToArrayWritable(record, schema);
       assertEquals(schema.getFields().size(), aWritable.get().length);
   
       // In some queries, the generic records Hudi gets are only part of the full records.
       // Here we test the case where some fields are missing from the record.
       Schema schemaWithMetaFields = HoodieAvroUtils.addMetadataFields(schema);
       ArrayWritable aWritable2 = (ArrayWritable) HoodieRealtimeRecordReaderUtils.avroToArrayWritable(record, schemaWithMetaFields);
       assertEquals(schemaWithMetaFields.getFields().size(), aWritable2.get().length);
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
