VitoMakarevich commented on code in PR #11450:
URL: https://github.com/apache/hudi/pull/11450#discussion_r1639440948


##########
hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java:
##########
@@ -51,6 +51,13 @@ public ReadContext init(Configuration configuration, Map<String, String> keyValu
       configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE,
           "false", "support reading avro from non-legacy map/list in parquet file");
     }
+    // If old file is written with legacy mode, read with legacy mode, even if now it's non-legacy mode in conf.
+    // Since by changing the property users want to control how files are written, not how they are read.
+    // Later the value of WRITE_OLD_LIST_STRUCTURE will be picked from conf, thus either keeping old mode or writing new mode.
+    if (legacyMode) {
+      configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE,
Review Comment:
   This is exactly what leads to reading `null` from list fields even though they contain values.
   It results in an exception or silent data loss.
   Basically, the change is meant for the read path, while this setting controls the write path, and the write path will still pick it up.
   e.g. if I set WRITE_OLD_LIST_STRUCTURE to any value, I expect Hudi/Spark to be able to read both 2-level and 3-level lists while producing only 3-level lists, but in the current state Hudi reads nulls from 2-level parquet files when the setting is set to false.
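   To make the side effect concrete, here is a minimal sketch (not Hudi's actual code): the Hadoop `Configuration` is mocked as a plain `Map`, and the key name and `initReadSupport` helper are illustrative. It shows how flipping the shared write key on the read path leaks into the write path.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   public class ListStructureConfigSketch {
     // Stand-in for AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE (name is illustrative).
     static final String WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure";
   
     // Read path as in the PR: on detecting a legacy (2-level) file, it flips the
     // *write* key so the reader resolves the legacy schema.
     static void initReadSupport(Map<String, String> conf, boolean fileUsesLegacyLists) {
       if (fileUsesLegacyLists) {
         // Side effect: this mutation is also visible to the write path.
         conf.put(WRITE_OLD_LIST_STRUCTURE, "true");
       }
     }
   
     public static void main(String[] args) {
       Map<String, String> conf = new HashMap<>();
       conf.put(WRITE_OLD_LIST_STRUCTURE, "false"); // user asked for 3-level output
   
       initReadSupport(conf, true); // reading a legacy 2-level file
   
       // The write path now sees "true" and emits 2-level lists, contrary to the
       // user's setting; conversely, leaving it "false" is what makes the reader
       // return nulls for 2-level list columns.
       System.out.println(conf.get(WRITE_OLD_LIST_STRUCTURE)); // prints "true"
     }
   }
   ```
   
   A separate read-side property (rather than reusing the write key) would avoid the conflict between the two paths.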
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
