alexeykudinkin commented on a change in pull request #4667:
URL: https://github.com/apache/hudi/pull/4667#discussion_r807182343
##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java
##########
@@ -65,12 +71,32 @@
* <li>Incremental mode: reading table's state as of particular timestamp (or instant, in Hudi's terms)</li>
* <li>External mode: reading non-Hudi partitions</li>
* </ul>
+ *
+ * NOTE: This class is invariant of the underlying file-format of the files being read
*/
-public abstract class HoodieFileInputFormatBase extends FileInputFormat<NullWritable, ArrayWritable>
+public class HoodieCopyOnWriteTableInputFormat extends FileInputFormat<NullWritable, ArrayWritable>
Review comment:
Yeah, I've crossed the same path recently, realizing that this dichotomy doesn't line up well with the read path. I think the crux of the problem is that COW is purely a write-side semantic, so saying "COW" on the read path doesn't really make sense.
I'm touching up the sibling hierarchy on the Spark side and will think about better terminology there; afterwards we can carry it over here as well.
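To make the hierarchy being discussed concrete, here is a minimal sketch of the read-path shape the rename implies. All class names, methods, and file names below are illustrative assumptions, not Hudi's actual API: the point is that the base class lists files in a file-format-agnostic way, and the merge-on-read variant layers log-file merging on top, so "copy-on-write" need not describe the base read-path class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a file-format-agnostic read-path base class.
abstract class TableInputFormat {
    // Base (e.g. parquet) files visible as of the latest commit.
    abstract List<String> listBaseFiles();
}

// "COW" input format: reads base files only (illustrative file name).
class CowTableInputFormat extends TableInputFormat {
    @Override
    List<String> listBaseFiles() {
        return List.of("part-0001.parquet");
    }
}

// "MOR" input format: reuses the same base-file listing and adds
// log files to merge at read time (illustrative log-file name).
class MorTableInputFormat extends CowTableInputFormat {
    List<String> listFilesToMerge() {
        List<String> files = new ArrayList<>(listBaseFiles());
        files.add(".part-0001.log.1");
        return files;
    }
}

public class ReadPathSketch {
    public static void main(String[] args) {
        System.out.println(new CowTableInputFormat().listBaseFiles());
        System.out.println(new MorTableInputFormat().listFilesToMerge());
    }
}
```

Under this shape, the awkwardness the comment describes is visible: the subclass relation captures "merge logs on top of base files", not a write-side copy-on-write semantic, which is why a read-path-oriented name may fit better.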
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]