[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4667: [HUDI-3276][Stacked on 4559] Rebased Parquet-based `FileInputFormat` impls to inherit from `MapredParquetInputFormat`

GitBox Mon, 07 Feb 2022 09:51:57 -0800


alexeykudinkin commented on a change in pull request #4667:
URL: https://github.com/apache/hudi/pull/4667#discussion_r800912024




##########
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java
##########
@@ -65,12 +71,32 @@
  *   <li>Incremental mode: reading table's state as of particular timestamp (or instant, in Hudi's terms)</li>
  *   <li>External mode: reading non-Hudi partitions</li>
  * </ul>
+ *
+ * NOTE: This class is invariant of the underlying file-format of the files being read
  */
-public abstract class HoodieFileInputFormatBase extends FileInputFormat<NullWritable, ArrayWritable>
+public class HoodieCopyOnWriteTableInputFormat extends FileInputFormat<NullWritable, ArrayWritable>

Review comment:
       Can you elaborate on what you find confusing there? We already have a similar split in Spark, for example (`MergeOnReadSnapshotRelation`, etc.).
   
   I actually think it's much cleaner to align with the MOR/COW (Merge-on-Read/Copy-on-Write) dichotomy rather than the previous Realtime/non-Realtime one.
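The MOR/COW split argued for above can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Hudi's actual code: `TableInputFormat`, `CopyOnWriteInputFormat`, `MergeOnReadInputFormat`, `forTableType` and the file names are all hypothetical, and the Hadoop `FileInputFormat` machinery is replaced by a single method reporting which files a snapshot read of one file group would touch. It also assumes, as a design choice, that the MOR format reuses the COW path and only adds the log files that must be merged on read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT Hudi's API. All names are illustrative only.
public class InputFormatSketch {

    // The two table types the input formats are organized around.
    enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

    // Hypothetical stand-in for a Hadoop FileInputFormat: reports which files
    // a snapshot read of a single file group needs to open.
    interface TableInputFormat {
        List<String> filesToRead(String baseFile, List<String> logFiles);
    }

    // COW: base files already hold the latest data, so only they are read.
    static class CopyOnWriteInputFormat implements TableInputFormat {
        @Override
        public List<String> filesToRead(String baseFile, List<String> logFiles) {
            return List.of(baseFile);
        }
    }

    // MOR (assumed design): reuse the COW behavior, then add the delta log
    // files that have to be merged with the base file at read time.
    static class MergeOnReadInputFormat extends CopyOnWriteInputFormat {
        @Override
        public List<String> filesToRead(String baseFile, List<String> logFiles) {
            List<String> files = new ArrayList<>(super.filesToRead(baseFile, logFiles));
            files.addAll(logFiles);
            return files;
        }
    }

    // Selects the input format by table type rather than by Realtime/non-Realtime.
    static TableInputFormat forTableType(TableType type) {
        switch (type) {
            case COPY_ON_WRITE:
                return new CopyOnWriteInputFormat();
            case MERGE_ON_READ:
                return new MergeOnReadInputFormat();
            default:
                throw new IllegalArgumentException("Unknown table type: " + type);
        }
    }

    public static void main(String[] args) {
        List<String> logs = List.of("delta.log.1");
        // prints [base.parquet]
        System.out.println(forTableType(TableType.COPY_ON_WRITE).filesToRead("base.parquet", logs));
        // prints [base.parquet, delta.log.1]
        System.out.println(forTableType(TableType.MERGE_ON_READ).filesToRead("base.parquet", logs));
    }
}
```

Keying the factory on the table type means each format's behavior follows directly from how the table stores data, mirroring the COW/MOR split already used on the Spark side.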



