garyli1019 commented on a change in pull request #1938:
URL: https://github.com/apache/hudi/pull/1938#discussion_r554287262
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
##########
@@ -470,4 +471,45 @@ private static HoodieBaseFile refreshFileStatus(Configuration conf, HoodieBaseFi
}
}
+ /**
+ * List affected file status based on given commits.
+ * @param basePath
+ * @param commitsToCheck
+ * @param timeline
+ * @return HashMap<partitionPath, HashMap<fileName, FileStatus>>
+ * @throws IOException
+ */
+ public static HashMap<String, HashMap<String, FileStatus>> listStatusForAffectedPartitions(
+ Path basePath, List<HoodieInstant> commitsToCheck, HoodieTimeline timeline) throws IOException {
+ // Extract files touched by these commits.
+ // TODO This might need to be done in parallel like listStatus parallelism ?
Review comment:
Are you referring to RFC-15, which has not landed yet? The current
implementation of `HoodieParquetInputFormat` lists all files in the affected
partitions and then filters them afterwards.
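
To make the list-then-filter flow concrete, here is a rough, self-contained sketch. It is not the actual Hudi code: the class `ListThenFilterSketch` and its helpers are hypothetical, the file listing is a plain in-memory map instead of a real `FileSystem` call, and it assumes base file names follow the `<fileId>_<writeToken>_<commitTime>.parquet` pattern.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of HoodieInputFormatUtils.
public class ListThenFilterSketch {

  // Assumption: file names look like <fileId>_<writeToken>_<commitTime>.parquet,
  // so the commit time is the token after the last underscore.
  public static String commitTimeOf(String fileName) {
    String stem = fileName.substring(0, fileName.lastIndexOf('.'));
    return stem.substring(stem.lastIndexOf('_') + 1);
  }

  // Step 2: from the full per-partition listing, keep only the files
  // written by one of the commits being checked.
  public static Map<String, List<String>> filterByCommits(
      Map<String, List<String>> listedFilesByPartition, Set<String> commitsToCheck) {
    Map<String, List<String>> result = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : listedFilesByPartition.entrySet()) {
      List<String> kept = new ArrayList<>();
      for (String fileName : entry.getValue()) {
        if (commitsToCheck.contains(commitTimeOf(fileName))) {
          kept.add(fileName);
        }
      }
      if (!kept.isEmpty()) {
        result.put(entry.getKey(), kept);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Step 1 stand-in: pretend this map is the full listing of every
    // affected partition (the real code would list the file system here).
    Map<String, List<String>> listed = new HashMap<>();
    listed.put("2020/08/01", List.of("f1_1-0-1_001.parquet", "f1_1-0-1_003.parquet"));
    listed.put("2020/08/02", List.of("f2_1-0-1_002.parquet"));

    System.out.println(filterByCommits(listed, Set.of("003")));
  }
}
```

The point of the sketch is that the listing cost is paid for every file in each affected partition, regardless of how few files the commits actually touched; the filter only trims the result afterwards.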