wangxianghu commented on a change in pull request #2226:
URL: https://github.com/apache/hudi/pull/2226#discussion_r518749721



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DFSPathSelector.java
##########
@@ -98,35 +99,37 @@ public static DFSPathSelector createSourceSelector(TypedProperties props,
       long sourceLimit) {
 
     try {
-      // obtain all eligible files under root folder.
-      log.info("Root path => " + props.getString(Config.ROOT_INPUT_PATH_PROP) + " source limit => " + sourceLimit);
-      long lastCheckpointTime = lastCheckpointStr.map(Long::parseLong).orElse(Long.MIN_VALUE);
-      List<FileStatus> eligibleFiles = listEligibleFiles(fs, new Path(props.getString(Config.ROOT_INPUT_PATH_PROP)), lastCheckpointTime);
-      // sort them by modification time.
-      eligibleFiles.sort(Comparator.comparingLong(FileStatus::getModificationTime));
-      // Filter based on checkpoint & input size, if needed
-      long currentBytes = 0;
+      String pathStr = props.getString(Config.ROOT_INPUT_PATH_PROP);
       long maxModificationTime = Long.MIN_VALUE;
-      List<FileStatus> filteredFiles = new ArrayList<>();
-      for (FileStatus f : eligibleFiles) {
-        if (currentBytes + f.getLen() >= sourceLimit) {

Review comment:
IIUC, we can set 'sourceLimit' to Long.MAX_VALUE to read all of the data in one go, right?
If so, we can avoid many of these changes.
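
A minimal sketch of that reasoning, not the actual Hudi code: it keeps the size check from the removed line, but replaces FileStatus with plain file sizes and assumes the loop simply breaks once the check is true (the rest of the loop body is not shown in this hunk). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SourceLimitSketch {

  // Hypothetical stand-in for the removed loop: each Long is a file length in bytes.
  static List<Long> selectWithinLimit(List<Long> fileSizes, long sourceLimit) {
    long currentBytes = 0;
    List<Long> selected = new ArrayList<>();
    for (long len : fileSizes) {
      // Same condition as the removed line in the diff.
      if (currentBytes + len >= sourceLimit) {
        break; // assumed behavior: stop once the limit would be reached
      }
      currentBytes += len;
      selected.add(len);
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Long> sizes = Arrays.asList(100L, 200L, 300L);
    // A small limit cuts the batch short.
    System.out.println(selectWithinLimit(sizes, 250L));
    // With Long.MAX_VALUE the check never triggers for realistic sizes, so every file is kept.
    System.out.println(selectWithinLimit(sizes, Long.MAX_VALUE));
  }
}
```

Under these assumptions, passing Long.MAX_VALUE makes the size check a no-op, which is why the existing loop might already cover the "read everything at once" case.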





