Raymond Xu created HUDI-1723:
--------------------------------

             Summary: DFSPathSelector skips files with the same modify date when read up to source limit
                 Key: HUDI-1723
                 URL: https://issues.apache.org/jira/browse/HUDI-1723
             Project: Apache Hudi
          Issue Type: Bug
          Components: DeltaStreamer
            Reporter: Raymond Xu
             Fix For: 0.9.0
         Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png

org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles 
filters the input files against the last saved checkpoint, which is the 
modification date of the last file read. However, several files can share 
that modification date. When a read stops at the source limit partway 
through such a group, the checkpoint is set to their shared date, and the 
remaining files in the group are skipped on the next read. An illustration 
is shown in the attached picture.
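A minimal, self-contained Java simulation of the behavior described above 
may help reproduce it. It assumes that eligible files are those strictly 
newer than the checkpoint, that files are read oldest first, and that the 
saved checkpoint is the latest modification date in the batch. FileInfo, 
selectBatch, readAll and the sameTimeFix flag are hypothetical names, not 
Hudi APIs. The flag shows one possible mitigation, which is not necessarily 
the fix adopted for this issue: never end a batch between files that share 
the last selected modification date.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of checkpoint-based file selection; not Hudi code.
public class PathSelectorSketch {

  // Stand-in for a file listing entry (assumed; plays the role of FileStatus).
  record FileInfo(String name, long modTime, long len) {}

  // Files picked by one read plus the checkpoint saved for the next read.
  record Batch(List<String> files, long checkpoint) {}

  // Four files; b and c share the same modification date.
  static final List<FileInfo> FILES = List.of(
      new FileInfo("a", 100, 10),
      new FileInfo("b", 200, 10),
      new FileInfo("c", 200, 10),
      new FileInfo("d", 300, 10));

  static Batch selectBatch(List<FileInfo> all, long lastCheckpoint,
                           long sourceLimit, boolean sameTimeFix) {
    // Keep only files strictly newer than the checkpoint (assumed filter).
    List<FileInfo> eligible = new ArrayList<>();
    for (FileInfo f : all) {
      if (f.modTime() > lastCheckpoint) {
        eligible.add(f);
      }
    }
    // Oldest first; List.sort is stable, so ties keep their listing order.
    eligible.sort(Comparator.comparingLong(FileInfo::modTime));

    long bytes = 0;
    long maxModTime = lastCheckpoint;
    List<String> selected = new ArrayList<>();
    for (FileInfo f : eligible) {
      boolean overLimit = bytes + f.len() >= sourceLimit;
      // Without the fix: stop as soon as the limit is reached.
      // With the fix: keep reading files that share the last selected date.
      if (overLimit && (!sameTimeFix || f.modTime() > maxModTime)) {
        break;
      }
      maxModTime = f.modTime();
      bytes += f.len();
      selected.add(f.name());
    }
    return new Batch(selected, maxModTime);
  }

  // Repeats reads, saving the checkpoint each time, until a read is empty.
  static List<String> readAll(List<FileInfo> all, long sourceLimit, boolean sameTimeFix) {
    List<String> read = new ArrayList<>();
    long checkpoint = Long.MIN_VALUE;
    while (true) {
      Batch batch = selectBatch(all, checkpoint, sourceLimit, sameTimeFix);
      if (batch.files().isEmpty()) {
        break;
      }
      read.addAll(batch.files());
      checkpoint = batch.checkpoint();
    }
    return read;
  }

  public static void main(String[] args) {
    // With a limit of 25 bytes, the first read stops between b and c.
    System.out.println("without fix: " + readAll(FILES, 25, false)); // [a, b, d]
    System.out.println("with fix:    " + readAll(FILES, 25, true));  // [a, b, c, d]
  }
}
```

In the first run, the checkpoint after the first read is 200, so c (also 
dated 200) never passes the strictly-newer filter and is skipped. Letting a 
batch run slightly past the source limit keeps each timestamp group whole, 
at the cost of occasionally reading more than the limit.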



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
