Raymond Xu created HUDI-1723:
--------------------------------
Summary: DFSPathSelector skips files with the same modify date
when read up to source limit
Key: HUDI-1723
URL: https://issues.apache.org/jira/browse/HUDI-1723
Project: Apache Hudi
Issue Type: Bug
Components: DeltaStreamer
Reporter: Raymond Xu
Fix For: 0.9.0
Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles
filters the input files based on the last saved checkpoint, which is the
modification time of the last file read. However, several files can share
that modification time. When reading stops at the source limit partway
through such a group, the remaining files with the same modification time
are skipped. An illustration is shown in the attached picture.
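
The behaviour described above can be reproduced with a small standalone
model. This is only a sketch, not the DFSPathSelector code: the class
CheckpointSkipSketch, the types SourceFile and Batch, the method
selectBatch, and the sample files are all hypothetical, and the sketch
assumes the filter keeps only files whose modification time is strictly
greater than the checkpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of checkpoint-by-modification-time selection.
// It is NOT the Hudi implementation; names and logic are assumptions.
class CheckpointSkipSketch {

    // Stand-in for a listed file: path, modification time, size in bytes.
    static final class SourceFile {
        final String path;
        final long modTime;
        final long sizeBytes;

        SourceFile(String path, long modTime, long sizeBytes) {
            this.path = path;
            this.modTime = modTime;
            this.sizeBytes = sizeBytes;
        }
    }

    // Result of one read: the files picked and the checkpoint to save.
    static final class Batch {
        final List<String> paths;
        final long checkpoint;

        Batch(List<String> paths, long checkpoint) {
            this.paths = paths;
            this.checkpoint = checkpoint;
        }
    }

    // Keep files newer than the checkpoint, sort by modification time,
    // and take files until the next one would exceed sourceLimit bytes.
    static Batch selectBatch(List<SourceFile> files, long lastCheckpoint, long sourceLimit) {
        List<SourceFile> eligible = new ArrayList<>();
        for (SourceFile f : files) {
            if (f.modTime > lastCheckpoint) { // assumed strict comparison
                eligible.add(f);
            }
        }
        eligible.sort(Comparator.comparingLong((SourceFile f) -> f.modTime));

        List<String> picked = new ArrayList<>();
        long bytes = 0;
        long checkpoint = lastCheckpoint;
        for (SourceFile f : eligible) {
            if (bytes + f.sizeBytes > sourceLimit) {
                break; // source limit reached, possibly mid-group
            }
            bytes += f.sizeBytes;
            checkpoint = f.modTime; // checkpoint = mod time of last file read
            picked.add(f.path);
        }
        return new Batch(picked, checkpoint);
    }

    // Three files share modification time 100; "d" is newer.
    static List<SourceFile> sampleFiles() {
        return Arrays.asList(
            new SourceFile("a", 100L, 10L),
            new SourceFile("b", 100L, 10L),
            new SourceFile("c", 100L, 10L),
            new SourceFile("d", 200L, 10L));
    }

    public static void main(String[] args) {
        List<SourceFile> files = sampleFiles();
        Batch first = selectBatch(files, Long.MIN_VALUE, 20L);
        Batch second = selectBatch(files, first.checkpoint, 20L);
        System.out.println("first read:  " + first.paths + " checkpoint=" + first.checkpoint);
        System.out.println("second read: " + second.paths + " checkpoint=" + second.checkpoint);
    }
}
```

With a source limit of 20 bytes, the first read in this model takes "a"
and "b" and saves checkpoint 100. The second read only considers files
newer than 100, so it takes "d", and "c" is never read.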
--
This message was sent by Atlassian Jira
(v8.3.4#803005)