[
https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342178#comment-17342178
]
Vinoth Chandar commented on HUDI-1723:
--------------------------------------
Yes. [~xushiyan], can we file an umbrella issue, plus one issue for S3 and one
for GCS?
[https://cloud.google.com/storage/docs/object-change-notification]
> DFSPathSelector skips files with the same modify date when read up to source
> limit
> ----------------------------------------------------------------------------------
>
> Key: HUDI-1723
> URL: https://issues.apache.org/jira/browse/HUDI-1723
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Raymond Xu
> Assignee: Raymond Xu
> Priority: Blocker
> Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
> Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
>
>
> org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles
> filters the input files based on the last saved checkpoint, which is the
> modification date of the last file read. However, several files can share
> that modification date, so some of them are skipped when a batch stops at
> the source limit. An illustration is shown in the attached picture.
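
The skipping behaviour described above can be reproduced with a small, simplified model. This is only a sketch and not the actual DFSPathSelector code: the class CheckpointSkipSketch, the FileEntry type, the byte-based source limit, and the strict "newer than checkpoint" filter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Simplified model of checkpoint-based file selection (hypothetical, not Hudi code).
class CheckpointSkipSketch {

    // Hypothetical stand-in for a file status: path, modification time, size in bytes.
    static final class FileEntry {
        final String path;
        final long modTime;
        final long size;

        FileEntry(String path, long modTime, long size) {
            this.path = path;
            this.modTime = modTime;
            this.size = size;
        }
    }

    // Assumed filter: keep files strictly newer than the checkpoint, sort by
    // modification time, and stop once the next file would exceed sourceLimit bytes.
    static List<FileEntry> selectBatch(List<FileEntry> all, long checkpoint, long sourceLimit) {
        List<FileEntry> eligible = all.stream()
            .filter(f -> f.modTime > checkpoint)
            .sorted((a, b) -> Long.compare(a.modTime, b.modTime))
            .collect(Collectors.toList());
        List<FileEntry> batch = new ArrayList<>();
        long bytes = 0;
        for (FileEntry f : eligible) {
            if (bytes + f.size > sourceLimit) {
                break;
            }
            batch.add(f);
            bytes += f.size;
        }
        return batch;
    }

    // Reads batch after batch; the checkpoint is the modification time of the last file read.
    static List<String> readAll(List<FileEntry> all, long sourceLimit) {
        List<String> read = new ArrayList<>();
        long checkpoint = Long.MIN_VALUE;
        while (true) {
            List<FileEntry> batch = selectBatch(all, checkpoint, sourceLimit);
            if (batch.isEmpty()) {
                break;
            }
            for (FileEntry f : batch) {
                read.add(f.path);
            }
            checkpoint = batch.get(batch.size() - 1).modTime;
        }
        return read;
    }

    // Files "b" and "c" share modification time 200.
    static List<FileEntry> sampleFiles() {
        return Arrays.asList(
            new FileEntry("a", 100L, 10L),
            new FileEntry("b", 200L, 10L),
            new FileEntry("c", 200L, 10L),
            new FileEntry("d", 300L, 10L));
    }

    public static void main(String[] args) {
        // With a source limit of 20 bytes, the first batch ends at "b".
        System.out.println(readAll(sampleFiles(), 20L)); // prints [a, b, d]
    }
}
```

In this model the first batch stops at "b" because of the source limit, the checkpoint becomes 200, and the next batch only accepts files newer than 200, so "c" is never read.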
--
This message was sent by Atlassian Jira
(v8.3.4#803005)