[
https://issues.apache.org/jira/browse/FLINK-29757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624152#comment-17624152
]
Hanley Yang commented on FLINK-29757:
-------------------------------------
[~martijnvisser] [~luoyuxia] I think we should track processed splits by a
combined string of path, offset and length instead of just the path. I created
a pull request for this; could you have a look at it? Thanks.
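
To illustrate the idea, here is a minimal, self-contained sketch. It is not the
code from the pull request and does not use the Flink classes: Split,
pathKey, rangeKey and selectUnprocessed are hypothetical names made up for
this example, and the ":" separator in the key is only an assumption. It
compares deduplicating by path alone with deduplicating by path, offset and
length.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class SplitDedupSketch {

    /** Hypothetical stand-in for a file split: a byte range of one file. */
    static final class Split {
        final String path;
        final long offset;
        final long length;

        Split(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Key by path only, as described in the issue. */
    static String pathKey(Split s) {
        return s.path;
    }

    /** Key by path, offset and length (separator chosen for illustration). */
    static String rangeKey(Split s) {
        return s.path + ":" + s.offset + ":" + s.length;
    }

    /** Returns the splits whose key was not seen before and records each key. */
    static List<Split> selectUnprocessed(
            List<Split> discovered, Set<String> processed, Function<Split, String> keyOf) {
        List<Split> result = new ArrayList<>();
        for (Split s : discovered) {
            // Set.add returns false if the key is already present.
            if (processed.add(keyOf.apply(s))) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One splittable file discovered as three splits.
        List<Split> splits = List.of(
                new Split("/data/a.csv", 0, 128),
                new Split("/data/a.csv", 128, 128),
                new Split("/data/a.csv", 256, 64));

        // Path-only key: the second and third splits look already processed.
        System.out.println(
                selectUnprocessed(splits, new HashSet<>(), SplitDedupSketch::pathKey).size()); // prints 1

        // Combined key: all three splits are kept.
        System.out.println(
                selectUnprocessed(splits, new HashSet<>(), SplitDedupSketch::rangeKey).size()); // prints 3
    }
}
```

With the combined key, a second discovery pass over the same file still yields
no new splits, so already processed splits are not handed out again.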
> ContinuousFileSplitEnumerator skip unprocessed splits when the file is
> splittable
> ---------------------------------------------------------------------------------
>
> Key: FLINK-29757
> URL: https://issues.apache.org/jira/browse/FLINK-29757
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem
> Reporter: Hanley Yang
> Priority: Critical
> Labels: pull-request-available
>
> ContinuousFileSplitEnumerator uses a HashSet<Path> to store processed splits.
> This works fine when a file is processed as a single split, but once the file
> is splittable, the remaining unprocessed splits of that file are skipped.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)