[jira] [Commented] (FLINK-29757) ContinuousFileSplitEnumerator skip unprocessed splits when the file is splittable

Hanley Yang (Jira) Tue, 25 Oct 2022 20:00:31 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-29757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624152#comment-17624152
 ]


Hanley Yang commented on FLINK-29757:
-------------------------------------

[~martijnvisser] [~luoyuxia] I think we should track processed splits by a 
combined string of path, offset, and length instead of the path alone. I have 
created a pull request for this; could you take a look? Thanks.
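
To illustrate the idea, here is a minimal, self-contained sketch. It is not 
the code from the pull request and does not use Flink's classes: the Split 
class, the key() format, and the filterByPath/filterByKey helpers are 
hypothetical names chosen for this example. It only shows why a set keyed by 
path drops later splits of the same file, while a key built from path, offset, 
and length keeps them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitTrackingSketch {

    /** Hypothetical stand-in for a file split: one byte range of one file. */
    static final class Split {
        final String path;
        final long offset;
        final long length;

        Split(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }

        /** Combined key of path, offset and length (format is an assumption). */
        String key() {
            return path + ":" + offset + ":" + length;
        }
    }

    /** Tracks by path only: any later split of an already seen file is dropped. */
    static List<Split> filterByPath(List<Split> discovered) {
        Set<String> seen = new HashSet<>();
        List<Split> accepted = new ArrayList<>();
        for (Split s : discovered) {
            if (seen.add(s.path)) {
                accepted.add(s);
            }
        }
        return accepted;
    }

    /** Tracks by path + offset + length: only exact duplicates are dropped. */
    static List<Split> filterByKey(List<Split> discovered) {
        Set<String> seen = new HashSet<>();
        List<Split> accepted = new ArrayList<>();
        for (Split s : discovered) {
            if (seen.add(s.key())) {
                accepted.add(s);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // One splittable file yielding two splits, plus one single-split file.
        List<Split> discovered = List.of(
                new Split("/data/a.csv", 0, 128),
                new Split("/data/a.csv", 128, 128),
                new Split("/data/b.csv", 0, 64));

        System.out.println(filterByPath(discovered).size()); // prints 2
        System.out.println(filterByKey(discovered).size());  // prints 3
    }
}
```

With path-only tracking, the second split of /data/a.csv is treated as already 
processed and never handed out; with the combined key, all three splits are 
accepted.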

> ContinuousFileSplitEnumerator skip unprocessed splits when the file is 
> splittable
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-29757
>                 URL: https://issues.apache.org/jira/browse/FLINK-29757
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>            Reporter: Hanley Yang
>            Priority: Critical
>              Labels: pull-request-available
>
> ContinuousFileSplitEnumerator uses a HashSet<Path> to store processed splits. 
> This works fine when each file is processed as a single split, but once a 
> file is splittable, all of its splits share the same path, so the 
> unprocessed ones are skipped.



