[
https://issues.apache.org/jira/browse/FLINK-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395111#comment-16395111
]
ASF GitHub Bot commented on FLINK-8599:
---------------------------------------
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/5521
The approach here does not work (there is also no test, otherwise that
would have been caught).
Flink does not magically load configs like Hadoop, Flink passes configs
explicitly.
Aside from that, we first should actually decide whether this should be
> Improve the failure behavior of the FileInputFormat for bad files
> -----------------------------------------------------------------
>
> Key: FLINK-8599
> URL: https://issues.apache.org/jira/browse/FLINK-8599
> Project: Flink
> Issue Type: New Feature
> Components: DataStream API
> Affects Versions: 1.4.0, 1.3.2
> Reporter: Chengzhi Zhao
> Priority: Major
>
> So we have a s3 path that flink is monitoring that path to see new files
> available.
> {code:java}
> val avroInputStream_activity = env.readFile(format, path,
> FileProcessingMode.PROCESS_CONTINUOUSLY, 10000) {code}
>
> I am doing both internal and external check pointing and let's say there is a
> bad file (for example, a different schema been dropped in this folder) came
> to the path and flink will do several retries. I want to take those bad files
> and let the process continue. However, since the file path persist in the
> checkpoint, when I try to resume from external checkpoint, it threw the
> following error on no file been found.
>
> {code:java}
> java.io.IOException: Error opening the Input Split s3a://myfile [0,904]: No
> such file or directory: s3a://myfile{code}
>
> As [[email protected]] suggested, we could check if a path exists and before
> trying to read a file and ignore the input split instead of throwing an
> exception and causing a failure.
>
> Also, I am thinking about add an error output for bad files as an option to
> users. So if there is any bad files exist we could move them in a separated
> path and do further analysis.
>
> Not sure how people feel about it, but I'd like to contribute on it if people
> think this can be an improvement.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)