[
https://issues.apache.org/jira/browse/FLINK-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402650#comment-17402650
]
Paul Lin commented on FLINK-23725:
----------------------------------
I've also run into this issue. If the target file name already exists, the
file committer silently skips the commit, which may lead to data loss.
The root cause is that #rename does not throw an exception if the target file
already exists or the source file doesn't exist; instead it returns false to
indicate that the operation failed, as mentioned in [Hadoop
ClientProtocol](https://github.com/apache/hadoop/blob/b6d19718204af02da6e2ed0b83d5936824371fc0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java#L520).
I think in both cases we should throw an exception.
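A minimal sketch of what that could look like, not the actual Flink change. To keep it self-contained it uses a hypothetical `RenamingFs` interface as a stand-in for the file system (an assumption, not a Hadoop or Flink type) and plain strings instead of `Path`. The only point is that a `false` result from rename is turned into an `IOException` instead of being dropped.

```java
import java.io.IOException;

final class RenameCommitSketch {

    /** Hypothetical stand-in for the single file system call discussed above. */
    interface RenamingFs {
        // Mirrors the behaviour described above: returns false rather than
        // throwing when the target exists or the source is missing.
        boolean rename(String src, String dest) throws IOException;
    }

    /** Commits by rename and fails loudly if the rename did not happen. */
    static void commit(RenamingFs fs, String src, String dest) throws IOException {
        final boolean renamed;
        try {
            renamed = fs.rename(src, dest);
        } catch (IOException e) {
            throw new IOException(
                    "Committing file by rename failed: " + src + " to " + dest, e);
        }
        // Treat a false return value as a failure instead of ignoring it.
        if (!renamed) {
            throw new IOException(
                    "Committing file by rename failed: " + src + " to " + dest
                            + " (rename returned false)");
        }
    }
}
```

With this shape, both cases (target already exists, source missing) surface as a failed commit. Distinguishing the two would need extra existence checks, which the sketch leaves out.
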
> HadoopFsCommitter, file rename failure
> --------------------------------------
>
> Key: FLINK-23725
> URL: https://issues.apache.org/jira/browse/FLINK-23725
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem, Connectors / Hadoop
> Compatibility, FileSystems
> Affects Versions: 1.11.1, 1.12.1
> Reporter: todd
> Priority: Major
>
> When writing a file to HDFS, if the target part file already exists, the
> rename fails but only returns false. Should an exception be thrown saying
> that the part file already exists, or should a related log message be printed?
>
> ```
> // org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.HadoopFsCommitter#commit
> public void commit() throws IOException {
>     final Path src = recoverable.tempFile();
>     final Path dest = recoverable.targetFile();
>     final long expectedLength = recoverable.offset();
>     try {
>         // rename() always returns true or false; the result is not checked
>         fs.rename(src, dest);
>     } catch (IOException e) {
>         throw new IOException(
>                 "Committing file by rename failed: " + src + " to " + dest, e);
>     }
> }
> ```
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)