Matthias created FLINK-24147:
--------------------------------
Summary: HDFS lease issues on Flink retry
Key: FLINK-24147
URL: https://issues.apache.org/jira/browse/FLINK-24147
Project: Flink
Issue Type: Bug
Components: Connectors / Hadoop Compatibility
Affects Versions: 1.13.2, 1.12.5, 1.14.0
Reporter: Matthias
This issue was brought up on the [ML thread "hdfs lease issues on flink
retry"|https://lists.apache.org/x/thread.html/r9e5dc9cbd0a41b88565bd6c8c1c9d864ffdd343b4a96bd4dd0dd8a97@%3Cuser.flink.apache.org%3E].
See attached jobmanager.log which was provided by the user.
The user ran into a {{FileAlreadyExistsException}} when their job tried to create a file
for which a lease already existed. [~dmvk] helped investigate this.
The problem seems to be that we use a hard-coded attempt id of {{0}} in
[HadoopOutputFormatBase:137|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-connectors/flink-hadoop-compatibility/src/main/java/org/apache/flink/api/java/hadoop/mapred/HadoopOutputFormatBase.java#L137].
Each resource in HDFS may have only one writer at a time; the LeaseManager
enforces this through leases. It appears that a retried task tried to access the
same file because {{HadoopOutputFormatBase}} generates the same
{{TaskAttemptID}} on every attempt, and the retry interval (10 seconds in this
case) was shorter than Hadoop's hard-coded soft lease limit of 1 minute (see
[hadoop:HdfsConstants:62|https://github.com/apache/hadoop/blob/a9c1489e31e8f602de62bd3ecc517aa6597ab2f8/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java#L62]),
so the lease from the failed attempt was still held when the retry started.
We should be able to overcome this by using a dynamic attempt number instead of
the fixed {{_0}} suffix.
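To illustrate the collision, here is a minimal, self-contained sketch of the attempt-id string that {{HadoopOutputFormatBase}} feeds into {{TaskAttemptID.forName}}. The helper class and method names ({{AttemptIdSketch}}, {{attemptId}}) are hypothetical, and the {{String.format}} padding is a simplification of the actual string construction; the point is that the trailing attempt number is always {{0}} today, so a retried task produces the exact same id:

{code:java}
// Hypothetical sketch: the attempt-id string built for TaskAttemptID.forName(...).
// In the current code the last component is the literal "_0", so every retry of
// the same task number yields an identical TaskAttemptID and collides with the
// lease still held by the failed attempt.
public class AttemptIdSketch {

    // Builds "attempt__0000_r_<6-digit task number>_<attempt number>".
    static String attemptId(int taskNumber, int attemptNumber) {
        return String.format("attempt__0000_r_%06d_%d", taskNumber + 1, attemptNumber);
    }

    public static void main(String[] args) {
        // Current behavior: attempt number fixed at 0, identical across retries.
        System.out.println(attemptId(3, 0)); // attempt__0000_r_000004_0
        System.out.println(attemptId(3, 0)); // attempt__0000_r_000004_0 (collision)
        // Proposed fix: thread a dynamic attempt number through instead.
        System.out.println(attemptId(3, 1)); // attempt__0000_r_000004_1
    }
}
{code}

With a dynamic attempt number, the retried attempt writes under a distinct id and no longer has to wait out the 1-minute soft lease limit on the previous attempt's file.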
--
This message was sent by Atlassian Jira
(v8.3.4#803005)