[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

Stephen O'Donnell (Jira) Mon, 06 Jul 2020 10:40:07 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152206#comment-17152206
 ]


Stephen O'Donnell commented on HDFS-14498:
------------------------------------------

{quote}
IMO, if the client has not renewed the lease for a while, we should remove the 
last block directly: set the file length to 0 if the file includes only one 
block, or otherwise remove the last block and reset the file length. Any thoughts?
{quote}

I was thinking of something similar. If the client is reporting zero bytes, and 
that is what is committed to the NN, then the block is going to be useless 
anyway. We should probably handle this case via an explicit recover lease call 
or the automated lease recovery that the namenode runs after 1 hour. I am not 
sure whether there is some other edge case we may be missing with this 
approach, but at the moment I cannot think of any reason to keep a zero byte 
block in the "committed but not complete" state.
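
To make the idea concrete, here is a rough sketch of the check I have in mind. 
It is a toy model only: ZeroByteBlockRecoverySketch, Block, OpenFile, 
BlockState and tryReleaseLease are hypothetical names invented for 
illustration and are not the real namenode classes or methods. It also 
simplifies recovery to "retry unless every block is complete".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposal; none of these types are the real HDFS classes. */
final class ZeroByteBlockRecoverySketch {

    /** Simplified block states (assumption: only these three matter here). */
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    /** Hypothetical stand-in for a block as tracked by the namenode. */
    static final class Block {
        final long numBytes;
        final BlockState state;

        Block(long numBytes, BlockState state) {
            this.numBytes = numBytes;
            this.state = state;
        }
    }

    /** Hypothetical stand-in for a file that is still open for write. */
    static final class OpenFile {
        final List<Block> blocks = new ArrayList<>();
        boolean closed;

        /** File length is the sum of the lengths of its blocks. */
        long length() {
            long total = 0;
            for (Block b : blocks) {
                total += b.numBytes;
            }
            return total;
        }
    }

    /**
     * Tries to release the lease on a file whose writer is gone.
     * Returns true if the file was closed, false if recovery must be retried.
     */
    static boolean tryReleaseLease(OpenFile file) {
        if (!file.blocks.isEmpty()) {
            int lastIndex = file.blocks.size() - 1;
            Block last = file.blocks.get(lastIndex);
            // Proposed change: a committed but not complete last block with
            // zero bytes carries no data, so drop it instead of waiting.
            if (last.state == BlockState.COMMITTED && last.numBytes == 0) {
                file.blocks.remove(lastIndex);
            }
        }
        // Simplification: any remaining block that is not complete means
        // "try again later", which is the path that loops in this issue.
        for (Block b : file.blocks) {
            if (b.state != BlockState.COMPLETE) {
                return false;
            }
        }
        file.closed = true;
        return true;
    }
}
```

With this model, a file whose only block is committed with zero bytes is 
closed at length 0 on the first attempt, while a committed block that 
actually holds data still takes the existing retry path.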

> LeaseManager can loop forever on the file for which create has failed 
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14498
>                 URL: https://issues.apache.org/jira/browse/HDFS-14498
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.9.0
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> The logs from file creation are long gone due to the infinite lease logging, 
> but the create presumably failed... the client that was trying to write this 
> file is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=<snip>
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file <snip>. Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
> <snip> in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file <snip>. 
> Committed blocks are waiting to be minimally replicated. Try again later.
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>       at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>       at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>       at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case there are more bugs like this...
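
On the logging suggestion in the description, one possible approach is a 
small throttle that emits a repeated message at most once per interval and 
reports how many copies were dropped in between. The sketch below is an 
illustration only: ThrottledLogSketch and its record method are hypothetical 
names, not an existing LeaseManager or Hadoop API, and the caller supplies 
the clock value.

```java
/** Hypothetical log throttle; not an existing Hadoop class. */
final class ThrottledLogSketch {
    private final long minIntervalMs;
    private long lastLoggedMs;
    private boolean loggedOnce;
    private long suppressed;

    ThrottledLogSketch(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /**
     * Returns the text that should be written to the log, or null if this
     * occurrence falls inside the quiet interval and should be suppressed.
     */
    String record(String message, long nowMs) {
        if (loggedOnce && nowMs - lastLoggedMs < minIntervalMs) {
            suppressed++;
            return null;
        }
        // Mention how many occurrences were skipped since the last emit.
        String out = suppressed == 0
            ? message
            : message + " (suppressed " + suppressed + " similar messages)";
        loggedOnce = true;
        lastLoggedMs = nowMs;
        suppressed = 0;
        return out;
    }
}
```

A caller would write the returned string to the log only when it is not 
null, so a retry loop like the one above would produce one line per interval 
instead of one line per attempt.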



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

