[ 
https://issues.apache.org/jira/browse/HDFS-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867747#action_12867747
 ] 

Konstantin Shvachko commented on HDFS-1142:
-------------------------------------------

Sorry, took me a while.
The idea with lease recovery after soft limit expiration was that it is done 
under the same lease holder. Here is why.
Expiration of the soft limit means that somebody else can claim the lease. If 
they succeed, they become the new owner; if not, the ownership does not change.
So here several clients may compete for the same lease. They will call 
{{create()}} and get {{RecoveryInProgressException}} in response, which 
indicates that they should retry. The old client, if still alive, can also 
compete for the lease. It has an advantage over other clients, because it does 
not need to go through the recovery process, but that seems fair.
If you reassign the lease to {{HDFS_NameNode}}, then its timeouts will reset, 
see {{reassignLease()}}. And this will change the behavior. The clients trying 
to claim the file will be getting {{AlreadyBeingCreatedException}}, which means 
they cannot compete for the file anymore, and should fail.
Suppose there is only one new client, and the old owner had died already. The 
client tries {{create()}}. This triggers lease recovery on NN, which starts the 
recovery under {{HDFS_NameNode}}, and throws {{RecoveryInProgressException}} 
back to the client. The client retries as expected, and the next time gets 
{{AlreadyBeingCreatedException}}. Thinking that somebody else got lucky before 
him, the client bails out, which is not right, as there is nobody else competing 
for the file.
Does that make sense? I don't see a problem here. Do you have failing tests 
because of that?
That, by the way, explains the parameter of {{internalReleaseLease()}}.
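
To make the single-client scenario above concrete, here is a toy model of it. 
It is only a sketch and not the NameNode code: {{ToyNameNode}}, {{tryCreate()}} 
and the nested exception classes are hypothetical stand-ins, and the model 
assumes that recovery finishes between the first and the second attempt. It 
compares recovery under the same holder with reassignment to 
{{HDFS_NameNode}}.

```java
// Toy model only: none of these classes are the real HDFS implementation.
class LeaseRetrySketch {

    // Stand-ins named after the exceptions mentioned in the comment.
    static class RecoveryInProgressException extends Exception {}
    static class AlreadyBeingCreatedException extends Exception {}

    static final String NN_HOLDER = "HDFS_NameNode";

    /** Tracks the lease of one file whose original writer has died. */
    static class ToyNameNode {
        private final boolean reassignOnRecovery;
        private String holder = "old-client";
        private boolean softLimitExpired = true;
        private boolean recoveryStarted = false;

        ToyNameNode(boolean reassignOnRecovery) {
            this.reassignOnRecovery = reassignOnRecovery;
        }

        void create(String client)
                throws RecoveryInProgressException, AlreadyBeingCreatedException {
            if (!softLimitExpired) {
                // The lease looks live, so the caller is told to give up.
                throw new AlreadyBeingCreatedException();
            }
            if (!recoveryStarted) {
                recoveryStarted = true;
                if (reassignOnRecovery) {
                    holder = NN_HOLDER;
                    softLimitExpired = false; // reassignment resets the timeouts
                }
                throw new RecoveryInProgressException();
            }
            // Assumption: recovery finished between the two attempts.
            holder = client;
        }

        String holder() {
            return holder;
        }
    }

    /** Client loop: retry on recovery-in-progress, bail out on already-being-created. */
    static String tryCreate(ToyNameNode nn, String client, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                nn.create(client);
                return "created on attempt " + attempt;
            } catch (RecoveryInProgressException e) {
                // Expected while recovery runs: fall through and retry.
            } catch (AlreadyBeingCreatedException e) {
                return "gave up on attempt " + attempt;
            }
        }
        return "retries exhausted";
    }

    public static void main(String[] args) {
        System.out.println(tryCreate(new ToyNameNode(false), "new-client", 5));
        System.out.println(tryCreate(new ToyNameNode(true), "new-client", 5));
    }
}
```

In this model the same-holder run ends with the new client owning the lease, 
while the reassigning run makes the only competitor give up on its retry.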

- Introduction of {{NN_LEASE_RECOVERY_HOLDER}} constant definitely makes sense.
- Persisting leases is not an issue if we do not reassign.
- For future reference, it is very undesirable to declare public methods in 
{{FSNamesystem}} just to provide access to them from tests. The tests should 
either be in the right package, or alternatively the {{FSNamesystem}} methods 
should be accessed via {{NameNodeAdapter}}. That is why it was introduced in 
the first place, see HDFS-563.
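
The adapter idea from the last point can be sketched as follows. This is a 
hypothetical illustration, not the actual {{FSNamesystem}} or 
{{NameNodeAdapter}} code: {{ToyNamesystem}}, {{ToyNamesystemAdapter}} and 
their methods are made-up names, and the sketch assumes the adapter sits in 
the same package as the class it wraps.

```java
// Hypothetical production class: its internals stay package-private.
class ToyNamesystem {
    private int leaseCount = 0;

    // Package-private on purpose: not widened to public for the tests' sake.
    void addLease() {
        leaseCount++;
    }

    int getLeaseCount() {
        return leaseCount;
    }
}

// Hypothetical test-side adapter, assumed to live in the same package.
// Tests in other packages call this instead of a widened production API.
final class ToyNamesystemAdapter {
    private ToyNamesystemAdapter() {
        // Static helpers only, no instances.
    }

    public static void addLease(ToyNamesystem ns) {
        ns.addLease();
    }

    public static int getLeaseCount(ToyNamesystem ns) {
        return ns.getLeaseCount();
    }
}

class AdapterSketch {
    public static void main(String[] args) {
        ToyNamesystem ns = new ToyNamesystem();
        ToyNamesystemAdapter.addLease(ns);
        System.out.println(ToyNamesystemAdapter.getLeaseCount(ns));
    }
}
```

The production class keeps its narrow visibility, and the only widened 
surface is the adapter, which ships with the tests.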


> Lease recovery doesn't reassign lease when triggered by append()
> ----------------------------------------------------------------
>
>                 Key: HDFS-1142
>                 URL: https://issues.apache.org/jira/browse/HDFS-1142
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-1142.txt, hdfs-1142.txt
>
>
> If a soft lease has expired and another writer calls append(), it triggers 
> lease recovery but doesn't reassign the lease to a new owner. Therefore, the 
> old writer can continue to allocate new blocks, try to steal back the lease, 
> etc. This is for the testRecoveryOnBlockBoundary case of HDFS-1139

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
