[ https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042578#comment-13042578 ]
Todd Lipcon commented on HDFS-1149:
-----------------------------------
A few nits:
- for DataNode.setHeartbeatsEnabled, I think it would be better to make it
package-private, and then bounce through the "DataNodeAdapter" class to get at
it. I also think it would be clearer if we inverted its meaning and renamed it
to {{heartbeatsDisabledForTests}} - that way when reading the code later it
will be clear that this is always false in normal operation.
- The same goes for all of the new public members in LeaseManager/Lease -- I
think you can just move the getLeaseByPath function into NameNodeAdapter, and
then it can all stay package-private, right?
- In the test case, I think it's better to call {{stm.hflush()}} after the
writer has lost its lease -- this is a DN-only operation, so it verifies that
lease recovery has gone all the way through, not just that an NN state change
happened. Checking {{isUnderConstruction}} should already cover that, but the
hflush is a useful double-check. Then you can close the stream as well and
check that it fails with the same exception.
- I think the new {{NAMENODE_LEASE_MANAGER_SLEEP_TIME}} is probably better named
{{NAMENODE_LEASE_RECHECK_INTERVAL}} (more consistent with other variables like
{{heartbeatRecheckInterval}} and {{replicationRecheckInterval}}).
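For illustration, here is a minimal single-file sketch of the adapter approach from the first two nits. Every class below is a simplified, hypothetical stand-in rather than the real HDFS code, and {{maybeSendHeartbeat}}, {{heartbeatsSent}} and {{getLeaseHolderByPath}} are invented names. In the real tree the adapters live in the test source root but in the same Java package as the production classes, which is what lets them reach package-private members; nesting everything in one file only approximates that.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the "test adapter" pattern; all types are hypothetical stand-ins. */
public class AdapterPatternSketch {

    /** Stand-in for DataNode. */
    static class DataNode {
        // Package-private (no modifier) and named so a reader sees it is test-only;
        // it stays false in normal operation.
        volatile boolean heartbeatsDisabledForTests = false;
        int heartbeatsSent = 0;

        /** One iteration of a simplified heartbeat loop. */
        void maybeSendHeartbeat() {
            if (!heartbeatsDisabledForTests) {
                heartbeatsSent++;
            }
        }
    }

    /** Stand-in for LeaseManager; its internals stay package-private. */
    static class LeaseManager {
        final Map<String, String> holderByPath = new HashMap<>();
    }

    /** Test-only helper: the only code that flips the flag. */
    static class DataNodeAdapter {
        static void setHeartbeatsDisabledForTests(DataNode dn, boolean disabled) {
            dn.heartbeatsDisabledForTests = disabled;
        }
    }

    /** Test-only helper: the lease lookup moves here, so LeaseManager needs no public API. */
    static class NameNodeAdapter {
        static String getLeaseHolderByPath(LeaseManager lm, String path) {
            return lm.holderByPath.get(path);
        }
    }

    public static void main(String[] args) {
        DataNode dn = new DataNode();
        dn.maybeSendHeartbeat(); // counted
        DataNodeAdapter.setHeartbeatsDisabledForTests(dn, true);
        dn.maybeSendHeartbeat(); // suppressed
        System.out.println("heartbeats sent: " + dn.heartbeatsSent);

        LeaseManager lm = new LeaseManager();
        lm.holderByPath.put("/file", "NN_Recovery");
        System.out.println(NameNodeAdapter.getLeaseHolderByPath(lm, "/file"));
    }
}
```

Inverting the flag means the production default ({{false}}) is also the safe one, so a reader never has to wonder whether heartbeats are on.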
Other concerns:
- Does this interact correctly with lease maintenance on rename/delete? I think
so... but it would be good to add the following tests:
Test A:
1) client creates file /dir_a/file and leaves it open
2) client renames /dir_a to /dir_b (this calls LeaseManager.changeLease)
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: NN_Recovery should own the lease on the new
location of the file
[ this tests that on edit log replay, the lease is properly tracked to the new
name of the file ]
Test B:
1) client creates file /file and leaves it open
2) client deletes file /file
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: no NPEs or other errors
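Both tests check what lease state a restarted NN rebuilds from the edit log alone. The toy model below is a sketch of that property under simplifying assumptions: leases are keyed by file path, the op names ({{OPEN}}, {{RENAME}}, {{DELETE}}, {{REASSIGN_LEASE}}) are hypothetical and are not the real edit log opcodes, and replay is assumed to ignore a reassignment whose path no longer exists instead of throwing. The rename case loosely mimics what {{LeaseManager.changeLease}} does for open files under a renamed directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of rebuilding lease state by replaying a (hypothetical) edit log. */
public class LeaseReplaySketch {

    /** Hypothetical, simplified edit-log entry; not the real on-disk format. */
    static final class Op {
        final String type; // "OPEN", "RENAME", "DELETE" or "REASSIGN_LEASE"
        final String path; // file path, or rename source
        final String arg;  // lease holder, or rename destination
        Op(String type, String path, String arg) {
            this.type = type;
            this.path = path;
            this.arg = arg;
        }
    }

    /** Path of each open file -> current lease holder. */
    final Map<String, String> holderByPath = new TreeMap<>();

    void apply(Op op) {
        switch (op.type) {
            case "OPEN":
                holderByPath.put(op.path, op.arg);
                break;
            case "RENAME": {
                // Move every leased path at or under the source to the new location.
                String src = op.path, dst = op.arg;
                Map<String, String> moved = new TreeMap<>();
                holderByPath.entrySet().removeIf(e -> {
                    String p = e.getKey();
                    if (p.equals(src) || p.startsWith(src + "/")) {
                        moved.put(dst + p.substring(src.length()), e.getValue());
                        return true;
                    }
                    return false;
                });
                holderByPath.putAll(moved);
                break;
            }
            case "DELETE":
                holderByPath.remove(op.path);
                break;
            case "REASSIGN_LEASE":
                // Assumption: tolerate a reassignment for a deleted path (Test B)
                // rather than failing with an NPE.
                if (holderByPath.containsKey(op.path)) {
                    holderByPath.put(op.path, op.arg);
                }
                break;
            default:
                throw new IllegalArgumentException("unknown op " + op.type);
        }
    }

    /** Simulates an NN restart: state comes only from the logged ops. */
    static LeaseReplaySketch replay(List<Op> log) {
        LeaseReplaySketch state = new LeaseReplaySketch();
        for (Op op : log) {
            state.apply(op);
        }
        return state;
    }

    public static void main(String[] args) {
        // Test A: open /dir_a/file, rename /dir_a -> /dir_b, recovery reassigns the lease.
        List<Op> logA = new ArrayList<>();
        logA.add(new Op("OPEN", "/dir_a/file", "client"));
        logA.add(new Op("RENAME", "/dir_a", "/dir_b"));
        logA.add(new Op("REASSIGN_LEASE", "/dir_b/file", "NN_Recovery"));
        System.out.println("A: " + replay(logA).holderByPath);

        // Test B: open /file, delete it, then recovery; replay must not throw.
        List<Op> logB = new ArrayList<>();
        logB.add(new Op("OPEN", "/file", "client"));
        logB.add(new Op("DELETE", "/file", null));
        logB.add(new Op("REASSIGN_LEASE", "/file", "NN_Recovery"));
        System.out.println("B: " + replay(logB).holderByPath);
    }
}
```

Replaying into a fresh object is the point of step 5 in each test: nothing survives from the in-memory state that existed before the "restart".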
I'm also wondering whether we have an issue with regard to safe mode. In theory
we should never write anything to the edit log while in safe mode, but I don't
see any safe mode checks in {{internalReleaseLease}}. This is similar to the bugs
seen in HDFS-988, if you want some background.
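One possible shape for such a check, sketched under assumptions: the class, its fields and the nested {{SafeModeException}} are hypothetical stand-ins (not the real namesystem code), and the edit log is modeled as a list of strings. The idea is simply to test safe mode before anything is logged, so a caller such as the lease monitor can skip the path and retry on a later pass.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: refuse to log a lease release while in safe mode. Names are hypothetical. */
public class SafeModeGuardSketch {

    /** Stand-in for a "cannot modify the namespace in safe mode" error. */
    static class SafeModeException extends Exception {
        SafeModeException(String msg) {
            super(msg);
        }
    }

    private boolean inSafeMode;
    private final List<String> editLog = new ArrayList<>();

    SafeModeGuardSketch(boolean inSafeMode) {
        this.inSafeMode = inSafeMode;
    }

    synchronized void leaveSafeMode() {
        inSafeMode = false;
    }

    synchronized List<String> getEditLog() {
        return new ArrayList<>(editLog);
    }

    /** Hypothetical analogue of internalReleaseLease: reassign the lease and log it. */
    synchronized void internalReleaseLease(String path, String newHolder)
            throws SafeModeException {
        // Check before touching the edit log, so nothing is written in safe mode.
        if (inSafeMode) {
            throw new SafeModeException("Cannot release lease on " + path
                    + ": name node is in safe mode");
        }
        editLog.add("REASSIGN_LEASE " + path + " -> " + newHolder);
    }

    public static void main(String[] args) {
        SafeModeGuardSketch ns = new SafeModeGuardSketch(true);
        try {
            ns.internalReleaseLease("/file", "NN_Recovery");
        } catch (SafeModeException e) {
            // A lease monitor would catch this and retry on a later pass.
            System.out.println("skipped: " + e.getMessage());
        }
        System.out.println("edits while in safe mode: " + ns.getEditLog().size()); // 0

        ns.leaveSafeMode();
        try {
            ns.internalReleaseLease("/file", "NN_Recovery");
        } catch (SafeModeException e) {
            throw new IllegalStateException(e);
        }
        System.out.println(ns.getEditLog()); // [REASSIGN_LEASE /file -> NN_Recovery]
    }
}
```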
> Lease reassignment is not persisted to edit log
> -----------------------------------------------
>
> Key: HDFS-1149
> URL: https://issues.apache.org/jira/browse/HDFS-1149
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0, 0.22.0, 0.23.0
> Reporter: Todd Lipcon
> Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This
> is not currently persisted to the edit log, which means that after an NN
> restart, the original leaseholder could end up allocating more blocks or
> completing a file that has already started recovery.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira