Daryn Sharp created HDFS-13465:
----------------------------------
Summary: Overlapping lease recoveries cause NPE in NN
Key: HDFS-13465
URL: https://issues.apache.org/jira/browse/HDFS-13465
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Overlapping lease recoveries for the same file cause a NullPointerException
(NPE) in the DatanodeManager while it builds the block recovery commands for a
heartbeat response, possibly losing other queued recovery commands.
* client1 calls recoverLease, file is added to DN1's recovery queue
* client2 calls recoverLease, file is added to DN2's recovery queue
* one DN heartbeats, gets the block recovery command, and completes the
synchronization before the other DN heartbeats; i.e., the file is closed.
* the other DN heartbeats; the NN takes the block from that DN's recovery
queue, assumes it is still under construction, and hits an NPE calling
getExpectedStorageLocations
{code:java}
//check lease recovery
BlockInfo[] blocks = nodeinfo.getLeaseRecoveryCommand(Integer.MAX_VALUE);
if (blocks != null) {
  BlockRecoveryCommand brCommand = new BlockRecoveryCommand(
      blocks.length);
  for (BlockInfo b : blocks) {
    BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
    // uc is null if an overlapping recovery already closed the file,
    // so the call after the assert throws the NPE
    assert uc != null;
    final DatanodeStorageInfo[] storages = uc.getExpectedStorageLocations();
{code}
The NN state is unaffected if only one block was queued. If multiple blocks were
queued, the NPE aborts the loop and every recovery taken from the queue is lost.
Recovery will not occur until the client explicitly retries or the lease monitor
recovers the lease.
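
One possible mitigation, sketched below, is to skip blocks that are no longer under construction instead of dereferencing a null feature. This is only an illustration of the idea, not the fix adopted for this issue: LeaseRecoverySketch, UcFeature, Block, buildRecoveryCommands, and the block and DN names are all hypothetical stand-ins, not the real Hadoop classes or the DatanodeManager code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins for the NameNode types; not the real Hadoop classes.
class LeaseRecoverySketch {

  /** Stand-in for BlockUnderConstructionFeature. */
  static final class UcFeature {
    final String[] expectedLocations;

    UcFeature(String... expectedLocations) {
      this.expectedLocations = expectedLocations;
    }
  }

  /** Stand-in for BlockInfo; uc becomes null once the block is complete. */
  static final class Block {
    final String name;
    UcFeature uc;

    Block(String name, UcFeature uc) {
      this.name = name;
      this.uc = uc;
    }

    void complete() {
      uc = null;
    }
  }

  /** Drains the queue, skipping blocks that another recovery already closed. */
  static List<String> buildRecoveryCommands(Queue<Block> recoveryQueue) {
    List<String> commands = new ArrayList<>();
    Block b;
    while ((b = recoveryQueue.poll()) != null) {
      UcFeature uc = b.uc;
      if (uc == null) {
        // Already completed by an overlapping recovery: skip rather than NPE,
        // so the remaining queued blocks still get their commands.
        continue;
      }
      commands.add(b.name + "@" + String.join(",", uc.expectedLocations));
    }
    return commands;
  }

  public static void main(String[] args) {
    Block closed = new Block("blk_1", new UcFeature("DN1", "DN2"));
    Block open = new Block("blk_2", new UcFeature("DN2", "DN3"));
    Queue<Block> dn2Queue = new ArrayDeque<>();
    dn2Queue.add(closed);
    dn2Queue.add(open);

    closed.complete(); // DN1's recovery finished before DN2 heartbeats

    System.out.println(buildRecoveryCommands(dn2Queue));
  }
}
```

In this toy model the already-closed block is dropped and the second queued block still yields a command, which is the behavior the unguarded loop loses when the NPE escapes.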