[
https://issues.apache.org/jira/browse/HDFS-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405043#comment-13405043
]
Uma Maheswara Rao G commented on HDFS-3584:
-------------------------------------------
Thanks Brahma and Amith for digging into it.
This seems like a bug.
We trigger the recovery from append when the lease's soft limit has expired,
which means we are assuming that the older client may have gone down: there has
been no renewal from that client and the soft limit has expired. The append call
then triggers the recovery itself and throws an exception to the user, saying
that the file is not yet closed and to try again later. Here we are also
renewing the lease from the append call itself.
{code}
if (lease.expiredSoftLimit()) {
  LOG.info("startFile: recover lease " + lease + ", src=" + src +
      " from client " + pendingFile.getClientName());
  boolean isClosed = internalReleaseLease(lease, src, null,
      lease.expiredSoftLimit());
  if(!isClosed)
    throw new RecoveryInProgressException(
        "Failed to close file " + src +
        ". Lease recovery is in progress. Try again later.");
}
{code}
and in internalReleaseLease:
{code}
case UNDER_RECOVERY:
  final BlockInfoUnderConstruction uc =
      (BlockInfoUnderConstruction)lastBlock;
  // setup the last block locations from the blockManager if not known
  if (uc.getNumExpectedLocations() == 0) {
    uc.setExpectedLocations(blockManager.getNodes(lastBlock));
  }
  // start recovery of the last block for this file
  long blockRecoveryId = nextGenerationStamp();
  lease = reassignLease(lease, src, recoveryLeaseHolder, pendingFile);
  uc.initializeBlockRecovery(blockRecoveryId);
  leaseManager.renewLease(lease);
{code}
Here block recovery will happen in the background on the primary DN, and the
append call will return.
But unfortunately the close call then came from the old client and the file got
closed. This seems to have happened under high load.
But the generation stamps have already been bumped on the DNs, so the blocks
will be rejected, as the file was closed with the older genstamps on the NN
side.
commitBlockSynchronization also fails for this reason.
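The interleaving described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class GenStampRaceSketch, its NnFile type, and the stamp value 1001 are hypothetical and are not HDFS code. It only shows why a close that lands between the start and the end of recovery leaves the NN and the DN with different generation stamps.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the race; every name here is hypothetical, not an HDFS class. */
class GenStampRaceSketch {

    /** NameNode-side view of a file's last block. */
    static final class NnFile {
        long lastBlockGenStamp;
        boolean closed;

        NnFile(long genStamp) {
            this.lastBlockGenStamp = genStamp;
        }
    }

    /** Runs the interleaving once and returns what the toy NameNode decides. */
    static List<String> simulate() {
        List<String> log = new ArrayList<>();
        NnFile file = new NnFile(1001L); // assumed starting generation stamp

        // 1. append from the new client starts recovery with a bumped stamp.
        long recoveryGenStamp = file.lastBlockGenStamp + 1;
        log.add("recovery started with GS=" + recoveryGenStamp);

        // 2. The old client's close arrives before recovery finishes, so the
        //    file is closed while the NN still records the old stamp.
        file.closed = true;
        log.add("file closed by old client with NN GS=" + file.lastBlockGenStamp);

        // 3. The DN finishes recovery; its replica now carries the new stamp.
        long replicaGenStamp = recoveryGenStamp;

        // 4. The block report compares the replica stamp with the NN's stamp.
        if (replicaGenStamp != file.lastBlockGenStamp) {
            log.add("block report: GS mismatch, replica marked corrupt");
        }

        // 5. The commit from the DN cannot update a file that is already closed.
        if (file.closed) {
            log.add("commitBlockSynchronization rejected: file already closed");
        } else {
            file.lastBlockGenStamp = replicaGenStamp;
            log.add("commitBlockSynchronization accepted");
        }
        return log;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

If step 2 were moved after step 5 in this model, the commit would be accepted and the stamps would agree, which is why the ordering matters.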
I think we need to block the older client from closing the file at this stage.
What if the append call takes the new lease ownership and removes the older
client's lease?
The close call checks the lease expiration anyway.
{code}
try {
  pendingFile = checkLease(src, holder);
} catch (LeaseExpiredException lee) {
  INodeFile file = dir.getFileINode(src);
  if (file != null && !file.isUnderConstruction()) {
    // This could be a retry RPC - i.e the client tried to close
    // the file, but missed the RPC response. Thus, it is trying
    // again to close the file. If the file still exists and
    // the client's view of the last block matches the actual
    // last block, then we'll treat it as a successful close.
    // See HDFS-3031.
    Block realLastBlock = file.getLastBlock();
    if (Block.matchingIdAndGenStamp(last, realLastBlock)) {
      NameNode.stateChangeLog.info("DIR* NameSystem.completeFile: " +
          "received request from " + holder + " to complete file " + src +
          " which is already closed. But, it appears to be an RPC " +
          "retry. Returning success.");
      return true;
    }
  }
  throw lee;
}
{code}
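The idea of letting append take over the lease, so that a late close from the old client fails the lease check, could look roughly like the sketch below. This is a minimal model, not the NameNode implementation: LeaseTransferSketch, appendTakesOver and the use of IllegalStateException in place of LeaseExpiredException are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy lease table; every name here is hypothetical, not an HDFS class. */
class LeaseTransferSketch {

    /** Maps a file path to the client currently holding its lease. */
    private final Map<String, String> leaseHolderByPath = new HashMap<>();

    /** The creating client becomes the first lease holder. */
    void create(String src, String client) {
        leaseHolderByPath.put(src, client);
    }

    /** Proposed behaviour: on recovery, the appender replaces the old holder. */
    void appendTakesOver(String src, String newClient) {
        leaseHolderByPath.put(src, newClient);
    }

    /** Stand-in for the lease check in close: only the current holder passes. */
    boolean complete(String src, String holder) {
        if (!holder.equals(leaseHolderByPath.get(src))) {
            // Stand-in for LeaseExpiredException in this toy model.
            throw new IllegalStateException(
                "Lease mismatch on " + src + ": " + holder + " is not the holder");
        }
        leaseHolderByPath.remove(src);
        return true;
    }

    public static void main(String[] args) {
        LeaseTransferSketch nn = new LeaseTransferSketch();
        nn.create("/F1", "cli1");
        nn.appendTakesOver("/F1", "cli2");
        try {
            nn.complete("/F1", "cli1");
        } catch (IllegalStateException e) {
            System.out.println("stale close rejected: " + e.getMessage());
        }
    }
}
```

In the completeFile code quoted above, such a rejected close would reach `throw lee`, because the retry branch only applies when the file is no longer under construction.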
I am not sure whether I am missing something here.
I would greatly appreciate your suggestions on this.
> Blocks are getting marked as corrupt with append operation under high load.
> ---------------------------------------------------------------------------
>
> Key: HDFS-3584
> URL: https://issues.apache.org/jira/browse/HDFS-3584
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 2.0.1-alpha
> Reporter: Brahma Reddy Battula
>
> Scenario:
> =========
> 1. There are 2 clients, cli1 and cli2. cli1 writes a file F1 and does not
> close it.
> 2. cli2 calls append on the unclosed file and triggers a lease recovery.
> 3. cli1 closes the file.
> 4. Lease recovery completes with an updated GS on the DN, and a BlockReport
> arrives. Since there is a mismatch in GS, the block is marked corrupt.
> 5. Now a CommitBlockSync arrives. This also fails, since the file is already
> closed by cli1 and its state in the NN is Finalized.