[
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218085#comment-14218085
]
Yongjun Zhang commented on HDFS-7342:
-------------------------------------
HI Ravi,
Thanks a lot for your continued effort to work on this!
I had some concern of changing {{assertAllBlocksComplete()}} to
{{assertAllBlocksCompleteOrCommitted()}}, because
{{finalizeINodeFileUnderConstruction}} checks to make sure all blocks are
complete (which means all blocks need to have minimal replication) before
closing a file. Your suggested change would change the behavior here, and I
think it may not be safe.
Good thing is, the place that we are fixing does check the minimal replication
requirement before calling {{finalizeINodeFileUnderConstruction}}. So I was
looking into addressing this issue in a different approach on top of my earlier
suggested fix at
https://issues.apache.org/jira/browse/HDFS-4882?focusedCommentId=14215703&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215703.
Here is the complete snapshot of my approach:
{code}
// If penultimate block doesn't exist then its minReplication is met
boolean penultimateBlockMinReplication = penultimateBlock == null ? true :
blockManager.checkMinReplication(penultimateBlock);
BlockUCState penultimateBlockState = penultimateBlock == null ?
BlockUCState.COMPLETE : penultimateBlock.getBlockUCState();
String blockStateStr = "(penultimateBlockState=" + penultimateBlockState +
" lastBlockState="+lastBlockState + ")";
switch(lastBlockState) {
case COMPLETE:
// Getting here means the penultimate block is COMMITTED, fallthrough
// to handle the same way as when the final block is COMMITTED
case COMMITTED:
blockStateStr = "(penultimateBlockState=" + penultimateBlockState +
" lastBlockState="+lastBlockState + ")";
// Close file if committed blocks are minimally replicated
if(penultimateBlockMinReplication &&
blockManager.checkMinReplication(lastBlock)) {
if (penultimateBlock != null &&
penultimateBlockState == BlockUCState.COMMITTED) {
getBlockManager().forceCompleteBlock(pendingFile,
(BlockInfoUnderConstruction)penultimateBlock);
}
getBlockManager().forceCompleteBlock(pendingFile,
(BlockInfoUnderConstruction) lastBlock);
finalizeINodeFileUnderConstruction(src, pendingFile,
iip.getLatestSnapshotId());
NameNode.stateChangeLog.warn("BLOCK*"
+ " internalReleaseLease: Committed blocks are minimally replicated,"
+ " lease removed, file closed " + blockStateStr + ".");
return true; // closed!
}
// Cannot close file right now, since some blocks
// are not yet minimally replicated.
// This may potentially cause infinite loop in lease recovery
// if there are no valid replicas on data-nodes.
String message = "DIR* NameSystem.internalReleaseLease: " +
"Failed to release lease for file " + src +
". Committed blocks are waiting to be minimally replicated " +
blockStateStr + ". Try again later.";
NameNode.stateChangeLog.warn(message);
throw new AlreadyBeingCreatedException(message);
case UNDER_CONSTRUCTION:
{code}
That is, add the {{getBlockManager().forceCompleteBlock}} for the neede blocks
before calling
{code}
finalizeINodeFileUnderConstruction(src, pendingFile,
iip.getLatestSnapshotId());
{code}
However, there is a problem because of the way your test is written, the
{{BlockInfoUCDesired}} in your test is not allowed to be converted to
{{(BlockInfoUnderConstruction}}. I was looking a bit more into fixing that.
Make sense to you? If you agree, maybe you can look further into this direction?
BTW, notice I had the block state added to the messsage printing, which I think
is important. Also I added comments in between the two {{case}} statements.
Thanks again!
> Lease Recovery doesn't happen some times
> ----------------------------------------
>
> Key: HDFS-7342
> URL: https://issues.apache.org/jira/browse/HDFS-7342
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch
>
>
> In some cases, LeaseManager tries to recover a lease, but is not able to.
> HDFS-4882 describes a possibility of that. We should fix this
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)