Yongjun Zhang created HDFS-13115:
------------------------------------
Summary: Handle inode of a given inodeId already deleted
Key: HDFS-13115
URL: https://issues.apache.org/jira/browse/HDFS-13115
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Yongjun Zhang
In LeaseManager,
{code}
private synchronized INode[] getINodesWithLease() {
  List<INode> inodes = new ArrayList<>(leasesById.size());
  INode currentINode;
  for (long inodeId : leasesById.keySet()) {
    currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
    // A file with an active lease could get deleted, or its
    // parent directories could get recursively deleted.
    if (currentINode != null &&
        currentINode.isFile() &&
        !fsnamesystem.isFileDeleted(currentINode.asFile())) {
      inodes.add(currentINode);
    }
  }
  return inodes.toArray(new INode[0]);
}
{code}
we can see that, given an {{inodeId}},
{{fsnamesystem.getFSDirectory().getInode(inodeId)}} can return null. The
reason is explained in the comment: the file holding the lease, or one of its
parent directories, may already have been deleted.
HDFS-12985 root-caused and fixed one such case, and that fix resolves some of
the occurrences, but we are still seeing a NullPointerException from
FSNamesystem:
{code}
public long getCompleteBlocksTotal() {
  // Calculate number of blocks under construction
  long numUCBlocks = 0;
  readLock();
  try {
    numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); // <=== here
    return getBlocksTotal() - numUCBlocks;
  } finally {
    readUnlock();
  }
}
{code}
The exception happens when the inode for the given inodeId has already been
removed; see the LeaseManager code below:
{code}
synchronized long getNumUnderConstructionBlocks() {
  assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
      + "acquired before counting under construction blocks";
  long numUCBlocks = 0;
  for (Long id : getINodeIdWithLeases()) {
    final INodeFile cons =
        fsnamesystem.getFSDirectory().getInode(id).asFile(); // <=== here
    Preconditions.checkState(cons.isUnderConstruction());
    BlockInfo[] blocks = cons.getBlocks();
    if(blocks == null)
      continue;
    for(BlockInfo b : blocks) {
      if(!b.isComplete())
        numUCBlocks++;
    }
  }
  LOG.info("Number of blocks under construction: " + numUCBlocks);
  return numUCBlocks;
}
{code}
I created this JIRA to add a check for whether the inode has been removed, as a
safeguard against the NullPointerException.
It looks like the inode was deleted from the FSDirectory map after its inodeId
had been returned by {{getINodeIdWithLeases()}}.
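The sequence described above can be reproduced in miniature with a single-threaded simulation. This is a hypothetical sketch, not Hadoop code: {{FileInode}}, {{countFilesUnguarded}} and the {{Map}} standing in for the FSDirectory inode map are illustrative names, every file in the map is assumed to hold a lease, and HDFS locking is not modeled. It only shows that an inode id collected before a deletion makes the unguarded {{getInode(id).asFile()}} pattern throw.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical single-threaded simulation of a stale lease inode id.
 * None of these types are Hadoop classes.
 */
public class StaleLeaseIdDemo {

    /** Hypothetical stand-in for INodeFile. */
    record FileInode(long id) {
        // Mirrors the INode.asFile() call in the quoted code; here it just returns itself.
        FileInode asFile() {
            return this;
        }
    }

    /** Same shape as the quoted loop: getInode(id).asFile() with no null check. */
    static int countFilesUnguarded(List<Long> leaseIds, Map<Long, FileInode> directory) {
        int files = 0;
        for (long id : leaseIds) {
            directory.get(id).asFile(); // throws NullPointerException if id was removed
            files++;
        }
        return files;
    }

    public static void main(String[] args) {
        Map<Long, FileInode> directory = new HashMap<>();
        directory.put(1L, new FileInode(1L));
        directory.put(2L, new FileInode(2L));

        // Step 1: snapshot the ids holding leases (cf. getINodeIdWithLeases()).
        List<Long> leaseIds = new ArrayList<>(directory.keySet());

        // Step 2: inode 2 is deleted before the snapshot is used.
        directory.remove(2L);

        // Step 3: the unguarded lookup dereferences null.
        try {
            countFilesUnguarded(leaseIds, directory);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for a stale lease inode id");
        }
    }
}
```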
Ideally we should find out what deleted the inode, as was done in HDFS-12985.
But it seems reasonable to me to have a safeguard here, as other callers of
{{fsnamesystem.getFSDirectory().getInode(id)}} in the code base already do.
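The proposed safeguard could look roughly like the following sketch, which applies to the counting loop the same null check that {{getINodesWithLease()}} already uses. It assumes hypothetical simplified types ({{Block}}, {{FileInode}}, and a {{Map}} standing in for the FSDirectory inode map) instead of the real Hadoop classes, and throws a plain {{IllegalStateException}} in place of Guava's {{Preconditions.checkState}} so that it runs without dependencies. It is an illustration, not the patch for this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a null-safe getNumUnderConstructionBlocks(). Block, FileInode and
 * the Map standing in for the FSDirectory inode map are hypothetical
 * simplifications, not Hadoop classes.
 */
public class LeaseSafeguardSketch {

    /** Hypothetical stand-in for BlockInfo: only tracks completeness. */
    record Block(boolean complete) {}

    /** Hypothetical stand-in for INodeFile. */
    record FileInode(long id, boolean underConstruction, Block[] blocks) {}

    static long countUnderConstructionBlocks(List<Long> leaseIds,
                                             Map<Long, FileInode> directory) {
        long numUCBlocks = 0;
        for (long id : leaseIds) {
            FileInode cons = directory.get(id);
            if (cons == null) {
                // Safeguard: the inode was removed after its id was collected;
                // skip it instead of dereferencing null. A real fix might log the id.
                continue;
            }
            // Plays the role of Preconditions.checkState(cons.isUnderConstruction()).
            if (!cons.underConstruction()) {
                throw new IllegalStateException("inode " + id + " is not under construction");
            }
            Block[] blocks = cons.blocks();
            if (blocks == null) {
                continue;
            }
            for (Block b : blocks) {
                if (!b.complete()) {
                    numUCBlocks++;
                }
            }
        }
        return numUCBlocks;
    }

    public static void main(String[] args) {
        Map<Long, FileInode> directory = new HashMap<>();
        directory.put(1L, new FileInode(1L, true,
                new Block[] {new Block(true), new Block(false)}));
        directory.put(2L, new FileInode(2L, true, new Block[] {new Block(false)}));

        // Snapshot the lease ids, then delete inode 2 before counting.
        List<Long> leaseIds = new ArrayList<>(directory.keySet());
        directory.remove(2L);

        // Inode 2 is skipped instead of causing a NullPointerException.
        System.out.println(countUnderConstructionBlocks(leaseIds, directory)); // prints 1
    }
}
```

In the real code, skipping the missing inode would keep {{getCompleteBlocksTotal()}} from hitting the NullPointerException; logging the skipped id would additionally help track down what deleted it.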