[
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yongjun Zhang reassigned HDFS-13115:
------------------------------------
Assignee: Yongjun Zhang
> Handle inode of a given inodeId already deleted
> -----------------------------------------------
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Priority: Major
>
> In LeaseManager,
> {code}
> private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
> currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
> // A file with an active lease could get deleted, or its
> // parent directories could get recursively deleted.
> if (currentINode != null &&
> currentINode.isFile() &&
> !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
> }
> }
> return inodes.toArray(new INode[0]);
> }
> {code}
> we can see that given an {{inodeId}},
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL . The
> reason is explained in the comment.
> HDFS-12985 RCAed a case and solved that case, we saw that it fixes some
> cases, but we are still seeing NullPointerException from FSnamesystem
> {code}
> public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
> numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
> return getBlocksTotal() - numUCBlocks;
> } finally {
> readUnlock();
> }
> }
> {code}
> The exception happens when the inode is removed for the given inodeid, see
> LeaseManager code below:
> {code}
> synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock
> wasn't"
> + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
> final INodeFile cons =
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
> Preconditions.checkState(cons.isUnderConstruction());
> BlockInfo[] blocks = cons.getBlocks();
> if(blocks == null)
> continue;
> for(BlockInfo b : blocks) {
> if(!b.isComplete())
> numUCBlocks++;
> }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
> }
> {code}
> Create this jira to add a check whether the inode is removed, as a safeguard,
> to avoid the NullPointerException.
> Looks that after the inodeid is returned by {{getINodeIdWithLeases()}}, it
> got deleted from FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985.
> But it seems reasonable to me to have a safeguard here, like other code that
> calls to {{fsnamesystem.getFSDirectory().getInode(id)}} in the code base.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]