[
https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259458#comment-16259458
]
Erik Krogen commented on HDFS-12832:
------------------------------------
Thanks for reporting this and for working on a patch, [~Deng FEI]! Actually,
the new method you are using, {{INode#getPathComponents()}}, is subject to the
same race condition. As far as I can tell, {{INode}} is generally not meant to
be a concurrent data structure. I believe the real issue is that
{{ReplicationWork#chooseTargets()}} is being called without a lock:
{code:title=BlockManager.ReplicationWork}
// choose replication targets: NOT HOLDING THE GLOBAL LOCK
// It is costly to extract the filename for which chooseTargets is called,
// so for now we pass in the block collection itself.
rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes);
{code}
Within {{chooseTargets()}}, various methods on {{INode}}/{{BlockCollection}},
{{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called that,
it seems, should not be called without holding the lock.
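
One way to reconcile the "NOT HOLDING THE GLOBAL LOCK" comment with the observation above is to copy whatever the slow step needs while the lock is held, then run the slow step on that immutable copy. The sketch below only illustrates that pattern and is not the attached patch. {{SnapshotUnderLock}}, {{WorkItem}}, {{snapshot()}} and the {{ReentrantReadWriteLock}} are hypothetical stand-ins, not Hadoop classes or APIs.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only: none of these names exist in Hadoop.
final class SnapshotUnderLock {
    // Stand-in for the namesystem's global lock.
    private final ReentrantReadWriteLock globalLock = new ReentrantReadWriteLock();
    // Mutable state that is only safe to read while holding globalLock.
    private String srcPath = "/a/f";

    // Immutable copy of everything the slow step needs.
    static final class WorkItem {
        final String srcPath;
        WorkItem(String srcPath) { this.srcPath = srcPath; }
    }

    // A writer (for example a rename) mutates state under the write lock.
    void rename(String newPath) {
        globalLock.writeLock().lock();
        try {
            srcPath = newPath;
        } finally {
            globalLock.writeLock().unlock();
        }
    }

    // Cheap step: read the mutable state under the read lock and copy it out.
    WorkItem snapshot() {
        globalLock.readLock().lock();
        try {
            return new WorkItem(srcPath);
        } finally {
            globalLock.readLock().unlock();
        }
    }

    // Stand-in for the expensive target selection. It touches only the
    // snapshot, so it can run without the lock.
    static String chooseTargets(WorkItem item) {
        return "targets for " + item.srcPath;
    }

    public static void main(String[] args) {
        SnapshotUnderLock ns = new SnapshotUnderLock();
        WorkItem item = ns.snapshot();      // taken while holding the lock
        ns.rename("/a/much/deeper/f");      // concurrent change, simulated inline
        System.out.println(chooseTargets(item)); // still sees the old, consistent path
    }
}
```

The trade-off is that the slow step may act on slightly stale data, but it never observes a half-updated structure.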
[~Deng FEI], do you have a stack trace available to confirm that this is the
same code path which caused your exception? This is the code path that was
taken to trigger the issue for us.
> INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to
> NameNode exit
> ------------------------------------------------------------------------------------
>
> Key: HDFS-12832
> URL: https://issues.apache.org/jira/browse/HDFS-12832
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.4, 3.0.0-beta1
> Reporter: DENG FEI
> Priority: Critical
> Attachments: HDFS-12832-trunk-001.patch
>
>
> {code:title=INode.java|borderStyle=solid}
> public String getFullPathName() {
>   // Get the full path name of this inode.
>   if (isRoot()) {
>     return Path.SEPARATOR;
>   }
>   // compute size of needed bytes for the path
>   int idx = 0;
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     // add component + delimiter (if not tail component)
>     idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0);
>   }
>   byte[] path = new byte[idx];
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     if (inode != this) {
>       path[--idx] = Path.SEPARATOR_CHAR;
>     }
>     byte[] name = inode.getLocalNameBytes();
>     idx -= name.length;
>     System.arraycopy(name, 0, path, idx, name.length);
>   }
>   return DFSUtil.bytes2String(path);
> }
> {code}
> We found an ArrayIndexOutOfBoundsException at
> _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_
> while ReplicaMonitor was working, after which the NameNode exits.
> It seems the two loops are not synchronized, so the path's length can change
> between them.
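
The failure described in the quoted report can be reproduced deterministically with a small sketch that keeps the same two-pass shape as the quoted {{getFullPathName()}}. The {{Node}} class and the {{betweenPasses}} hook are hypothetical: the hook stands in for a concurrent writer that moves an ancestor under a longer-named directory after the buffer has been sized but before it is filled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: Node is a minimal stand-in for INode.
final class TwoPassPathRace {
    static final class Node {
        final byte[] name;
        volatile Node parent; // mutable, like an inode's parent link

        Node(String name, Node parent) {
            this.name = name.getBytes(StandardCharsets.UTF_8);
            this.parent = parent;
        }
    }

    // Pass 1 sizes the buffer, pass 2 fills it from the end.
    // betweenPasses simulates a writer running between the two loops.
    static String fullPath(Node self, Runnable betweenPasses) {
        if (self.parent == null) {
            return "/";
        }
        int idx = 0;
        for (Node n = self; n != null; n = n.parent) {
            idx += n.name.length + (n != self ? 1 : 0);
        }
        byte[] path = new byte[idx];
        betweenPasses.run();
        for (Node n = self; n != null; n = n.parent) {
            if (n != self) {
                path[--idx] = '/';
            }
            idx -= n.name.length;
            System.arraycopy(n.name, 0, path, idx, n.name.length);
        }
        return new String(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node a = new Node("a", root);
        Node f = new Node("f", a);

        // No concurrent change: both passes see the same ancestor chain.
        System.out.println(fullPath(f, () -> { }));

        // Simulated rename between the passes: the chain grows, so the
        // second pass runs off the front of the buffer.
        try {
            fullPath(f, () -> a.parent = new Node("longer", root));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("second pass failed: " + e);
        }
    }
}
```

Because the buffer size is fixed by the first loop, any change to the ancestor chain before the second loop finishes can push {{idx}} below zero, which matches the exception site named in the report.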