[
https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259458#comment-16259458
]
Erik Krogen edited comment on HDFS-12832 at 11/20/17 4:42 PM:
--------------------------------------------------------------
Thanks for reporting this and for working on a patch, [~Deng FEI]! However,
the new method you are using, {{INode#getPathComponents()}}, is subject to the
same race condition. As far as I can tell, {{INode}} is generally not meant to
be a concurrent data structure. I believe the underlying issue is that
{{ReplicationWork#chooseTargets()}} is called without holding a lock:
{code:title=BlockManager.ReplicationWork}
// choose replication targets: NOT HOLDING THE GLOBAL LOCK
// It is costly to extract the filename for which chooseTargets is called,
// so for now we pass in the block collection itself.
rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes);
{code}
Within {{chooseTargets()}}, various methods on {{INode}}/{{BlockCollection}},
{{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called that,
it seems, should not be invoked outside of the lock.
[~Deng FEI], do you have a stack trace available to confirm that this is the
same code path which caused your exception? This is the code path that was
taken to trigger the issue for us.
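
To illustrate the general idea of not touching lock-protected state outside of the lock, here is a minimal sketch of copying the needed values into an immutable work item while a read lock is held. Every name in it ({{SnapshotUnderLock}}, {{FileEntry}}, {{WorkItem}}, {{prepare()}}) is hypothetical, and this {{chooseTargets()}} is only a placeholder; it is not the HDFS code and not necessarily what a patch for this issue should look like.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SnapshotUnderLock {
  /** Hypothetical mutable namespace entry; guarded by the lock below. */
  static final class FileEntry {
    private String path;
    private final byte storagePolicyId;

    FileEntry(String path, byte storagePolicyId) {
      this.path = path;
      this.storagePolicyId = storagePolicyId;
    }
  }

  /** Immutable copy of the values the unlocked phase needs. */
  static final class WorkItem {
    final String srcPath;
    final byte storagePolicyId;

    WorkItem(String srcPath, byte storagePolicyId) {
      this.srcPath = srcPath;
      this.storagePolicyId = storagePolicyId;
    }
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /** Reads the mutable entry only while the read lock is held. */
  WorkItem prepare(FileEntry entry) {
    lock.readLock().lock();
    try {
      return new WorkItem(entry.path, entry.storagePolicyId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Writers take the write lock, so they cannot interleave with prepare(). */
  void rename(FileEntry entry, String newPath) {
    lock.writeLock().lock();
    try {
      entry.path = newPath;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Placeholder for the expensive step; it sees only the immutable snapshot. */
  static String chooseTargets(WorkItem work) {
    return "targets for " + work.srcPath;
  }
}
```

The trade-off is that the path is computed eagerly under the lock, which is the cost the code comment above was trying to avoid.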
> INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to
> NameNode exit
> ------------------------------------------------------------------------------------
>
> Key: HDFS-12832
> URL: https://issues.apache.org/jira/browse/HDFS-12832
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.4, 3.0.0-beta1
> Reporter: DENG FEI
> Priority: Critical
> Attachments: HDFS-12832-trunk-001.patch
>
>
> {code:title=INode.java|borderStyle=solid}
> public String getFullPathName() {
>   // Get the full path name of this inode.
>   if (isRoot()) {
>     return Path.SEPARATOR;
>   }
>   // compute size of needed bytes for the path
>   int idx = 0;
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     // add component + delimiter (if not tail component)
>     idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0);
>   }
>   byte[] path = new byte[idx];
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     if (inode != this) {
>       path[--idx] = Path.SEPARATOR_CHAR;
>     }
>     byte[] name = inode.getLocalNameBytes();
>     idx -= name.length;
>     System.arraycopy(name, 0, path, idx, name.length);
>   }
>   return DFSUtil.bytes2String(path);
> }
> {code}
> We found an ArrayIndexOutOfBoundsException at
> _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_
> while the ReplicaMonitor was working, after which the NameNode exits.
> It seems the two loops are not synchronized, so the path's length can change
> between them.
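
The failure mode described above can be reproduced deterministically in a small sketch. It assumes a hypothetical {{Node}} class in place of {{INode}}, and a {{betweenPasses}} callback that stands in for whatever another thread might do after the sizing loop and before the copy loop. The root special case is omitted. This only illustrates the two-pass pattern; it is not the NameNode code path.

```java
import java.nio.charset.StandardCharsets;

class TwoPassPathRace {
  /** Hypothetical stand-in for an inode: a mutable name plus a parent link. */
  static final class Node {
    volatile byte[] name;
    final Node parent;

    Node(String name, Node parent) {
      this.name = name.getBytes(StandardCharsets.UTF_8);
      this.parent = parent;
    }
  }

  /** Same two-pass shape as the quoted method: size first, then copy. */
  static String fullPath(Node node, Runnable betweenPasses) {
    int idx = 0;
    for (Node n = node; n != null; n = n.parent) {
      idx += n.name.length + (n != node ? 1 : 0);
    }
    byte[] path = new byte[idx];
    // Simulates a concurrent modification landing between the two loops.
    betweenPasses.run();
    for (Node n = node; n != null; n = n.parent) {
      if (n != node) {
        path[--idx] = '/';
      }
      byte[] name = n.name;
      idx -= name.length; // goes negative if a name grew after sizing
      System.arraycopy(name, 0, path, idx, name.length);
    }
    return new String(path, StandardCharsets.UTF_8);
  }

  /** Builds the path for root -> "a" -> "b" with no mutation between passes. */
  static String buildStable() {
    Node root = new Node("", null);
    Node dir = new Node("a", root);
    Node file = new Node("b", dir);
    return fullPath(file, () -> { });
  }

  /** Returns true if a rename between the passes makes the copy overrun. */
  static boolean renameBreaksCopy() {
    Node root = new Node("", null);
    Node dir = new Node("a", root);
    Node file = new Node("b", dir);
    try {
      fullPath(file,
          () -> dir.name = "a-much-longer-name".getBytes(StandardCharsets.UTF_8));
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(buildStable());
    System.out.println(renameBreaksCopy());
  }
}
```

Because the buffer is sized from the first traversal, any name that grows before the second traversal pushes {{idx}} below zero, and {{System.arraycopy}} rejects the negative destination offset.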