[
https://issues.apache.org/jira/browse/HDFS-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951783#comment-15951783
]
Wei-Chiu Chuang commented on HDFS-11515:
----------------------------------------
The latest patch looks good, and I just have a few nits:
In the unit test, please make sure the test has a timeout; it is currently commented out:
{code}
@Test//(timeout = 180000)
{code}
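For reference, with the timeout re-enabled the annotation would look like the sketch below; 180000 ms is simply the value from the commented-out code, and the method name is only a placeholder, not the one in the patch.
{code}
// Timeout re-enabled so a hung test fails after 3 minutes instead of blocking the build.
// The method name is a placeholder for the actual test added by the patch.
@Test(timeout = 180000)
public void testContentSummaryAfterSnapshotDeletion() throws Exception {
  // ... existing test body ...
}
{code}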
Also, would you mind moving the test to TestSnapshotDeletion.java? It's really
my fault for placing it in TestRenameWithSnapshots.
+1 after that.
P.S. This is not related to your patch, but while reviewing it I noticed that
the synchronization block in the following code is not needed at all; it only
adds extra overhead. I'll file a JIRA soon.
{code}
  public boolean nodeIncluded(INode node) {
    INode resolvedNode = resolveINodeReference(node);
    synchronized (includedNodes) {
      if (!includedNodes.contains(resolvedNode)) {
        includedNodes.add(resolvedNode);
        return false;
      }
    }
    return true;
  }
{code}
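To illustrate the point (just a sketch, not part of the patch, assuming as above that no concurrent callers exist), the lock and the contains/add pair could collapse into a single add, since Set#add already reports whether the element was newly inserted:
{code}
  // Hypothetical simplification: Set#add returns true only when the element
  // was not already present, so the explicit contains() check and the
  // synchronized block can both go away if calls are already serialized.
  public boolean nodeIncluded(INode node) {
    INode resolvedNode = resolveINodeReference(node);
    return !includedNodes.add(resolvedNode);
  }
{code}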
> -du throws ConcurrentModificationException
> ------------------------------------------
>
> Key: HDFS-11515
> URL: https://issues.apache.org/jira/browse/HDFS-11515
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, shell
> Affects Versions: 2.8.0, 3.0.0-alpha2
> Reporter: Wei-Chiu Chuang
> Assignee: Istvan Fajth
> Attachments: HDFS-11515.001.patch, HDFS-11515.002.patch,
> HDFS-11515.003.patch, HDFS-11515.test.patch
>
>
> HDFS-10797 fixed a disk summary (-du) bug, but it introduced a new one.
> The bug can be reproduced by running the following commands:
> {noformat}
> bash-4.1$ hdfs dfs -mkdir /tmp/d0
> bash-4.1$ hdfs dfsadmin -allowSnapshot /tmp/d0
> Allowing snaphot on /tmp/d0 succeeded
> bash-4.1$ hdfs dfs -touchz /tmp/d0/f4
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1
> bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s1
> Created snapshot /tmp/d0/.snapshot/s1
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2/d4
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3/d5
> bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s2
> Created snapshot /tmp/d0/.snapshot/s2
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2/d4
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3/d5
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3
> bash-4.1$ hdfs dfs -du -h /tmp/d0
> du: java.util.ConcurrentModificationException
> 0 0 /tmp/d0/f4
> {noformat}
> A ConcurrentModificationException forced du to terminate abruptly.
> Correspondingly, the NameNode log has the following error:
> {noformat}
> 2017-03-08 14:32:17,673 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getContentSummary from 10.0.0.198:49957 Call#2 Retry#0
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:956)
>     at org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.tallyDeletedSnapshottedINodes(ContentSummaryComputationContext.java:209)
>     at org.apache.hadoop.hdfs.server.namenode.INode.computeAndConvertContentSummary(INode.java:507)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2302)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4535)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1087)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:563)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:873)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> The bug is due to an improper use of HashSet, not to concurrent operations.
> Basically, a HashSet cannot be modified while an iterator is traversing it;
> doing so makes the iterator fail fast with a ConcurrentModificationException.
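> As a minimal, standalone illustration of that failure mode (not HDFS code, just the general HashSet behavior; the class name is only for the example):
> {code}
> import java.util.HashSet;
> import java.util.Set;
>
> public class CmeDemo {
>   public static void main(String[] args) {
>     Set<String> set = new HashSet<>();
>     set.add("a");
>     set.add("b");
>     // Adding to the set while iterating over it makes the iterator
>     // fail fast with java.util.ConcurrentModificationException,
>     // even though only a single thread is involved.
>     for (String s : set) {
>       set.add(s + "-copy");
>     }
>   }
> }
> {code}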
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]