[ https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978509#comment-16978509 ]
Yiqun Lin commented on HDFS-14986:
----------------------------------

Hi [~Aiphag0], the patch almost looks good to me. Only some minor comments on the unit test. I think it's enough to just verify that the deepCopyReplica call won't throw a CME, so the while loop in the unit test can be simplified to this:
{code:java}
int retryTimes = 3;
while (retryTimes > 0) {
  try {
    Set<? extends Replica> replicas = spyDataset.deepCopyReplica(bpid);
    if (replicas.size() > 0) {
      retryTimes--;
      // DeepCopyReplica should not throw ConcurrentModificationException.
      modifyThread.setShouldRun(false);
    }
  } catch (IOException e) {
    modifyThread.setShouldRun(false);
    Assert.fail("Encounter IOException when deep copy replica.");
  }
}
modifyThread.setShouldRun(false);
{code}
In addition, we need to close the fs output stream and delete the created test files in ModifyThread. Can you update it as follows?
{code:java}
public void run() {
  FSDataOutputStream os = null;
  while (shouldRun) {
    try {
      int id = RandomUtils.nextInt();
      os = fs.create(new Path("/testFsDatasetImplDeepCopyReplica/" + id));
      byte[] bytes = new byte[2048];
      InputStream is = new ByteArrayInputStream(bytes);
      IOUtils.copyBytes(is, os, bytes.length);
      os.hsync();
    } catch (IOException e) {
      // Ignored exception.
    }
  }
  try {
    if (os != null) {
      os.close();
    }
    fs.delete(new Path("/testFsDatasetImplDeepCopyReplica"), true);
  } catch (IOException e) {
  }
}
{code}

> ReplicaCachingGetSpaceUsed throws ConcurrentModificationException
> -----------------------------------------------------------------
>
>                 Key: HDFS-14986
>                 URL: https://issues.apache.org/jira/browse/HDFS-14986
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, performance
>            Reporter: Ryan Wu
>            Assignee: Ryan Wu
>            Priority: Major
>         Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch
>
>
> Running DU across lots of disks is very expensive.
> We applied the patch HDFS-14313 to get used space from ReplicaInfo in memory. However, the new du threads throw the exception:
> {code:java}
> 2019-11-08 18:07:13,858 ERROR [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992-XXXX-1450855658517] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of iterator
>         at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
>         at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
>         at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>         at java.util.HashSet.<init>(HashSet.java:120)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
>         at org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>         at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
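The stack trace above shows the HashSet copy constructor in deepCopyReplica calling addAll, which iterates the replica set with FoldedTreeSet's fail-fast iterator; any structural modification during that iteration triggers the exception. The same fail-fast behavior can be reproduced with a minimal, single-threaded Java sketch (the CmeDemo class name and the sample integer values are illustrative, not from the patch — the standard TreeSet stands in for FoldedTreeSet):

```java
import java.util.ConcurrentModificationException;
import java.util.Set;
import java.util.TreeSet;

public class CmeDemo {
    // Returns true if modifying the set mid-iteration raises
    // ConcurrentModificationException, which is what happens when
    // another thread mutates the replica map during the deep copy.
    static boolean copyDuringModificationThrows() {
        Set<Integer> replicas = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            replicas.add(i);
        }
        try {
            // The for-each loop uses the same fail-fast iterator that a
            // copy constructor like new HashSet<>(replicas) would use.
            for (Integer r : replicas) {
                replicas.add(r + 100); // structural modification mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + copyDuringModificationThrows());
    }
}
```

The usual remedy, and the direction this issue takes, is to perform the copy while holding the same lock that guards modifications to the replica set, so the iteration never races with a concurrent add or remove.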