[
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978509#comment-16978509
]
Yiqun Lin edited comment on HDFS-14986 at 11/20/19 3:27 PM:
------------------------------------------------------------
Hi [~Aiphag0], the patch almost looks good to me. I have only some minor comments on
the unit test: I think it's enough that we just verify the deepCopyReplica call
won't throw a ConcurrentModificationException (CME).
So the while loop in the unit test can be simplified to this:
{code:java}
int retryTimes = 3;
while (retryTimes > 0) {
  try {
    Set<? extends Replica> replicas = spyDataset.deepCopyReplica(bpid);
    if (replicas.size() > 0) {
      retryTimes--;
      // deepCopyReplica should not throw ConcurrentModificationException
      modifyThread.setShouldRun(false);
    }
  } catch (IOException e) {
    modifyThread.setShouldRun(false);
    Assert.fail("Encounter IOException when deep copy replica.");
  }
}
modifyThread.setShouldRun(false);
{code}
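The loop just gives deepCopyReplica a few chances to run while ModifyThread is
still writing, so a re-introduced CME would reliably surface here.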
In addition, we need to close the fs output stream and delete the created test
files in ModifyThread. Can you update it as follows?
{code:java}
public void run() {
  FSDataOutputStream os = null;
  while (shouldRun) {
    try {
      int id = RandomUtils.nextInt();
      os = fs.create(new Path("/testFsDatasetImplDeepCopyReplica/" + id));
      byte[] bytes = new byte[2048];
      InputStream is = new ByteArrayInputStream(bytes);
      IOUtils.copyBytes(is, os, bytes.length);
      os.hsync();
    } catch (IOException e) {
      // ignored exception
    }
  }
  // Clean up: close the last output stream and delete the created test files.
  try {
    if (os != null) {
      os.close();
    }
    fs.delete(new Path("/testFsDatasetImplDeepCopyReplica"), true);
  } catch (IOException e) {
    // ignore cleanup failure
  }
}
{code}
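For reference, a minimal sketch of the ModifyThread class that the run() method
above lives in, assuming the usual volatile-flag pattern (the constructor and
field names here are assumptions, not the actual test code):
{code:java}
// Assumed shape of the test helper thread: a volatile flag lets the main
// test thread stop the write loop via setShouldRun(false).
private static class ModifyThread extends Thread {
  private volatile boolean shouldRun = true;
  private final FileSystem fs;

  ModifyThread(FileSystem fs) {
    this.fs = fs;
  }

  void setShouldRun(boolean shouldRun) {
    this.shouldRun = shouldRun;
  }

  @Override
  public void run() {
    // ... the run() body shown above ...
  }
}
{code}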
Also, please check whether the following two failed unit tests are related:
* TestDU
* TestDFCachingGetSpaceUsed
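For context on the CME itself (see the stack trace in the issue description
below): the HashSet copy constructor iterates the source set via
AbstractCollection.addAll, and FoldedTreeSet's iterator is fail-fast, so copying
while another thread mutates the set blows up. A standalone illustration using a
plain TreeSet (hypothetical demo code, not from the patch):
{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class CmeDemo {
  public static void main(String[] args) throws Exception {
    Set<Integer> replicas = new TreeSet<>();
    for (int i = 0; i < 100_000; i++) {
      replicas.add(i);
    }
    Thread modifier = new Thread(() -> {
      for (int i = 100_000; i < 200_000; i++) {
        replicas.add(i); // structural modification during the copy below
      }
    });
    modifier.start();
    // new HashSet<>(source) iterates the source set, the same code path as
    // FsDatasetImpl.deepCopyReplica; the concurrent adds above will
    // typically make the fail-fast iterator throw CME.
    Set<Integer> copy = new HashSet<>(replicas);
    modifier.join();
    System.out.println("Copied " + copy.size() + " entries without CME");
  }
}
{code}
The fix direction, presumably, is to hold the lock that guards the replica map
while deepCopyReplica iterates, so the copy and the mutation can't interleave.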
> ReplicaCachingGetSpaceUsed throws ConcurrentModificationException
> ------------------------------------------------------------------
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, performance
> Reporter: Ryan Wu
> Assignee: Ryan Wu
> Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch
> HDFS-14313 to get the used space from ReplicaInfo in memory. However, the new
> du threads throw the following exception:
> {code:java}
> 2019-11-08 18:07:13,858 ERROR [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992-XXXX-1450855658517] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of iterator
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>     at java.util.HashSet.<init>(HashSet.java:120)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
>     at org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>     at java.lang.Thread.run(Thread.java:748)
> {code}