[ https://issues.apache.org/jira/browse/HDFS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147688#comment-15147688 ]
Xiao Chen commented on HDFS-9549:
---------------------------------
Thank you for the comment and for nicely summarizing the root cause, [~cmccabe]!
You definitely summarized it better than I did. :)
bq. Rather than doing that, it would be simpler and more efficient to just loop
over all datanodes and make sure that pendingCached only contained blocks that
we could realistically hope to cache
Could you explain this further? IIUC, you're proposing to put the removal logic
into the DN's thread instead of
{{CacheReplicationMonitor#rescanCachedBlockMap}}? I think there are two things we
want to remove: the block from the DN's pendingCached list, and the DN from
{{cachedBlocks}}'s pendingCached list in the cache manager.
In the removal code, that's:
{code}
datanode.getPendingCached().remove(cblock); // remove the block from the DN's pendingCached list
iter.remove(); // remove the DN from the block's list of pendingCached DNs in the cache manager
{code}
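To make the two-sided bookkeeping concrete, here is a minimal toy sketch. It assumes a simplified model in which each block keeps a list of the DNs it is pending on and each DN keeps a set of its pending blocks. {{PendingCachedModel}}, its nested {{Block}}/{{Datanode}} classes, {{addPending}}, and the {{shouldDrop}} predicate are hypothetical illustrations, not the real HDFS {{CachedBlock}} / {{DatanodeDescriptor}} types. It shows why a block-centric pass like the one above has to remove the entry from both sides in the same iteration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical toy model, NOT the real HDFS classes: the pending-cached
// relation is stored on both sides, so every removal must update both.
class PendingCachedModel {
  static class Datanode {
    final String name;
    // Blocks this DN has been asked to cache but has not reported as cached.
    final Set<Block> pendingCached = new HashSet<>();
    Datanode(String name) { this.name = name; }
  }

  static class Block {
    final long id;
    // DNs this block is pending on (the cache manager's side of the relation).
    final List<Datanode> pendingCachedDns = new ArrayList<>();
    Block(long id) { this.id = id; }
  }

  // Record that block b is pending on datanode dn, on both sides.
  static void addPending(Block b, Datanode dn) {
    b.pendingCachedDns.add(dn);
    dn.pendingCached.add(b);
  }

  // Block-centric pass, loosely in the spirit of rescanCachedBlockMap: for each
  // block, walk its pending DNs and drop every pair matching shouldDrop.
  static void rescan(List<Block> blocks, BiPredicate<Block, Datanode> shouldDrop) {
    for (Block cblock : blocks) {
      Iterator<Datanode> iter = cblock.pendingCachedDns.iterator();
      while (iter.hasNext()) {
        Datanode datanode = iter.next();
        if (shouldDrop.test(cblock, datanode)) {
          datanode.pendingCached.remove(cblock); // side 1: the DN's pending set
          iter.remove();                         // side 2: the block's list of DNs
        }
      }
    }
  }
}
```

Because the iterator walks the block's own DN list, {{iter.remove()}} is the natural way to drop the block-side entry here; the DN-side entry is reached through the {{datanode}} reference in hand.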
I couldn't find a way to do the latter removal from a DN context. Please advise. Thanks
again!
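For comparison, the DN-centric loop from the quoted suggestion might look like the following minimal sketch. It assumes the same kind of hypothetical two-sided toy model, and it assumes that "blocks we could realistically hope to cache" means pending blocks that fit in the DN's remaining cache capacity ({{DatanodeCentricPrune}} and the {{cacheRemainingBytes}} and {{sizeBytes}} fields are made up for illustration). In this toy model the block's back-reference is directly reachable from the DN loop, so both sides can be updated; whether the real cache manager structures allow the same from a DN context is exactly the open question above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT the real HDFS classes.
class DatanodeCentricPrune {
  static class Block {
    final long id;
    final long sizeBytes;
    // DNs this block is pending on (the cache manager's side).
    final List<Datanode> pendingCachedDns = new ArrayList<>();
    Block(long id, long sizeBytes) { this.id = id; this.sizeBytes = sizeBytes; }
  }

  static class Datanode {
    final String name;
    final long cacheRemainingBytes; // assumed: free cache capacity on this DN
    // Insertion-ordered so the pruning walk below is deterministic.
    final Set<Block> pendingCached = new LinkedHashSet<>();
    Datanode(String name, long cacheRemainingBytes) {
      this.name = name;
      this.cacheRemainingBytes = cacheRemainingBytes;
    }
  }

  // Record that block b is pending on datanode dn, on both sides.
  static void addPending(Block b, Datanode dn) {
    b.pendingCachedDns.add(dn);
    dn.pendingCached.add(b);
  }

  // For each DN, walk its pending blocks in insertion order, keeping each block
  // that still fits in the remaining budget and dropping any block that does not.
  static void prune(List<Datanode> datanodes) {
    for (Datanode dn : datanodes) {
      long budget = dn.cacheRemainingBytes;
      Iterator<Block> iter = dn.pendingCached.iterator();
      while (iter.hasNext()) {
        Block b = iter.next();
        if (b.sizeBytes <= budget) {
          budget -= b.sizeBytes;         // can realistically be cached: keep it
        } else {
          iter.remove();                 // drop from the DN's pending set
          b.pendingCachedDns.remove(dn); // and drop the block's back-reference
        }
      }
    }
  }
}
```

The greedy walk still keeps a smaller block that fits after a larger one has been dropped; a different ordering policy would change which blocks survive.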
> TestCacheDirectives#testExceedsCapacity is flaky
> ------------------------------------------------
>
> Key: HDFS-9549
> URL: https://issues.apache.org/jira/browse/HDFS-9549
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Environment: Jenkins
> Reporter: Wei-Chiu Chuang
> Assignee: Xiao Chen
> Labels: unittest
> Attachments: HDFS-9549.01.patch, TestCacheDirectives.rtf
>
>
> I have observed that this test (TestCacheDirectives.testExceedsCapacity)
> fails quite frequently in Jenkins (trunk, trunk-Java8)
> Error Message
> Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]
> Stacktrace
> java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
> at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)