[ 
https://issues.apache.org/jira/browse/HDFS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147688#comment-15147688
 ] 

Xiao Chen commented on HDFS-9549:
---------------------------------

Thank you for the comment, and for nicely summarizing the root cause, 
[~cmccabe]! You definitely summarized it better than I did. :)
bq. Rather than doing that, it would be simpler and more efficient to just loop 
over all datanodes and make sure that pendingCached only contained blocks that 
we could realistically hope to cache
Could you explain this further? IIUC you're proposing to add the removal logic 
to the DN's thread, instead of 
{{CacheReplicationMonitor#rescanCachedBlockMap}}? I think there are 2 things we 
want to remove: the block from the DN's pendingCached list, and the DN from 
{{cachedBlocks}}'s pendingCached list in the cache manager.
In the remove code, that's 
{code}
          // remove the block from the DN
          datanode.getPendingCached().remove(cblock);
          // remove the DN from the list of pendingCached DNs of that block
          // in the cache manager
          iter.remove();
{code}
I couldn't find a way to remove the latter in a DN context. Please advise. 
Thanks again!
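
To make the two-sided bookkeeping above concrete, here is a minimal, self-contained sketch. It is not the HDFS code: the {{Datanode}} and {{CachedBlock}} classes, their fields, and the {{addPending}} / {{clearPending}} helpers are simplified, hypothetical stand-ins that only model "a DN tracks its pending blocks" and "a block tracks its pending DNs".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical model, NOT the real HDFS classes.
final class PendingCachedSketch {

    /** Stand-in for a datanode: the IDs of blocks it has been asked to cache. */
    static final class Datanode {
        final String name;
        final Set<Long> pendingCached = new HashSet<>();

        Datanode(String name) {
            this.name = name;
        }
    }

    /** Stand-in for the cache manager's per-block entry: DNs pending to cache it. */
    static final class CachedBlock {
        final long blockId;
        final List<Datanode> pendingCachedDns = new ArrayList<>();

        CachedBlock(long blockId) {
            this.blockId = blockId;
        }
    }

    /** Records the pending state on both sides. */
    static void addPending(CachedBlock block, Datanode dn) {
        dn.pendingCached.add(block.blockId);
        block.pendingCachedDns.add(dn);
    }

    /**
     * Clears the pending state for one block on both sides and returns the
     * number of datanodes that were dropped from the block's pending list.
     */
    static int clearPending(CachedBlock block) {
        int removed = 0;
        Iterator<Datanode> iter = block.pendingCachedDns.iterator();
        while (iter.hasNext()) {
            Datanode dn = iter.next();
            dn.pendingCached.remove(block.blockId); // side 1: block leaves the DN
            iter.remove();                          // side 2: DN leaves the block's list
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Datanode dn = new Datanode("dn1");
        CachedBlock block = new CachedBlock(1073741841L);
        addPending(block, dn);
        int removed = clearPending(block);
        System.out.println("removed=" + removed
            + " dnPendingEmpty=" + dn.pendingCached.isEmpty()
            + " blockPendingEmpty=" + block.pendingCachedDns.isEmpty());
    }
}
```

In this model both removals are straightforward when iterating per block, because the iterator is over the block's own DN list. Starting from a DN instead, the second removal would need some way to get from a block ID back to that block's pending list, which is the part I am asking about.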

> TestCacheDirectives#testExceedsCapacity is flaky
> ------------------------------------------------
>
>                 Key: HDFS-9549
>                 URL: https://issues.apache.org/jira/browse/HDFS-9549
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: Jenkins
>            Reporter: Wei-Chiu Chuang
>            Assignee: Xiao Chen
>              Labels: unittest
>         Attachments: HDFS-9549.01.patch, TestCacheDirectives.rtf
>
>
> I have observed that this test (TestCacheDirectives.testExceedsCapacity) 
> fails quite frequently in Jenkins (trunk, trunk-Java8).
> Error Message
> Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, 
> replication=1, mark=true}]
> Stacktrace
> java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not 
> empty, [{blockId=1073741841, replication=1, mark=true}]
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
>       at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
