[ https://issues.apache.org/jira/browse/HDFS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HDFS-9549:
----------------------------
    Attachment: HDFS-9549.02.patch

I talked with Colin offline, and my comment above misinterpreted his intent. He
just meant to implement a similar fix in a more optimized way. Sorry for the
misunderstanding earlier.

The attached patch 2 removes the pendingCached blocks by first iterating over
the DNs and skipping the DNs that have not reached the capacity watermark.
The watermark is hardcoded; IMHO making it configurable would be overkill.
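A minimal Java sketch of the DN-first pass described above, for illustration only. Everything in it is hypothetical: `DataNode`, `CachedBlock`, `removeUnfittablePending`, and the `0.9` value of `CAPACITY_WATERMARK` are stand-ins, not the HDFS classes or the patch's actual hardcoded value. The per-block rule (drop a pending block only if it does not fit in the remaining cache space) is also an assumption about what removal means for a DN past the watermark.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PendingCacheCleanupSketch {

    // Hypothetical watermark value; the patch's real hardcoded value is not shown here.
    static final double CAPACITY_WATERMARK = 0.9;

    // Hypothetical stand-in for a block waiting to be cached.
    static final class CachedBlock {
        final long id;
        final long length; // bytes

        CachedBlock(long id, long length) {
            this.id = id;
            this.length = length;
        }
    }

    // Hypothetical stand-in for a DataNode's cache bookkeeping on the NameNode.
    static final class DataNode {
        final String name;
        final long cacheCapacity; // bytes
        final long cacheUsed;     // bytes
        final List<CachedBlock> pendingCached = new ArrayList<>();

        DataNode(String name, long cacheCapacity, long cacheUsed) {
            this.name = name;
            this.cacheCapacity = cacheCapacity;
            this.cacheUsed = cacheUsed;
        }

        boolean reachedWatermark() {
            return cacheUsed >= CAPACITY_WATERMARK * cacheCapacity;
        }
    }

    // Walks DNs first; only DNs at or past the watermark have their pending lists scanned.
    static void removeUnfittablePending(List<DataNode> nodes) {
        for (DataNode dn : nodes) {
            if (!dn.reachedWatermark()) {
                continue; // enough headroom: leave this DN's pending list alone
            }
            long remaining = dn.cacheCapacity - dn.cacheUsed;
            Iterator<CachedBlock> it = dn.pendingCached.iterator();
            while (it.hasNext()) {
                CachedBlock b = it.next();
                if (b.length > remaining) {
                    it.remove(); // does not fit in the remaining space: stop waiting for it
                } else {
                    remaining -= b.length; // reserve room for a block that fits
                }
            }
        }
    }

    public static void main(String[] args) {
        DataNode full = new DataNode("dn1", 1000, 950);
        full.pendingCached.add(new CachedBlock(1073741841L, 512));
        DataNode roomy = new DataNode("dn2", 1000, 100);
        roomy.pendingCached.add(new CachedBlock(1073741842L, 512));
        removeUnfittablePending(List.of(full, roomy));
        System.out.println(full.name + " pending: " + full.pendingCached.size());   // dn1 pending: 0
        System.out.println(roomy.name + " pending: " + roomy.pendingCached.size()); // dn2 pending: 1
    }
}
```

Skipping DNs below the watermark up front keeps the common case cheap: only nearly full DNs have their pending lists walked.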

One thing worth mentioning: due to the nature of the race, a block may in fact
be {{CACHED}} on the DN but not yet removed from {{PENDING_CACHED}}. If that DN
is beyond the watermark, the added logic may remove the block from the pending
list early. I don't think this needs special handling, since the resulting
state is still correct; the removal just happens in the CRM
(CacheReplicationMonitor) instead of during cache report processing.
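A simplified Java model of the race above, assuming hypothetical `DataNode`, `crmRescan`, and `processCacheReport` stand-ins (not the HDFS code). It runs the CRM cleanup and the cache report in both orders for a block the DN has actually cached; as a further simplification, the CRM step drops every pending entry on a DN past the watermark. In both orders the block ends up cached and no longer pending, because removing an entry that is already gone is a no-op.

```java
import java.util.HashSet;
import java.util.Set;

public class PendingCachedRaceSketch {

    // Hypothetical, simplified per-DataNode cache bookkeeping.
    static final class DataNode {
        final Set<Long> cached = new HashSet<>();
        final Set<Long> pendingCached = new HashSet<>();
        final boolean pastWatermark;

        DataNode(boolean pastWatermark) {
            this.pastWatermark = pastWatermark;
        }

        // CRM-side cleanup (the added logic). Simplification: on a DN past the
        // watermark, drop all pending entries.
        void crmRescan() {
            if (pastWatermark) {
                pendingCached.clear();
            }
        }

        // Cache-report path: reported blocks are CACHED, so they leave PENDING_CACHED.
        void processCacheReport(Set<Long> reported) {
            cached.addAll(reported);
            pendingCached.removeAll(reported); // no-op if the CRM already removed them
        }
    }

    static DataNode run(boolean crmFirst) {
        long blockId = 1073741841L;
        DataNode dn = new DataNode(true);
        dn.pendingCached.add(blockId);
        // The race: the DN has actually cached the block, but the NN has not yet
        // processed the cache report that says so.
        if (crmFirst) {
            dn.crmRescan();
            dn.processCacheReport(Set.of(blockId));
        } else {
            dn.processCacheReport(Set.of(blockId));
            dn.crmRescan();
        }
        return dn;
    }

    public static void main(String[] args) {
        for (boolean crmFirst : new boolean[] {true, false}) {
            DataNode dn = run(crmFirst);
            // Both iterations print cached=[1073741841] pending=[]
            System.out.println("crmFirst=" + crmFirst
                + " cached=" + dn.cached + " pending=" + dn.pendingCached);
        }
    }
}
```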

> TestCacheDirectives#testExceedsCapacity is flaky
> ------------------------------------------------
>
>                 Key: HDFS-9549
>                 URL: https://issues.apache.org/jira/browse/HDFS-9549
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: Jenkins
>            Reporter: Wei-Chiu Chuang
>            Assignee: Xiao Chen
>              Labels: unittest
>         Attachments: HDFS-9549.01.patch, HDFS-9549.02.patch, TestCacheDirectives.rtf
>
>
> I have observed that this test (TestCacheDirectives.testExceedsCapacity) fails quite frequently in Jenkins (trunk, trunk-Java8).
> Error Message
> Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]
> Stacktrace
> java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
>       at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
