[
https://issues.apache.org/jira/browse/HDFS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiao Chen updated HDFS-9549:
----------------------------
Attachment: HDFS-9549.02.patch
I talked with Colin offline; my comment above misinterpreted his intent. He
just meant to make a similar fix in a more optimized way. Sorry for the
earlier misunderstanding.
The attached patch 2 tries to remove the pendingCached blocks by first going
through the DNs and ignoring any DN that has not reached the capacity watermark.
The watermark is hardcoded; IMHO making it configurable would be overkill.
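
To make the approach above concrete, here is a minimal, self-contained sketch.
It is not the code in the patch: the class and method names, the 0.9 watermark
value, and the "drop a pending block if it no longer fits" rule are all
assumptions made for illustration, and the types are simplified stand-ins for
the real NameNode data structures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch only; every name here is hypothetical. */
final class PendingCachedSketch {
  /** Assumed hardcoded watermark: fraction of cache capacity in use. */
  static final double CAPACITY_WATERMARK = 0.9;

  /** Hypothetical pending block: an ID and a size in bytes. */
  static final class PendingBlock {
    final long blockId;
    final long numBytes;

    PendingBlock(long blockId, long numBytes) {
      this.blockId = blockId;
      this.numBytes = numBytes;
    }
  }

  /** Hypothetical per-DN cache state (stand-in for the real descriptor). */
  static final class DataNodeCache {
    final long capacity;
    final long used;
    final List<PendingBlock> pendingCached = new ArrayList<>();

    DataNodeCache(long capacity, long used) {
      this.capacity = capacity;
      this.used = used;
    }

    /** True once cache usage is at or beyond the watermark. */
    boolean reachedWatermark() {
      return capacity > 0 && used >= (long) (capacity * CAPACITY_WATERMARK);
    }
  }

  /**
   * Walks the DNs, skips those below the watermark, and on the rest removes
   * pending blocks that do not fit in the remaining cache space.
   * Returns the number of pending entries removed.
   */
  static int prunePendingCached(List<DataNodeCache> dataNodes) {
    int removed = 0;
    for (DataNodeCache dn : dataNodes) {
      if (!dn.reachedWatermark()) {
        continue; // cheap early exit: this DN still has headroom
      }
      long remaining = dn.capacity - dn.used;
      for (Iterator<PendingBlock> it = dn.pendingCached.iterator();
           it.hasNext();) {
        PendingBlock block = it.next();
        if (block.numBytes > remaining) {
          it.remove(); // would exceed capacity, so stop waiting on it
          removed++;
        } else {
          remaining -= block.numBytes; // reserve space for this block
        }
      }
    }
    return removed;
  }
}
```

The point of checking the watermark first is that DNs with plenty of free cache
are skipped without walking their pending lists, so the extra scan only costs
anything on DNs that are close to full.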
One thing worth mentioning: due to the nature of the race, it is also possible
that a block is in fact {{CACHED}} but has not yet been removed from
{{PENDING_CACHED}}. If the DN is beyond the watermark, the added logic may
remove that block from {{PENDING_CACHED}} early. I don't think this needs
special handling, since the state is still correct; the removal just happens
in the CRM instead of during cache report processing.
> TestCacheDirectives#testExceedsCapacity is flaky
> ------------------------------------------------
>
> Key: HDFS-9549
> URL: https://issues.apache.org/jira/browse/HDFS-9549
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Environment: Jenkins
> Reporter: Wei-Chiu Chuang
> Assignee: Xiao Chen
> Labels: unittest
> Attachments: HDFS-9549.01.patch, HDFS-9549.02.patch,
> TestCacheDirectives.rtf
>
>
> I have observed that this test (TestCacheDirectives.testExceedsCapacity)
> fails quite frequently in Jenkins (trunk, trunk-Java8)
> Error Message
> Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841,
> replication=1, mark=true}]
> Stacktrace
> java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not
> empty, [{blockId=1073741841, replication=1, mark=true}]
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
> at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)