[
https://issues.apache.org/jira/browse/HDFS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143901#comment-15143901
]
Xiao Chen commented on HDFS-9549:
---------------------------------
Below is the excerpt from the log, HTH.
{noformat}
-- cached initially 1073741826 --
2016-02-11 10:58:47,656 INFO datanode.DataNode
(BPOfferService.java:processCommandFromActive(671)) - DatanodeCommand action:
DNA_CACHE for BP-92343436-localhost-1455217122700 of [1073741826, 1073741828,
1073741829, 1073741834]
2016-02-11 10:58:47,669 DEBUG impl.FsDatasetCache
(FsDatasetCache.java:run(486)) - Successfully cached
1073741826_BP-92343436-localhost-1455217122700. We are now caching 4096 bytes
in total.
2016-02-11 10:58:47,673 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(984)) - Added block
{blockId=1073741826, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:47,674 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(988)) - Removed block
{blockId=1073741826, replication=1, mark=false} from PENDING_CACHED list.
2016-02-11 10:58:47,674 DEBUG namenode.CacheManager
(CacheManager.java:processCacheReport(953)) - Processed cache report from
DatanodeRegistration(127.0.0.1:54112,
datanodeUuid=1f194c3a-3103-467e-a239-6258d71785a9, infoPort=54114,
infoSecurePort=0, ipcPort=54115,
storageInfo=lv=-56;cid=testClusterID;nsid=513491009;c=1455217122700), blocks:
1, processing time: 2 msecs
-- innocent 1073741841 added --
2016-02-11 10:58:47,758 ERROR blockmanagement.CacheReplicationMonitor
(CacheReplicationMonitor.java:addNewPendingCached(699)) - Adding Block
1073741841: DataNode 1f194c3a-3103-467e-a239-6258d71785a9 because the block has
size 4096, but the DataNode only has 4096 bytes of cache remaining (-12288
pending bytes, 16384 already cached.)
2016-02-11 10:58:47,759 TRACE blockmanagement.CacheReplicationMonitor
(CacheReplicationMonitor.java:addNewPendingCached(711)) - Block 1073741841:
added to PENDING_CACHED on DataNode 1f194c3a-3103-467e-a239-6258d71785a9
2016-02-11 10:58:48,661 INFO datanode.DataNode
(BPOfferService.java:processCommandFromActive(671)) - DatanodeCommand action:
DNA_CACHE for BP-92343436-localhost-1455217122700 of [1073741828, 1073741829,
1073741834, 1073741841]
2016-02-11 10:58:48,667 WARN impl.FsDatasetCache
(FsDatasetCache.java:run(441)) - Failed to cache
1073741841_BP-92343436-localhost-1455217122700: could not reserve 4096 more
bytes in the cache: dfs.datanode.max.locked.memory of 16384 exceeded.
2016-02-11 10:58:48,666 DEBUG impl.FsDatasetCache
(FsDatasetCache.java:cacheBlock(309)) - Initiating caching for Block with id
1073741841, pool BP-92343436-localhost-1455217122700
2016-02-11 10:58:48,668 DEBUG impl.FsDatasetCache
(FsDatasetCache.java:run(499)) - Caching of
1073741841_BP-92343436-localhost-1455217122700 was aborted. We are now caching
only 16384 bytes in total.
-- eventually, 1073741841 fails forever --
2016-02-11 10:58:50,670 DEBUG impl.FsDatasetCache
(FsDatasetCache.java:cacheBlock(309)) - Initiating caching for Block with id
1073741841, pool BP-92343436-localhost-1455217122700
2016-02-11 10:58:50,670 WARN impl.FsDatasetCache
(FsDatasetCache.java:run(441)) - Failed to cache
1073741841_BP-92343436-localhost-1455217122700: could not reserve 4096 more
bytes in the cache: dfs.datanode.max.locked.memory of 16384 exceeded.
2016-02-11 10:58:50,672 DEBUG impl.FsDatasetCache
(FsDatasetCache.java:run(499)) - Caching of
1073741841_BP-92343436-localhost-1455217122700 was aborted. We are now caching
only 16384 bytes in total.
....
-- and cache reports show this --
2016-02-11 10:58:49,675 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode
127.0.0.1:54112 has block 1073741826
2016-02-11 10:58:49,675 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(984)) - Added block
{blockId=1073741826, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,675 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode
127.0.0.1:54112 has block 1073741829
2016-02-11 10:58:49,676 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(984)) - Added block
{blockId=1073741829, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,676 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(988)) - Removed block
{blockId=1073741829, replication=1, mark=false} from PENDING_CACHED list.
2016-02-11 10:58:49,676 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode
127.0.0.1:54112 has block 1073741828
2016-02-11 10:58:49,676 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(984)) - Added block
{blockId=1073741828, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,676 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(988)) - Removed block
{blockId=1073741828, replication=1, mark=false} from PENDING_CACHED list.
2016-02-11 10:58:49,677 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode
127.0.0.1:54112 has block 1073741834
2016-02-11 10:58:49,677 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(984)) - Added block
{blockId=1073741834, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,677 TRACE namenode.CacheManager
(CacheManager.java:processCacheReportImpl(988)) - Removed block
{blockId=1073741834, replication=1, mark=false} from PENDING_CACHED list.
{noformat}
> TestCacheDirectives#testExceedsCapacity is flaky
> ------------------------------------------------
>
> Key: HDFS-9549
> URL: https://issues.apache.org/jira/browse/HDFS-9549
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Environment: Jenkins
> Reporter: Wei-Chiu Chuang
> Assignee: Xiao Chen
> Attachments: HDFS-9549.01.patch
>
>
> I have observed that this test (TestCacheDirectives.testExceedsCapacity)
> fails quite frequently in Jenkins (trunk, trunk-Java8)
> Error Message
> Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841,
> replication=1, mark=true}]
> Stacktrace
> java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not
> empty, [{blockId=1073741841, replication=1, mark=true}]
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
> at
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)