[ 
https://issues.apache.org/jira/browse/HDFS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143901#comment-15143901
 ] 

Xiao Chen commented on HDFS-9549:
---------------------------------

Below is the excerpt from the log, HTH.
{noformat}
-- cached initially 1073741826 --
2016-02-11 10:58:47,656 INFO  datanode.DataNode 
(BPOfferService.java:processCommandFromActive(671)) - DatanodeCommand action: 
DNA_CACHE for BP-92343436-localhost-1455217122700 of [1073741826, 1073741828, 
1073741829, 1073741834]
2016-02-11 10:58:47,669 DEBUG impl.FsDatasetCache 
(FsDatasetCache.java:run(486)) - Successfully cached 
1073741826_BP-92343436-localhost-1455217122700.  We are now caching 4096 bytes 
in total.
2016-02-11 10:58:47,673 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(984)) - Added block 
{blockId=1073741826, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:47,674 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(988)) - Removed block 
{blockId=1073741826, replication=1, mark=false} from PENDING_CACHED list.
2016-02-11 10:58:47,674 DEBUG namenode.CacheManager 
(CacheManager.java:processCacheReport(953)) - Processed cache report from 
DatanodeRegistration(127.0.0.1:54112, 
datanodeUuid=1f194c3a-3103-467e-a239-6258d71785a9, infoPort=54114, 
infoSecurePort=0, ipcPort=54115, 
storageInfo=lv=-56;cid=testClusterID;nsid=513491009;c=1455217122700), blocks: 
1, processing time: 2 msecs

-- innocent 1073741841 added --
2016-02-11 10:58:47,758 ERROR blockmanagement.CacheReplicationMonitor 
(CacheReplicationMonitor.java:addNewPendingCached(699)) - Adding Block 
1073741841: DataNode 1f194c3a-3103-467e-a239-6258d71785a9 because the block has 
size 4096, but the DataNode only has 4096 bytes of cache remaining (-12288 
pending bytes, 16384 already cached.)
2016-02-11 10:58:47,759 TRACE blockmanagement.CacheReplicationMonitor 
(CacheReplicationMonitor.java:addNewPendingCached(711)) - Block 1073741841: 
added to PENDING_CACHED on DataNode 1f194c3a-3103-467e-a239-6258d71785a9
2016-02-11 10:58:48,661 INFO  datanode.DataNode 
(BPOfferService.java:processCommandFromActive(671)) - DatanodeCommand action: 
DNA_CACHE for BP-92343436-localhost-1455217122700 of [1073741828, 1073741829, 
1073741834, 1073741841]
2016-02-11 10:58:48,667 WARN  impl.FsDatasetCache 
(FsDatasetCache.java:run(441)) - Failed to cache 
1073741841_BP-92343436-localhost-1455217122700: could not reserve 4096 more 
bytes in the cache: dfs.datanode.max.locked.memory of 16384 exceeded.
2016-02-11 10:58:48,666 DEBUG impl.FsDatasetCache 
(FsDatasetCache.java:cacheBlock(309)) - Initiating caching for Block with id 
1073741841, pool BP-92343436-localhost-1455217122700
2016-02-11 10:58:48,668 DEBUG impl.FsDatasetCache 
(FsDatasetCache.java:run(499)) - Caching of 
1073741841_BP-92343436-localhost-1455217122700 was aborted.  We are now caching 
only 16384 bytes in total.

-- eventually, 1073741841 fails forever --
2016-02-11 10:58:50,670 DEBUG impl.FsDatasetCache 
(FsDatasetCache.java:cacheBlock(309)) - Initiating caching for Block with id 
1073741841, pool BP-92343436-localhost-1455217122700
2016-02-11 10:58:50,670 WARN  impl.FsDatasetCache 
(FsDatasetCache.java:run(441)) - Failed to cache 
1073741841_BP-92343436-localhost-1455217122700: could not reserve 4096 more 
bytes in the cache: dfs.datanode.max.locked.memory of 16384 exceeded.
2016-02-11 10:58:50,672 DEBUG impl.FsDatasetCache 
(FsDatasetCache.java:run(499)) - Caching of 
1073741841_BP-92343436-localhost-1455217122700 was aborted.  We are now caching 
only 16384 bytes in total.
 ....

-- and cache reports show this --
2016-02-11 10:58:49,675 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 
127.0.0.1:54112 has block 1073741826
2016-02-11 10:58:49,675 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(984)) - Added block 
{blockId=1073741826, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,675 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 
127.0.0.1:54112 has block 1073741829
2016-02-11 10:58:49,676 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(984)) - Added block 
{blockId=1073741829, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,676 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(988)) - Removed block 
{blockId=1073741829, replication=1, mark=false} from PENDING_CACHED list.
2016-02-11 10:58:49,676 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 
127.0.0.1:54112 has block 1073741828
2016-02-11 10:58:49,676 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(984)) - Added block 
{blockId=1073741828, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,676 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(988)) - Removed block 
{blockId=1073741828, replication=1, mark=false} from PENDING_CACHED list.
2016-02-11 10:58:49,677 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 
127.0.0.1:54112 has block 1073741834
2016-02-11 10:58:49,677 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(984)) - Added block 
{blockId=1073741834, replication=1, mark=false} to CACHED list.
2016-02-11 10:58:49,677 TRACE namenode.CacheManager 
(CacheManager.java:processCacheReportImpl(988)) - Removed block 
{blockId=1073741834, replication=1, mark=false} from PENDING_CACHED list.
{noformat}

> TestCacheDirectives#testExceedsCapacity is flaky
> ------------------------------------------------
>
>                 Key: HDFS-9549
>                 URL: https://issues.apache.org/jira/browse/HDFS-9549
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: Jenkins
>            Reporter: Wei-Chiu Chuang
>            Assignee: Xiao Chen
>         Attachments: HDFS-9549.01.patch
>
>
> I have observed that this test (TestCacheDirectives.testExceedsCapacity) 
> fails quite frequently in Jenkins (trunk, trunk-Java8)  
> Error Message
> Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, 
> replication=1, mark=true}]
> Stacktrace
> java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not 
> empty, [{blockId=1073741841, replication=1, mark=true}]
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
>       at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to