[ 
https://issues.apache.org/jira/browse/HDFS-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209669#comment-15209669
 ] 

Lin Yiqun commented on HDFS-10197:
----------------------------------

Thanks [~andrew.wang] for the comments.
{quote}
However, do you have any ideas about why the tests take so long to run? A 
better solution is to optimize the tests to run faster.
{quote}
I looked into this. All three failures are timeouts while waiting for blocks 
to be cached and for the cache usage to reach the expected value. This 
indicates that caching blocks is sometimes slow. One observation from me:

* The config {{dfs.datanode.fsdatasetcache.max.threads.per.volume}} currently 
defaults to 4. When more than 4 blocks are being cached at the same time in a 
unit test (as in {{testPageRounder}}, where numblocks is 5), it seems some of 
them have to wait. We can raise this value.
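To illustrate the suspected effect, here is a minimal, self-contained sketch. It is not the DataNode code: it only assumes that caching work is handed to a fixed pool of 4 worker threads, and the class name {{CachePoolSketch}} and method {{queuedTasks}} are hypothetical names made up for this example. It shows that with 5 long-running tasks and 4 workers, one task sits in the queue until a worker frees up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is NOT how FsDatasetCache is implemented.
final class CachePoolSketch {

  /**
   * Submits {@code tasks} blocking jobs to a fixed pool of {@code threads}
   * workers and returns how many jobs are still waiting in the queue once
   * every available worker is busy.
   */
  public static int queuedTasks(int threads, int tasks)
      throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        threads, threads, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>());
    // Number of jobs that can run at once.
    CountDownLatch started = new CountDownLatch(Math.min(threads, tasks));
    // Keeps running jobs blocked until we have inspected the queue.
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await(); // stands in for a slow "cache this block" step
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // Wait until every worker that can be busy has picked up a job.
      if (!started.await(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not start in time");
      }
      return pool.getQueue().size();
    } finally {
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("queued with 4 threads, 5 tasks: " + queuedTasks(4, 5));
    System.out.println("queued with 8 threads, 5 tasks: " + queuedTasks(8, 5));
  }
}
```

If the real caching path behaves like this sketch, a larger thread limit in the test configuration would let all 5 blocks be cached concurrently instead of one waiting behind the others.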

I have updated the latest patch to address the comments. Pending Jenkins.

> TestFsDatasetCache failing intermittently due to timeout
> --------------------------------------------------------
>
>                 Key: HDFS-10197
>                 URL: https://issues.apache.org/jira/browse/HDFS-10197
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10197.001.patch
>
>
> In {{TestFsDatasetCache}}, the unit tests sometimes fail. I collected some 
> failure reasons from recent Jenkins reports. They are all timeout errors.
> {code}
> Tests in error: 
>   TestFsDatasetCache.testFilesExceedMaxLockedMemory:378 ? Timeout Timed out 
> wait...
>   TestFsDatasetCache.tearDown:149 ? Timeout Timed out waiting for condition. 
> Thr...
> {code}
> {code}
> Tests in error: 
>   TestFsDatasetCache.testPageRounder:474 ?  test timed out after 60000 
> milliseco...
>   TestBalancer.testUnknownDatanodeSimple:1040->testUnknownDatanode:1098 ?  
> test ...
> {code}
> But there is a small difference between these failures.
> * The first occurs because the total time spent waiting exceeded 
> {{waitForMillis}} (60s here), so a timeout exception is thrown and the 
> thread diagnostic string is printed in method 
> {{DFSTestUtil#verifyExpectedCacheUsage}}.
> {code}
>     long st = Time.now();
>     do {
>       boolean result = check.get();
>       if (result) {
>         return;
>       }
>       
>       Thread.sleep(checkEveryMillis);
>     } while (Time.now() - st < waitForMillis);
>     
>     throw new TimeoutException("Timed out waiting for condition. " +
>         "Thread diagnostics:\n" +
>         TimedOutTestsListener.buildThreadDiagnosticString());
> {code}
> * The second occurs because the test's elapsed time exceeded its configured 
> timeout, as in {{TestFsDatasetCache#testPageRounder}}.
> We should increase the timeout for the unit tests that sometimes fail due 
> to timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)