[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245569#comment-15245569
 ] 

Walter Su commented on HDFS-10275:
----------------------------------

sorry I didn't see that. The patch LGTM. +1.

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-10275
>                 URL: https://issues.apache.org/jira/browse/HDFS-10275
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failed info 
> show these:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected:<false> but was:<true>
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> In line 279 in {{TestDataNodeMetrics}}, it takes place timed out. Then I 
> looked into the code and found the real reason is that the metric of 
> {{TotalWriteTime}} frequently count 0 in each iteration of creating file. And 
> the this leads to retry operations till timeout.
> I debug the test in my local. I found the most suspect reason which cause 
> {{TotalWriteTime}} metric count always be 0 is that we using the 
> {{SimulatedFSDataset}} for spending time test. In {{SimulatedFSDataset}}, it 
> will use the inner class's method {{SimulatedOutputStream#write}} to count 
> the write time and the method of this class just updates the {{length}} and 
> throws its data away.
> {code}
>     @Override
>     public void write(byte[] b,
>               int off,
>               int len) throws IOException  {
>       length += len;
>     }
> {code} 
> So the writing operation hardly not costs any time. So we should use a real 
> way to create file instead of simulated way. I have tested in my local that 
> the test is passed just one time when I delete the simulated way, while the 
> test retries many times to count write time in old way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to