[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245569#comment-15245569 ]
Walter Su commented on HDFS-10275:
----------------------------------

Sorry, I didn't see that. The patch LGTM. +1.

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted
> incorrectly
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-10275
>                 URL: https://issues.apache.org/jira/browse/HDFS-10275
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure
> output is:
> {code}
> Results :
> Failed tests:
>   TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected:<false> but was:<true>
> Tests in error:
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 » IO Timed out waiting for Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 » Timeout Timed out waiting for ...
>   TestHFlush.testHFlushInterrupted » IO The stream is closed
> {code}
> At line 279 of {{TestDataNodeMetrics}}, the test times out. Looking into the
> code, I found that the real cause is that the {{TotalWriteTime}} metric is
> frequently 0 after each iteration of file creation, which makes the test
> retry until it times out.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}}
> metric stays at 0 is that the time-spend test uses {{SimulatedFSDataset}}.
> {{SimulatedFSDataset}} writes through its inner class method
> {{SimulatedOutputStream#write}}, which only updates {{length}} and throws
> the data away:
> {code}
> @Override
> public void write(byte[] b,
>                   int off,
>                   int len) throws IOException {
>   length += len;
> }
> {code}
> So the write operation takes almost no time. We should therefore create the
> file on a real dataset instead of the simulated one.
> I tested this locally: after removing the simulated dataset, the test passed
> on the first attempt, whereas with the old approach it retried many times
> waiting for the write time to be counted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
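The mechanism described in the report can be illustrated with a minimal, self-contained Java sketch. It is not Hadoop code: `DiscardingOutputStream` (a stand-in for `SimulatedOutputStream`), `totalWriteTimeMillis` and `attemptsUntilPositive` are hypothetical names, and the sketch assumes, which the report does not state, that the write-time metric adds up whole milliseconds per write. Under that assumption, writes that each finish in well under a millisecond contribute 0, so the total can stay at 0 and a wait-until-positive loop exhausts its attempts.

```java
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class WriteTimeSketch {

    // Hypothetical stand-in for SimulatedOutputStream: it only tracks how many
    // bytes were "written" and throws the data away, so a write costs almost nothing.
    static final class DiscardingOutputStream extends OutputStream {
        long length = 0;

        @Override
        public void write(int b) {
            length++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            length += len;
        }
    }

    // ASSUMPTION: write time is accumulated in whole milliseconds per write,
    // so every sub-millisecond write is truncated to 0 before being added.
    static long totalWriteTimeMillis(long[] writeDurationsNanos) {
        long total = 0;
        for (long d : writeDurationsNanos) {
            total += TimeUnit.NANOSECONDS.toMillis(d);
        }
        return total;
    }

    // Hypothetical retry loop in the spirit of the test: keep creating a file
    // until the accumulated metric becomes positive, giving up after maxAttempts
    // (the "timeout"). Returns the attempts used, or -1 if it never became positive.
    static int attemptsUntilPositive(LongSupplier writeTimeOfOneFileMillis, int maxAttempts) {
        long total = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            total += writeTimeOfOneFileMillis.getAsLong();
            if (total > 0) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        DiscardingOutputStream out = new DiscardingOutputStream();
        out.write(new byte[4096], 0, 4096);
        System.out.println("bytes counted: " + out.length); // bytes counted: 4096

        // 1000 writes of 0.2 ms each: 200 ms of work in total, but 0 ms recorded.
        long[] durations = new long[1000];
        Arrays.fill(durations, 200_000L);
        System.out.println("recorded ms: " + totalWriteTimeMillis(durations)); // recorded ms: 0

        // Writes that always record 0 ms never satisfy the loop; a write that
        // records at least 1 ms satisfies it on the first attempt.
        System.out.println(attemptsUntilPositive(() -> 0L, 50)); // -1
        System.out.println(attemptsUntilPositive(() -> 3L, 50)); // 1
    }
}
```

If the assumption holds, writing to a real on-disk dataset, as the patch proposes, makes individual writes slow enough to register a nonzero duration, which would be consistent with the test passing on the first attempt.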