[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505794 ]
ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Oct/20 16:25
            Start Date: 28/Oct/20 16:25
    Worklog Time Spent: 10m

Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513584215

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########

@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
     dnIdxToDie = getDataNodeToKill(filePath);
     DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
     shutdownDataNode(dnToDie);
+    // wait enough time for the locations to be updated.
+    Thread.sleep(STALE_INTERVAL);

Review comment:
   I see. The problem is that I cannot reproduce it on my local machine; however, it fails consistently on Yetus. If it is not a real bug, I wonder whether the VolumeScanner could be a factor in randomly slowing down the DNs. I see many log messages from the volume scanner when I run locally.
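For reference, a less timing-sensitive variant of the patch's fixed sleep would be to poll the NameNode and return as soon as it actually reports the stopped DataNode as stale. The sketch below is illustrative only: it reuses the {{cluster}}, {{dnToDie}}, and {{STALE_INTERVAL}} names from {{TestFileChecksum}}, and the 3x timeout bound is an assumed safety margin, not a value taken from the patch.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
import org.apache.hadoop.test.GenericTestUtils;

// Poll the NameNode instead of sleeping for a fixed STALE_INTERVAL: the
// wait ends as soon as the stopped DataNode is seen as stale, while a
// slow host (e.g. Yetus) gets up to 3x STALE_INTERVAL before timing out.
GenericTestUtils.waitFor(() -> {
  try {
    DatanodeDescriptor dnd = cluster.getNameNode().getNamesystem()
        .getBlockManager().getDatanodeManager()
        .getDatanode(dnToDie.getDatanodeId());
    // isStale() is driven by the last heartbeat time, which is also what
    // the NameNode consults when handling stale block locations.
    return dnd == null || dnd.isStale(STALE_INTERVAL);
  } catch (IOException e) {
    return false; // descriptor lookup failed; keep polling
  }
}, 100, (int) (STALE_INTERVAL * 3));
{code}

{{GenericTestUtils.waitFor}} throws a {{TimeoutException}} if the condition never holds, so a run where the DataNode never goes stale fails with a clear timeout message instead of proceeding into the checksum call.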
Issue Time Tracking
-------------------

    Worklog Id: (was: 505794)
    Time Spent: 2h  (was: 1h 50m)

TestFileChecksumCompositeCrc fails intermittently
-------------------------------------------------

                Key: HDFS-15643
                URL: https://issues.apache.org/jira/browse/HDFS-15643
            Project: Hadoop HDFS
         Issue Type: Sub-task
           Reporter: Ahmed Hussein
           Assignee: Ahmed Hussein
           Priority: Critical
             Labels: pull-request-available
        Attachments: TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
         Time Spent: 2h
 Remaining Estimate: 0h

There are many failures in {{TestFileChecksumCompositeCrc}}: the {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} test cases fail intermittently. The following is a sample of the stack traces from two of them, Query7 and Query8.

{code:bash}
org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; indices=[1, 2, 3, 4, 5, 6, 7, 8]}
    at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
    at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
    at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
    at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
    at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
    at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
    at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
{code}

{code:bash}
Error Message
`/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK], DatanodeInfoWithStorage[127.0.0.1:41307,DS-3d824f14-3cd0-46b1-bef1-caa808bf278d,DISK], DatanodeInfoWithStorage[127.0.0.1:37193,DS-eeb44ff5-fdf1-4774-b6cf-5be7c40147a9,DISK], DatanodeInfoWithStorage[127.0.0.1:39897,DS-36d2fbfc-64bc-405c-8360-735f1ad92e30,DISK], DatanodeInfoWithStorage[127.0.0.1:35545,DS-6fd42817-efea-416e-92fb-3e9034705142,DISK], DatanodeInfoWithStorage[127.0.0.1:39945,DS-501deff8-b6df-4cf0-9ac1-154a4253eec8,DISK], DatanodeInfoWithStorage[127.0.0.1:41359,DS-9b0449f5-377b-4a76-9eb6-0bcf2984b4bb,DISK], DatanodeInfoWithStorage[127.0.0.1:36123,DS-4184ab4a-079d-4b1c-a8cb-2ba22b0baafb,DISK]]; indices=[0, 1, 2, 3, 4, 6, 7, 8]}

Stacktrace
org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK], DatanodeInfoWithStorage[127.0.0.1:41307,DS-3d824f14-3cd0-46b1-bef1-caa808bf278d,DISK], DatanodeInfoWithStorage[127.0.0.1:37193,DS-eeb44ff5-fdf1-4774-b6cf-5be7c40147a9,DISK], DatanodeInfoWithStorage[127.0.0.1:39897,DS-36d2fbfc-64bc-405c-8360-735f1ad92e30,DISK], DatanodeInfoWithStorage[127.0.0.1:35545,DS-6fd42817-efea-416e-92fb-3e9034705142,DISK], DatanodeInfoWithStorage[127.0.0.1:39945,DS-501deff8-b6df-4cf0-9ac1-154a4253eec8,DISK], DatanodeInfoWithStorage[127.0.0.1:41359,DS-9b0449f5-377b-4a76-9eb6-0bcf2984b4bb,DISK], DatanodeInfoWithStorage[127.0.0.1:36123,DS-4184ab4a-079d-4b1c-a8cb-2ba22b0baafb,DISK]]; indices=[0, 1, 2, 3, 4, 6, 7, 8]}
    at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
    at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
    at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
    at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
    at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
    at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
    at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery8(TestFileChecksum.java:388)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
{code}
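On the VolumeScanner hypothesis raised in the review comment above: one way to rule the scanner in or out would be to disable it in the test's cluster configuration and see whether the Yetus failures persist. A minimal sketch follows, assuming the test builds its {{MiniDFSCluster}} from a fresh configuration; per {{hdfs-default.xml}}, a negative scan period disables the DataNode block scanner.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

Configuration conf = new HdfsConfiguration();
// A negative dfs.datanode.scan.period.hours disables the block scanner,
// and with it the per-volume VolumeScanner threads and their log noise.
conf.setInt(DFSConfigKeys.DFS_DATANODE_SCAN_PERIOD_HOURS_KEY, -1);
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(9) // assumed node count; TestFileChecksum sets its own
    .build();
{code}

If the failures disappear with the scanner off, that would support the slow-DataNode theory; if they persist, the stale-location race around the fixed sleep looks like the more likely cause.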