[ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505794
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Oct/20 16:25
            Start Date: 28/Oct/20 16:25
    Worklog Time Spent: 10m 
      Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513584215



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int 
range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
       I see. the problem is that I cannot reproduce it on my local machine. 
However, it seems that it fails in a consistent way on Yetus.
   If it is not a real bug, I wonder if volumeScanner could be a factor in 
randomly slowing down the DNs. I see many log message from the volume scanner 
when I run locally. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 505794)
    Time Spent: 2h  (was: 1h 50m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
>                 Key: HDFS-15643
>                 URL: https://issues.apache.org/jira/browse/HDFS-15643
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>       at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>       at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>       at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>       at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>       at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>       at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>       at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> {code:bash}
> Error Message
> `/striped/stripedFileChecksum1': Fail to get block checksum for 
> LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41307,DS-3d824f14-3cd0-46b1-bef1-caa808bf278d,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37193,DS-eeb44ff5-fdf1-4774-b6cf-5be7c40147a9,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39897,DS-36d2fbfc-64bc-405c-8360-735f1ad92e30,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:35545,DS-6fd42817-efea-416e-92fb-3e9034705142,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39945,DS-501deff8-b6df-4cf0-9ac1-154a4253eec8,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41359,DS-9b0449f5-377b-4a76-9eb6-0bcf2984b4bb,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36123,DS-4184ab4a-079d-4b1c-a8cb-2ba22b0baafb,DISK]];
>  indices=[0, 1, 2, 3, 4, 6, 7, 8]}
> Stacktrace
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41307,DS-3d824f14-3cd0-46b1-bef1-caa808bf278d,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37193,DS-eeb44ff5-fdf1-4774-b6cf-5be7c40147a9,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39897,DS-36d2fbfc-64bc-405c-8360-735f1ad92e30,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:35545,DS-6fd42817-efea-416e-92fb-3e9034705142,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39945,DS-501deff8-b6df-4cf0-9ac1-154a4253eec8,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41359,DS-9b0449f5-377b-4a76-9eb6-0bcf2984b4bb,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36123,DS-4184ab4a-079d-4b1c-a8cb-2ba22b0baafb,DISK]];
>  indices=[0, 1, 2, 3, 4, 6, 7, 8]}
>       at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>       at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>       at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>       at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>       at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>       at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>       at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery8(TestFileChecksum.java:388)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to