[ https://issues.apache.org/jira/browse/HDFS-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008532#comment-13008532 ]

Lars Ailo Bongo commented on HDFS-1768:
---------------------------------------

In case my previous comment was unclear: I believe the following sequence caused the error:
1. I ran a copyFromLocalFile that crashed after creating the checksum file, but before deleting it.
2. The content of status-test.txt was then changed, so the new checksum no longer matches the checksum in the old, non-deleted checksum file.
3. A subsequent copyFromLocalFile uses the old checksum file.

Something related happens if the checksum file is invalid, as in:

/home/larsab/troilkatt2/test-tmp/data>cat > .status-test.txt.crc
dsds dsdsdsd
/home/larsab/troilkatt2/test-tmp/data>hadoop fs -put status-test.txt foo7
11/03/18 18:28:00 WARN fs.FSInputChecker: Problem opening checksum file: status-test.txt. Ignoring exception: java.io.IOException: Not a checksum file: .status-test.txt.crc
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:222)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:170)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1283)
    at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:134)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1817)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1960)


> fs -put crash that depends on source file name
> ----------------------------------------------
>
>                 Key: HDFS-1768
>                 URL: https://issues.apache.org/jira/browse/HDFS-1768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20.2
>         Environment: Cloudera CDH3B4 in pseudo mode on a Linux 2.6.32-28-generic #55-Ubuntu SMP x86_64 kernel, with Java HotSpot 64-Bit Server VM (build 19.1-b02, mixed mode)
>            Reporter: Lars Ailo Bongo
>            Priority: Minor
>
> I have a unit test that includes writing a file to HDFS using
> copyFromLocalFile. Sometimes the function fails due to a checksum error.
> Once the issue has occurred, "hadoop fs -put <filename> <anywhere>" also
> fails as long as the filename is the same as the one used in the unit test.
> The error results in the file content never being sent to the DataNode,
> hence the file has size zero.
> The error is not due to the file content. The error does not depend on the
> HDFS destination name. Restarting the NameNode and DataNode does not resolve
> the issue. I have not been able to reproduce the error with a simple program.
> I have also not tested the issue in distributed or standalone mode.
> The only "fix" is to change the source filename.
> Below are the error and the NameNode log. There is no entry for this
> operation in the DataNode log.
>
> /home/larsab/troilkatt2/test-tmp/data>hadoop fs -put status-test.txt status-test.txt3
> 11/03/18 16:59:54 INFO fs.FSInputChecker: Found checksum error: b[512, 968]=3a646f6e650a323a7365636f6e6453746167653a73746172740a323a7365636f6e6453746167653a646f6e650a323a746869726453746167653a73746172740a323a746869726453746167653a646f6e650a323a74686553696e6b3a73746172740a323a74686553696e6b3a646f6e650a323a54726f696c6b6174743a646f6e650a333a54726f696c6b6174743a73746172740a333a746865536f757263653a73746172740a333a746865536f757263653a646f6e650a333a666972737453746167653a73746172740a333a666972737453746167653a646f6e650a333a7365636f6e6453746167653a73746172740a333a7365636f6e6453746167653a646f6e650a333a746869726453746167653a73746172740a333a746869726453746167653a646f6e650a333a74686553696e6b3a73746172740a333a74686553696e6b3a646f6e650a333a54726f696c6b6174743a646f6e650a343a54726f696c6b6174743a73746172740a343a746865536f757263653a73746172740a343a746865536f757263653a646f6e650a343a666972737453746167653a73746172740a343a666972737453746167653a646f6e650a343a7365636f6e6453746167653a7265636f7665720a
> org.apache.hadoop.fs.ChecksumException: Checksum error: status-test.txt at 512
>     at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
>     at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>     at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
>     at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
>     at java.io.DataInputStream.read(DataInputStream.java:83)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:224)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:170)
>     at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1283)
>     at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:134)
>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:1817)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:1960)
> put: Checksum error: status-test.txt at 512
>
> NAMENODE
> 2011-03-18 16:59:54,422 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 13 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 7 SyncTimes(ms): 220
> 2011-03-18 16:59:54,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=larsab ip=/127.0.0.1 cmd=create src=/user/larsab/status-test.txt3 dst=null perm=larsab:supergroup:rw-r--r--
> 2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /user/larsab/status-test.txt3 from client DFSClient_-1004170418
> 2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/larsab/status-test.txt3 is closed by DFSClient_-1004170418

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
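
The three-step scenario from the comment at the top (a leftover .status-test.txt.crc that no longer matches the changed file) can be illustrated with a small standalone simulation. This is only a sketch and not Hadoop's code: it is written in Python, it assumes one CRC-32 per 512-byte chunk (chosen to be consistent with the "at 512" offset in the log), and it stores the checksums in a made-up text format rather than Hadoop's real .crc layout. The names crc_sidecar_path, write_sidecar and first_bad_offset are hypothetical.

```python
import os
import tempfile
import zlib

# Assumed chunk size: one checksum per 512 bytes of file content.
BYTES_PER_CHECKSUM = 512


def crc_sidecar_path(path):
    """Return the hidden sidecar name seen in the report: <dir>/.<name>.crc"""
    directory, name = os.path.split(path)
    return os.path.join(directory, "." + name + ".crc")


def chunk_checksums(data):
    """Compute one CRC-32 per BYTES_PER_CHECKSUM-sized chunk of data."""
    return [
        zlib.crc32(data[start:start + BYTES_PER_CHECKSUM])
        for start in range(0, len(data), BYTES_PER_CHECKSUM)
    ]


def write_sidecar(path):
    """Write the per-chunk checksums of path next to it (illustrative text format)."""
    with open(path, "rb") as source:
        sums = chunk_checksums(source.read())
    with open(crc_sidecar_path(path), "w") as sidecar:
        sidecar.write("\n".join(str(value) for value in sums))


def first_bad_offset(path):
    """Return the byte offset of the first chunk that disagrees with the sidecar.

    Returns None when there is no sidecar or when every compared chunk matches.
    Simplification: chunks beyond the shorter of the two lists are not compared.
    """
    sidecar_path = crc_sidecar_path(path)
    if not os.path.exists(sidecar_path):
        return None
    with open(sidecar_path) as sidecar:
        stored = [int(token) for token in sidecar.read().split()]
    with open(path, "rb") as source:
        actual = chunk_checksums(source.read())
    for index, (old, new) in enumerate(zip(stored, actual)):
        if old != new:
            return index * BYTES_PER_CHECKSUM
    return None
```

In this simulation, calling write_sidecar on a 968-byte file and then changing a single byte after offset 512 makes first_bad_offset return 512, while deleting the sidecar makes the check pass again. If the hypothesis in the comment holds, the analogous thing to try on the real system would be removing the leftover .status-test.txt.crc before retrying the put; that has not been verified here.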