File corruption when reading with fuse-dfs
------------------------------------------

                 Key: HADOOP-4298
                 URL: https://issues.apache.org/jira/browse/HADOOP-4298
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/fuse-dfs
    Affects Versions: 0.18.1
         Environment: CentOS 4.6 final; kernel 2.6.9-67.ELsmp; FUSE 2.7.4; 
Hadoop 0.18.1; 64-bit

I hand-altered the fuse-dfs Makefile to build 64-bit instead of the 
hardcoded -m32.
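
The edit was roughly equivalent to the following (a sketch; the exact 
file and flag locations in the fuse-dfs build may differ):

  sed -i 's/-m32/-m64/g' Makefile   # run in the fuse-dfs source dir; path illustrative
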
            Reporter: Brian Bockelman
            Priority: Critical
             Fix For: 0.18.1



I pulled a 5 GB data file into Hadoop using the following command:

  hadoop fs -put /scratch/886B9B3D-6A85-DD11-A9AB-000423D6CA6E.root /user/brian/testfile

I have HDFS mounted at /mnt/hadoop using fuse-dfs.

However, when I try to md5sum the file in place under /mnt/hadoop, or copy 
the file back to local disk using "cp" and then md5sum the copy, the checksum 
is incorrect.

When I pull the file back with Hadoop itself (hadoop fs -get 
/user/brian/testfile /scratch), the md5sum is correct.
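
To spell out the comparison (a sketch; this assumes the fuse-dfs mount 
mirrors HDFS paths, so the file shows up at /mnt/hadoop/user/brian/testfile):

  md5sum /scratch/886B9B3D-6A85-DD11-A9AB-000423D6CA6E.root  # reference checksum
  md5sum /mnt/hadoop/user/brian/testfile                     # read via fuse-dfs: wrong
  cp /mnt/hadoop/user/brian/testfile /tmp/testfile
  md5sum /tmp/testfile                                       # also wrong
  hadoop fs -get /user/brian/testfile /scratch/testfile.get
  md5sum /scratch/testfile.get                               # matches the reference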

When I repeat the test with a smaller file (512 MB, on the theory that the 
problem involves some 2 GB limit), the problem remains.
On every repetition of the test, the md5sum comes out wrong in the same way - 
i.e., some part of the corruption is deterministic, and not the fault of a 
bad disk.

CentOS 4.6 is, unfortunately, not the culprit: I could recreate the corruption 
on CentOS 5.x as well. The second node was also a 64-bit build, running 
CentOS 5.2 (`uname -r` returns 2.6.18-92.1.10.el5).

Thanks for looking into this,
Brian
