[ https://issues.apache.org/jira/browse/HADOOP-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635637#action_12635637 ]
Brian Bockelman commented on HADOOP-4298:
-----------------------------------------

Hey Owen, all,

In fuse_dfs.c, I replaced this line in the function dfs_read:

if (fh->sizeBuffer == 0 || offset < fh->startOffset || offset > (fh->startOffset + fh->sizeBuffer) )

with the following:

if (fh->sizeBuffer == 0 || offset < fh->startOffset || offset >= (fh->startOffset + fh->sizeBuffer) || (offset+size) >= (fh->startOffset + fh->sizeBuffer) )

This covers the bug I mentioned below. I can now md5sum files successfully.

However, my application still complains of data corruption on reads (although it does make it further through the file!); unlike md5sum, the application has a very random read pattern. One possibility is that it issues a huge read that overruns the buffer; another is an as-yet-undiscovered bug.

When I figure things out, I'll turn the above fix into a proper patch (although you are welcome to do it for me if you have time). However, I would prefer it if the expert or original author took a peek at this code. (A standalone sketch of the corrected check appears after the quoted issue below.)

> File corruption when reading with fuse-dfs
> ------------------------------------------
>
>                 Key: HADOOP-4298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4298
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>    Affects Versions: 0.18.1
>         Environment: CentOS 4.6 final; kernel 2.6.9-67.ELsmp; FUSE 2.7.4; hadoop 0.18.1; 64-bit
>                      I hand-altered the fuse-dfs makefile to use 64-bit instead of the hardcoded -m32.
>            Reporter: Brian Bockelman
>            Priority: Critical
>             Fix For: 0.18.2
>
>
> I pulled a 5GB data file into Hadoop using the following command:
>
> hadoop fs -put /scratch/886B9B3D-6A85-DD11-A9AB-000423D6CA6E.root /user/brian/testfile
>
> I have HDFS mounted at /mnt/hadoop using fuse-dfs.
>
> However, when I try to md5sum the file in place (md5sum /mnt/hadoop), or copy the file back to local disk using "cp" and then md5sum it, the checksum is incorrect.
>
> When I pull the file using normal Hadoop means (hadoop fs -get /user/brian/testfile /scratch), the md5sum is correct.
>
> When I repeat the test with a smaller file (512MB, on the theory that there is a problem with some 2GB limit somewhere), the problem remains.
>
> When I repeat the test, the md5sum is consistently wrong - i.e., some part of the corruption is deterministic, and not the apparent fault of a bad disk.
>
> CentOS 4.6 is, unfortunately, not the apparent culprit: I could recreate the corruption issue on CentOS 5.x. The second node was also a 64-bit compile, running CentOS 5.2 (`uname -r` returns 2.6.18-92.1.10.el5).
>
> Thanks for looking into this,
> Brian

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
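
For illustration, here is a minimal, standalone C sketch of the buffer-validity check described in the comment above. It is not the actual fuse_dfs.c code: the struct and the helper function below are hypothetical stand-ins, and only the field names (sizeBuffer, startOffset) and the condition itself come from the comment. The idea is that the cached read-ahead buffer can serve a read only when the whole requested range [offset, offset+size) lies inside [startOffset, startOffset+sizeBuffer).

/* Minimal sketch of the patched buffer-validity check from dfs_read.
 * Hypothetical types/names except sizeBuffer, startOffset, offset, size. */
#include <stddef.h>
#include <sys/types.h>

struct dfs_fh {              /* hypothetical stand-in for the fuse-dfs file handle */
    off_t  startOffset;      /* file offset where the cached buffer begins */
    size_t sizeBuffer;       /* number of valid bytes currently cached     */
};

/* Returns nonzero when the cached buffer cannot serve the read
 * [offset, offset+size) and must be refilled first. */
static int buffer_needs_refill(const struct dfs_fh *fh, off_t offset, size_t size)
{
    off_t buf_end = fh->startOffset + (off_t)fh->sizeBuffer; /* one past the last cached byte */

    return fh->sizeBuffer == 0                /* nothing cached yet                    */
        || offset < fh->startOffset           /* read starts before the cached region  */
        || offset >= buf_end                  /* read starts past the cached region    */
        || (off_t)(offset + size) >= buf_end; /* read would run past the cached region */
}

Strictly speaking, a read that ends exactly at the buffer boundary (offset+size == startOffset+sizeBuffer) is still fully cached, so the last test could use > instead of the patch's >=; with >= the only cost is an occasional extra, harmless refill.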