[ https://issues.apache.org/jira/browse/HADOOP-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635853#action_12635853 ]

Brian Bockelman commented on HADOOP-4298:
-----------------------------------------

Hey Pete - 

Regarding the comment on "29/Sep/08 11:10 PM": Yes, I misspoke.  It was the 
libhdfs makefile that I had adjusted to use -m64.

Regarding the later comment, "30/Sep/08 01:09 PM": As you mention, FUSE 
appears to be unhappy with this behavior - it needs to always get the 
asked-for number of bytes back.

Besides, the unix definition of 'read' 
(http://www.opengroup.org/onlinepubs/000095399/functions/read.html) states, 
regarding the return value:

"""
This number shall never be greater than nbyte. The value returned may be less 
than nbyte if the number of bytes left in the file is less than nbyte, if the 
read() request was interrupted by a signal, or if the file is a pipe or FIFO or 
special file and has fewer than nbyte bytes immediately available for reading. 
For example, a read() from a file associated with a terminal may return one 
typed line of data.
"""

To me, that means that if there are at least nbyte more bytes in the file, you 
should return exactly nbyte; though I can see that if the buffer holds fewer 
than nbyte bytes, you could claim that meets the requirement "fewer than nbyte 
bytes immediately available for reading".

I guess you still end up being stuck with the FUSE problem.
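
For what it's worth, the usual way to keep FUSE happy is to loop on the 
underlying read until the request is filled or EOF is hit.  A rough sketch 
only, not the actual fuse_dfs code - plain POSIX pread() stands in for the 
libhdfs call, and the open descriptor in fi->fh is my assumption:

#include <errno.h>
#include <unistd.h>

#define FUSE_USE_VERSION 26
#include <fuse.h>     /* build with the usual FUSE cflags (-D_FILE_OFFSET_BITS=64) */

/* Sketch only: pread() on fi->fh stands in for the real libhdfs read. */
static int sketch_read(const char *path, char *buf, size_t size, off_t offset,
                       struct fuse_file_info *fi)
{
    size_t total = 0;

    (void)path;  /* the open descriptor in fi->fh identifies the file */

    /* FUSE (without direct_io) wants exactly 'size' bytes except at EOF,
       so keep reading until the buffer is full or the file runs out. */
    while (total < size) {
        ssize_t n = pread((int)fi->fh, buf + total, size - total,
                          offset + (off_t)total);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -errno;     /* real error: hand it back to FUSE */
        }
        if (n == 0)
            break;             /* EOF: return however many bytes we got */
        total += (size_t)n;
    }

    return (int)total;
}

Bounding "size - total" to some fixed chunk in that loop would also keep any 
single backend call from getting too large.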

Regarding exceptionally large reads (> 10MB): they might be a problem, but I 
can't trigger one locally.
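
If it's useful for poking at that, here's a rough one-shot test - the path is 
hypothetical (any big file on the fuse-dfs mount would do), and the kernel may 
still split the request into smaller FUSE reads, so it exercises the read path 
end to end rather than forcing one huge call inside fuse-dfs:

#define _XOPEN_SOURCE 500
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical path - point it at any large file on the fuse-dfs mount. */
    const char *path = "/mnt/hadoop/user/brian/testfile";
    const size_t want = 16 * 1024 * 1024;   /* ask for 16MB in one call */
    char *buf = malloc(want);
    int fd = open(path, O_RDONLY);

    if (buf == NULL || fd < 0) {
        perror("setup");
        return 1;
    }

    /* Issue one big read at offset 0 and report how much actually came back. */
    ssize_t got = pread(fd, buf, want, 0);
    printf("asked for %zu bytes, got %zd\n", want, got);

    close(fd);
    free(buf);
    return (got == (ssize_t)want) ? 0 : 1;
}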

Finally, I think that when my application last failed, I hadn't remounted the 
FS with the new code.  I tried it out this morning and was pleased to see 
things work all the way through, with no segfaults.

Thanks for the help.  This goes a long way toward "selling" the idea of a new 
distributed file system to our sysadmins.

Brian

> File corruption when reading with fuse-dfs
> ------------------------------------------
>
>                 Key: HADOOP-4298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4298
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>    Affects Versions: 0.18.1
>         Environment: CentOS 4.6 final; kernel 2.6.9-67.ELsmp; FUSE 2.7.4; 
> hadoop 0.18.1; 64-bit
> I hand-altered the fuse-dfs makefile to use 64-bit instead of the hardcoded 
> -m32.
>            Reporter: Brian Bockelman
>            Priority: Critical
>             Fix For: 0.18.2
>
>
> I pulled a 5GB data file into Hadoop using the following command:
> hadoop fs -put /scratch/886B9B3D-6A85-DD11-A9AB-000423D6CA6E.root 
> /user/brian/testfile
> I have HDFS mounted in /mnt/hadoop using fuse-dfs.
> However, when I try to md5sum the file in place (md5sum /mnt/hadoop) or copy 
> the file back to local disk using "cp" then md5sum it, the checksum is 
> incorrect.
> When I pull the file using normal hadoop means (hadoop fs -get 
> /user/brian/testfile /scratch), the md5sum is correct.
> When I repeat the test with a smaller file (512MB, on the theory that there 
> is a problem with some 2GB limit somewhere), the problem remains.
> When I repeat the test, the md5sum is consistently wrong - i.e., some part of 
> the corruption is deterministic, and not the apparent fault of a bad disk.
> CentOS 4.6 is, unfortunately, not the apparent culprit.  When checking on 
> CentOS 5.x, I could recreate the corruption issue.  The second node was also 
> a 64-bit build, running CentOS 5.2 (`uname -r` returns 2.6.18-92.1.10.el5).
> Thanks for looking into this,
> Brian

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
