[ https://issues.apache.org/jira/browse/HADOOP-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536976 ]

Hudson commented on HADOOP-2080:
--------------------------------

Integrated in Hadoop-Nightly #280 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/280/])

> ChecksumFileSystem checksum file size incorrect.
> ------------------------------------------------
>
>                 Key: HADOOP-2080
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2080
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.14.0, 0.14.1, 0.14.2
>         Environment: Sun jdk1.6.0_02 running on Linux CentOS 5
>            Reporter: Richard Lee
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: ChecksumFileSystem.java.patch, hadoop-2080.patch, 
> TestInternalFilesystem.java
>
>
> Periodically, reduce tasks hang. When the log for the task is consulted, you 
> see a stacktrace that looks like this:
> 2007-10-18 17:02:04,227 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Insufficient space
>       at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.write(InMemoryFileSystem.java:174)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:326)
>       at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:140)
>       at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:122)
>       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:310)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
>       at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
>       at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:253)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:685)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:637)
> The problem stems from a miscalculation of the size of the checksum file 
> created in the InMemoryFileSystem for the data being copied from a completed 
> map task to the reduce task.
> The method used to calculate the checksum file size is the following 
> (ChecksumFileSystem:318):
> ((long)(Math.ceil((float)size/bytesPerSum)) + 1) * 4 + CHECKSUM_VERSION.length;
> The issue here is the cast to float. A float carries only 24 bits of mantissa 
> precision, so for any size above 0x1000000 the cast can round the value down 
> and the computed checksum file size comes out too small. The fix is to replace 
> this calculation with one that uses only integer arithmetic:
> (((size+1)/bytesPerSum) + 2) * 4 + CHECKSUM_VERSION.length
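
For reference, a minimal, self-contained sketch (not the attached patch) that 
reproduces the undersizing and checks the proposed integer-only formula against 
an exact integer ceiling. The bytesPerSum value of 512 (the usual 
io.bytes.per.checksum default) and the 4-byte version header are assumptions 
standing in for the values ChecksumFileSystem derives from its configuration 
and CHECKSUM_VERSION.length.

    // Illustrative sketch only, not the attached patch.
    public class ChecksumSizeDemo {
        public static void main(String[] args) {
            long size = 0x1000001L;  // 16777217 bytes: one past float's 24-bit mantissa
            long bytesPerSum = 512;  // assumed io.bytes.per.checksum default
            long headerLen = 4;      // stands in for CHECKSUM_VERSION.length

            // Original formula: (float) size rounds 16777217 down to 16777216,
            // so one 4-byte checksum entry is silently dropped from the size.
            long floatBased =
                ((long) (Math.ceil((float) size / bytesPerSum)) + 1) * 4 + headerLen;

            // Proposed replacement: integer arithmetic only, no precision loss.
            long intBased = (((size + 1) / bytesPerSum) + 2) * 4 + headerLen;

            // Exact integer ceiling, mirroring the "+ 1" of the original formula.
            long chunks = (size + bytesPerSum - 1) / bytesPerSum;
            long exact = (chunks + 1) * 4 + headerLen;

            System.out.println("float-based:   " + floatBased);  // 131080 -- 4 bytes short
            System.out.println("integer-based: " + intBased);    // 131084
            System.out.println("exact:         " + exact);       // 131084
        }
    }

With these inputs the float-based formula reserves 4 bytes too little, so the 
final checksum entry overflows the in-memory reservation, which matches the 
"Insufficient space" failure in the stack trace above.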

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
