[ https://issues.apache.org/jira/browse/HADOOP-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536863 ]
Raghu Angadi commented on HADOOP-2080:
--------------------------------------
+1 to Owen's patch.
Thanks Richard. That was a good catch with the floats. findbugs catches similar
non-obvious math errors. Maybe we should inform the findbugs guys.
> ChecksumFileSystem checksum file size incorrect.
> ------------------------------------------------
>
> Key: HADOOP-2080
> URL: https://issues.apache.org/jira/browse/HADOOP-2080
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.14.0, 0.14.1, 0.14.2
> Environment: Sun jdk1.6.0_02 running on Linux CentOS 5
> Reporter: Richard Lee
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.15.0
>
> Attachments: ChecksumFileSystem.java.patch, hadoop-2080.patch, TestInternalFilesystem.java
>
>
> Periodically, reduce tasks hang. When the log for the task is consulted, you
> see a stacktrace that looks like this:
> 2007-10-18 17:02:04,227 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Insufficient space
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.write(InMemoryFileSystem.java:174)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:326)
>   at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:140)
>   at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:122)
>   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:310)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
>   at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
>   at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:253)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:685)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:637)
> The problem stems from a miscalculation of the size of the checksum file
> created in the InMemoryFileSystem for the data being copied from a
> completed mapper task to the reducer task.
> The method used for calculating checksum file size is the following
> (ChecksumFileSystem:318):
> ((long)(Math.ceil((float)size/bytesPerSum)) + 1) * 4 + CHECKSUM_VERSION.length;
> The issue here is the cast to float. A float carries only 24 bits of
> mantissa precision, so for any size over 0x1000000 (2^24) the cast rounds
> the value and the computed checksum file size can come up short. The fix is
> to replace this calculation with something that doesn't cast to float:
> (((size+1)/bytesPerSum) + 2) * 4 + CHECKSUM_VERSION.length
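> To make the rounding failure concrete, here is a minimal self-contained Java
> sketch (not the Hadoop source; bytesPerSum = 512 and a version-header length
> of 8 are illustrative stand-ins for the real constants). It compares the
> float-based calculation against an all-integer version written with the
> usual (size + bytesPerSum - 1) / bytesPerSum ceiling-division idiom rather
> than the exact expression proposed above:
>
> public class ChecksumSizeDemo {
>     static final int BYTES_PER_SUM = 512;    // illustrative chunk size
>     static final int VERSION_LENGTH = 8;     // stand-in for CHECKSUM_VERSION.length
>
>     // The buggy calculation: casting size to float loses precision past 2^24.
>     static long buggySize(long size) {
>         return ((long) (Math.ceil((float) size / BYTES_PER_SUM)) + 1) * 4
>                 + VERSION_LENGTH;
>     }
>
>     // All-integer version: ceiling division with no float rounding.
>     static long fixedSize(long size) {
>         long chunks = (size + BYTES_PER_SUM - 1) / BYTES_PER_SUM;
>         return (chunks + 1) * 4 + VERSION_LENGTH;
>     }
>
>     public static void main(String[] args) {
>         long size = (1L << 26) + 1;  // 64 MB + 1 byte; exceeds float's 24-bit mantissa
>         System.out.println("buggy: " + buggySize(size));  // prints 524300 -- one 4-byte checksum short
>         System.out.println("fixed: " + fixedSize(size));  // prints 524304
>     }
> }
>
> For any size that is not exactly representable as a float, the cast rounds
> the value, and when it rounds down the computed checksum file size is
> smaller than what the stream actually writes, which is what triggers the
> "Insufficient space" IOException above.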