[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Joep Rottinghuis (JIRA) Fri, 05 Aug 2011 10:45:53 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080105#comment-13080105 ]


Joep Rottinghuis commented on MAPREDUCE-2779:
---------------------------------------------

Patch looks good.
Affects 0.20-security-* branches as well.

FSDataOutputStream.getPos is not thread safe, but then again 
DataOutputStream.size does not seem to be thread safe either.
Even though the DataOutputStream.write method is synchronized, 
FSDataOutputStream.write is not synchronized.
This does not seem to be an issue in the current code path because 
createSplitFiles does not expose out.


> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Ming Ma
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates a big 
> job.split file, which is used internally by Hadoop; sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the functions that generate this file use a 32-bit 
> signed integer to compute the offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();
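
For illustration only, here is a minimal sketch of why an int-valued out.size() cannot serve as an offset past 2GB. It uses only the plain JDK java.io.DataOutputStream, whose size() counter is documented to stop at Integer.MAX_VALUE on overflow. The class SplitOffsetDemo, the NullSink helper, and the method writeAndMeasure are hypothetical names made up for this example; they are not part of Hadoop or of the attached patch.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class SplitOffsetDemo {
    /** Hypothetical sink that discards every byte, so no disk space is needed. */
    static final class NullSink extends OutputStream {
        @Override public void write(int b) {}
        @Override public void write(byte[] b, int off, int len) {}
    }

    /**
     * Writes totalBytes to a DataOutputStream and returns
     * { value reported by size(), true byte count tracked in a long }.
     */
    static long[] writeAndMeasure(long totalBytes) {
        byte[] chunk = new byte[1 << 20]; // 1 MB per write call
        long written = 0;                 // 64-bit counter, cannot overflow here
        try (DataOutputStream out = new DataOutputStream(new NullSink())) {
            while (written < totalBytes) {
                int n = (int) Math.min(chunk.length, totalBytes - written);
                out.write(chunk, 0, n);
                written += n;
            }
            // size() returns an int; the JDK caps it at Integer.MAX_VALUE.
            return new long[] { out.size(), written };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long target = (1L << 31) + 4096; // just past the 2GB boundary
        long[] r = writeAndMeasure(target);
        System.out.println("size() reports " + r[0] + ", actual bytes " + r[1]);
    }
}
```

Once the stream passes 2GB, size() no longer reflects the real position, so any offset or length derived from it is wrong. A long-valued position, such as the one FSDataOutputStream.getPos returns, does not have this limit.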

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
