[ 
https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------

    Attachment: hadoop-5539-branch20.patch

Patch for the 0.20 branch.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, 
> hadoop-5539-v1.patch, hadoop-5539.patch
>
>
> hadoop-site.xml :
> mapred.compress.map.output = true
> Map output files are compressed, but when the in-memory merger closes 
> on the reduce side, the on-disk merger runs to reduce the number of input 
> files to <= io.sort.factor if needed. 
> When this happens it writes files named intermediate.x, and these do not 
> maintain the compression setting. The writer (o.a.h.mapred.Merger.class 
> line 432) is passed the codec, but I added some logging and it is always 
> null, whether map output compression is set to true or false.
> This causes tasks to fail if they cannot hold the uncompressed size of the 
> data for the reduce they are running.
> I think this is just an oversight: the codec is not getting set correctly 
> for the on-disk merges.
> {code}
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 
> intermediate segments out of a total of 3000
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 
> used codec: null
> {code}
> I added 
> {code}
>          // added by me
>          if (codec != null) {
>            LOG.info("intermediate." + passNo + " used codec: " + codec.toString());
>          } else {
>            LOG.info("intermediate." + passNo + " used codec: Null");
>          }
>          // end added by me
> {code}
> just before the creation of the writer at o.a.h.mapred.Merger.class line 432, 
> and it outputs the second log line above.
> I have confirmed this with the logging, and I have also looked at the files 
> on the tasktracker's disk. I can read the data in the intermediate files 
> clearly, which tells me they are not compressed, but I cannot read the 
> map.out files taken directly from the map output. That tells me compression 
> is working on the map end but not in the on-disk merge that produces the 
> intermediate files.
> I can see no benefit in these files not maintaining the compression setting, 
> and it looks as though they were intended to maintain it.
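
The effect described in the report can be illustrated with a small, self-contained sketch. This is not the Hadoop Merger code and not the attached patch: the names `IntermediateCodecSketch`, `Codec`, `GZIP` and `writeSegment` are hypothetical, and GZIP from the JDK stands in for whatever map output codec a job is configured with. It only shows that a writer handed a null codec stores the records as-is, while the same writer handed a codec stores them compressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class IntermediateCodecSketch {

    /** Hypothetical stand-in for a compression codec; a null Codec means "write raw bytes". */
    interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** GZIP is used here only because it ships with the JDK. */
    static final Codec GZIP = raw -> new GZIPOutputStream(raw);

    /** Writes one merged segment, compressing only when a codec is actually supplied. */
    static byte[] writeSegment(byte[] records, Codec codec) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = (codec != null) ? codec.wrap(sink) : sink) {
            out.write(records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Build some repetitive key/value records, similar in spirit to map output.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("key\tvalue\n");
        }
        byte[] records = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Codec lost on the way to the writer: the segment is stored uncompressed.
        byte[] withoutCodec = writeSegment(records, null);
        // Codec passed through: the same records are stored compressed.
        byte[] withCodec = writeSegment(records, GZIP);

        System.out.println("input bytes:         " + records.length);
        System.out.println("written, null codec: " + withoutCodec.length);
        System.out.println("written, gzip codec: " + withCodec.length);
    }
}
```

With a null codec the written bytes are identical to the input, which matches the observation that the intermediate files are readable as plain text on the tasktracker's disk.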

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
