[ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714282#action_12714282 ]

Jothi Padmanabhan commented on HADOOP-5539:
-------------------------------------------

bq. Why no unit test? If you tested this manually, what steps did you perform?

It is pretty difficult to write a unit test for this patch, since it only 
enables compression during the intermediate merges. The files created during 
the intermediate merges are consumed soon after they are created, and the 
final merged file was compressed even without this patch. I did the same test 
as Billy had done -- added print statements in the framework code (Merger.java) 
to verify that compression was turned on during the intermediate merges.
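
As a stand-alone illustration (not Hadoop code, and not part of the patch) of why losing compression on the intermediate files matters: sorted map output tends to be highly repetitive, so an uncompressed intermediate file can be many times larger on disk than its compressed form. The sketch below uses plain java.util.zip to show the size difference on repetitive key/value text.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionSizeDemo {

    // Returns the gzip-compressed size of the given bytes.
    static int compressedSize(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(data);
        gz.close();
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        // Repetitive key/value text, similar in spirit to sorted map output.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("key-").append(i % 100).append("\tvalue\n");
        }
        byte[] raw = sb.toString().getBytes("UTF-8");
        int packed = compressedSize(raw);
        // The compressed size is a small fraction of the raw size, which is
        // the space an uncompressed intermediate.x file gives up on disk.
        System.out.println("raw=" + raw.length + " gzip=" + packed);
    }
}
```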

bq. Why no javadoc for new methods?

The newly added methods are in Merger, which is a mapred package-private class.

bq. no commit for 0.19 branch?

Billy, from this comment 
https://issues.apache.org/jira/browse/HADOOP-5539?focusedCommentId=12708570&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708570,
we thought you needed this only for 0.20. If you need it for the 0.19 branch 
as well, I can generate a patch for that too.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, 
> hadoop-5539-v1.patch, hadoop-5539.patch
>
>
> hadoop-site.xml :
> mapred.compress.map.output = true
> Map output files are compressed, but when the in-memory merger closes on the 
> reduce side, the on-disk merger runs to reduce the number of input files to 
> <= io.sort.factor if needed. 
> When this happens it outputs files called intermediate.x, and these do not 
> maintain the compression setting: the writer (o.a.h.mapred.Merger.class 
> line 432) passes the codec, but I added some logging and it is always null 
> whether map output compression is set true or false.
> This causes tasks to fail if they cannot hold the uncompressed size of the 
> reduce data they are holding.
> I think this is just an oversight of the codec not getting set correctly for 
> the on-disk merges.
> {code}
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 
> intermediate segments out of a total of 3000
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 
> used codec: null
> {code}
> I added 
> {code}
>          // added by me
>          if (codec != null) {
>            LOG.info("intermediate." + passNo + " used codec: " + 
> codec.toString());
>          } else {
>            LOG.info("intermediate." + passNo + " used codec: null");
>          }
>          // end added by me
> {code}
> just before the creation of the writer (o.a.h.mapred.Merger.class line 432), 
> and it outputs the second line above.
> I have confirmed this with the logging, and I have looked at the files on the 
> disk of the tasktracker. I can read the data in the intermediate files 
> clearly, telling me that they are not compressed, but I cannot read the 
> map.out files direct from the map output, telling me the compression is 
> working on the map end but not on the on-disk merge that produces the 
> intermediate files.
> I can see no benefit in these not maintaining the compression setting, and it 
> looks like they were intended to maintain it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.