[ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Attachment: HADOOP-1193_1_20070517.patch

Here is a patch while I continue further testing... Hairong, could you try it 
and see if it works for you? Thanks!

Basically, I went ahead and implemented a 'codec pool' that reuses the 
direct-buffer-based codecs so that we don't keep creating new ones...
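
To illustrate the idea, here is a rough sketch of what such a pool could look
like (the class and method names below are purely illustrative, not
necessarily what the attached patch uses):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of a 'codec pool': idle compressor/decompressor instances are
    // kept keyed by their concrete class and handed back out on request,
    // so we stop allocating a new direct-buffer codec per stream.
    public class CodecPoolSketch<T> {

      private final Map<Class<?>, List<T>> idle = new HashMap<Class<?>, List<T>>();

      // Hand out a pooled instance if one is available; the caller creates
      // a fresh codec only when this returns null.
      public synchronized T borrow(Class<? extends T> clazz) {
        List<T> free = idle.get(clazz);
        if (free == null || free.isEmpty()) {
          return null;
        }
        return free.remove(free.size() - 1);
      }

      // Return a codec so the next request reuses it instead of allocating
      // another direct buffer.
      public synchronized void release(T codec) {
        List<T> free = idle.get(codec.getClass());
        if (free == null) {
          free = new ArrayList<T>();
          idle.put(codec.getClass(), free);
        }
        free.add(codec);
      }
    }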

Results from sorting 1 million records via TestSequenceFile with RECORD 
compression (codec instances created):

                 trunk    H-1193
Compressors:      1382         3
Decompressors:    1520        12
--------------------------------
Total:            2902        15

The results are even more dramatic for BLOCK compression, since we need 4 
codecs per Reader (for key, keyLen, val & valLen). In fact, on the back of 
this patch, I have also bumped up the default direct-buffer size for zlib 
from 1K to 64K, which should improve performance further.
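
As a rough usage sketch (again with illustrative names, building on the pool
sketched above), this is how a BLOCK-compressed Reader's four decompressors
could be borrowed and returned, so that repeatedly opening readers keeps
reusing the same few direct-buffer codecs:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative driver: each BLOCK-compressed Reader needs 4 decompressors
    // (key, keyLen, val & valLen). With the pool above, opening 1000 readers
    // still only ever creates 4 codec instances. FakeZlibDecompressor is a
    // stand-in for a real direct-buffer-backed codec.
    public class PoolUsageSketch {

      static class FakeZlibDecompressor { /* would wrap a 64K direct buffer */ }

      public static void main(String[] args) {
        CodecPoolSketch<FakeZlibDecompressor> pool =
            new CodecPoolSketch<FakeZlibDecompressor>();

        for (int reader = 0; reader < 1000; reader++) {
          List<FakeZlibDecompressor> borrowed = new ArrayList<FakeZlibDecompressor>();
          for (int i = 0; i < 4; i++) {             // key, keyLen, val, valLen
            FakeZlibDecompressor d = pool.borrow(FakeZlibDecompressor.class);
            if (d == null) {
              d = new FakeZlibDecompressor();       // pool empty: allocate once
            }
            borrowed.add(d);
          }
          // ... the reader would decompress its blocks here ...
          for (FakeZlibDecompressor d : borrowed) {
            pool.release(d);                        // returned on close()
          }
        }
      }
    }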

Appreciate any review/feedback.

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>         Assigned To: Arun C Murthy
>         Attachments: HADOOP-1193_1_20070517.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map 
> output to be compressed, but it worked fine with release 0.10.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
