[ 
https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606445#comment-13606445
 ] 

Koji Noguchi commented on PIG-3251:
-----------------------------------

bq.  Let me know if you find any problem in your testing.

Thanks Daniel.  My initial test went well on a 0.23 cluster.  It was as fast as 
the original and required less memory.  
However, the patched Pig is extremely slow on a 1.0.2 cluster.

The reason is that I'm using Text directly as the replacement for 
ByteArrayOutputStream.  Without HADOOP-6109, which was committed in 0.21, Text 
grows linearly whereas ByteArrayOutputStream grows exponentially, so the former 
requires far more copies.
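
To illustrate the difference, here is a minimal sketch that only counts bytes 
moved by reallocation; it does not use Text or ByteArrayOutputStream 
themselves.  The class name, method names and the 4096-byte append size are 
hypothetical.  "Exact growth" is a simplified model of a buffer that grows only 
to the size each append needs (my assumption for Text without HADOOP-6109), and 
"doubling" models a ByteArrayOutputStream-style buffer that starts at 32 bytes 
and at least doubles on each reallocation.  The last method assumes small 
appends and shows how doubling could leave a 268MByte buffer behind a 160MByte 
record, as in the issue description quoted below.

```java
class BufferGrowthSketch {

    /** Bytes copied when the buffer grows only to the size each append needs. */
    static long copiedWithExactGrowth(long totalBytes, int chunk) {
        long copied = 0, capacity = 0, length = 0;
        while (length < totalBytes) {
            long needed = Math.min(totalBytes, length + chunk);
            if (needed > capacity) {
                copied += length;   // existing contents move to the new array
                capacity = needed;  // assumption: no headroom is added
            }
            length = needed;
        }
        return copied;
    }

    /** Bytes copied when the buffer at least doubles on every reallocation. */
    static long copiedWithDoubling(long totalBytes, int chunk) {
        long copied = 0, capacity = 32, length = 0; // 32 = default initial size
        while (length < totalBytes) {
            long needed = Math.min(totalBytes, length + chunk);
            if (needed > capacity) {
                copied += length;
                capacity = Math.max(capacity * 2, needed);
            }
            length = needed;
        }
        return copied;
    }

    /** Final capacity of a doubling buffer filled by small appends. */
    static long capacityAfterDoubling(long recordBytes) {
        long capacity = 32;
        while (capacity < recordBytes) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        long total = 1L << 20; // a 1 MiB record, appended 4096 bytes at a time
        System.out.println("exact growth, bytes copied: "
                + copiedWithExactGrowth(total, 4096));
        System.out.println("doubling, bytes copied:     "
                + copiedWithDoubling(total, 4096));
        System.out.println("capacity for 160,000,000 bytes: "
                + capacityAfterDoubling(160_000_000L));
    }
}
```

In this model the exact-growth copy count rises quadratically with record 
size, while the doubling count stays proportional to it, at the cost of up to 
2x unused capacity.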
                
> Bzip2TextInputFormat requires double the memory of maximum record size
> ----------------------------------------------------------------------
>
>                 Key: PIG-3251
>                 URL: https://issues.apache.org/jira/browse/PIG-3251
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-3251-trunk-v01.patch
>
>
> While looking at a user's OOM heap dump, I noticed that Pig's 
> Bzip2TextInputFormat consumes memory in both
> Bzip2TextInputFormat.buffer (ByteArrayOutputStream) 
> and the actual Text that is returned as the line.
> For example, with a single 160MByte record, the buffer was 268MBytes and 
> the Text was 160MBytes.  
> We can probably eliminate one of them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
