[ https://issues.apache.org/jira/browse/PIG-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1875:
--------------------------------
Fix Version/s: 0.10
> Keep tuples serialized to limit spilling and speed it when it happens
> ---------------------------------------------------------------------
>
> Key: PIG-1875
> URL: https://issues.apache.org/jira/browse/PIG-1875
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Alan Gates
> Priority: Minor
> Fix For: 0.10
>
> Attachments: mrtuple.patch
>
>
> Currently Pig reads records off of the reduce iterator and immediately
> deserializes them into Java objects. The deserialized objects take up much
> more memory than the serialized form, so Pig spills sooner than it would if
> it kept the records serialized. Also, when it does have to spill, it has to
> serialize the records again, and then deserialize them once more after
> reading them back from the spill file.
> We should explore keeping them serialized in memory when they are read off
> of the reduce iterator.
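
The idea in the description can be illustrated with a minimal Java sketch. It
is not the content of mrtuple.patch and does not use Pig's Tuple classes; the
names SerializedTupleSketch, LazyTuple, fields() and spillTo() are hypothetical,
and the wire format (a field count followed by UTF strings) is an assumption
chosen only to keep the example self-contained. It shows a record held as raw
bytes, decoded only when its fields are requested, and spilled by copying the
bytes without a serialization pass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Pig's Tuple API and not the attached patch. */
public class SerializedTupleSketch {

    /** Holds one record as raw bytes and decodes its fields only on request. */
    static final class LazyTuple {
        private final byte[] raw;

        LazyTuple(byte[] raw) {
            this.raw = raw;
        }

        /** Assumed wire format: an int field count, then one UTF string per field. */
        static LazyTuple of(String... fields) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeInt(fields.length);
                for (String f : fields) {
                    out.writeUTF(f);
                }
            }
            return new LazyTuple(buf.toByteArray());
        }

        /** Deserializes on demand; nothing is decoded until this is called. */
        List<String> fields() throws IOException {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
                int n = in.readInt();
                List<String> result = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    result.add(in.readUTF());
                }
                return result;
            }
        }

        /** Spilling copies the stored bytes as-is, with no re-serialization. */
        void spillTo(OutputStream out) throws IOException {
            out.write(raw);
        }

        /** Size of the serialized form held in memory. */
        int sizeInBytes() {
            return raw.length;
        }
    }

    public static void main(String[] args) throws IOException {
        LazyTuple t = LazyTuple.of("alice", "42");

        // Simulate a spill and a read-back: the bytes are moved, never re-encoded.
        ByteArrayOutputStream spill = new ByteArrayOutputStream();
        t.spillTo(spill);
        LazyTuple back = new LazyTuple(spill.toByteArray());

        System.out.println(back.fields());
    }
}
```

In this sketch the cost of decoding is paid only by code that actually reads
the fields, which is the trade-off the issue proposes to explore.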