[ https://issues.apache.org/jira/browse/PIG-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1474:
--------------------------------

    Fix Version/s:     (was: 0.9.0)

> Avoid serialization/deserialization costs for PigStorage data - Use custom Tuple
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1474
>                 URL: https://issues.apache.org/jira/browse/PIG-1474
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>
> Avoid sedes (serialization/deserialization) when possible for data loaded
> using PigStorage by implementing approach #4 proposed in
> http://wiki.apache.org/pig/AvoidingSedes .
> The write() and readFields() functions of the tuple returned by TupleFactory
> are used to serialize data between Map and Reduce. By using a tuple that
> knows the serialization format of the loader, we avoid sedes at the
> Map/Reduce boundary and instead use the load function's serialized format
> between Map and Reduce.
> To use a new custom tuple for this purpose, a custom TupleFactory that
> returns tuples of this type has to be specified using the property
> "pig.data.tuple.factory.name".
> This approach will work only for a set of load functions in the query that
> share the same serialization format for maps and bags. If this approach
> proves very useful, it will make the case for a more extensible approach.
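The idea in the description can be modeled in plain Java. This is a minimal, self-contained sketch under stated assumptions, not Pig's actual Tuple or TupleFactory API: RawTextTuple, RawTupleFactory, TabRawTupleFactory, FactoryLookup and RawTupleDemo are hypothetical names. It assumes PigStorage-style tab-delimited text. The tuple's write()/readFields() copy the loader's raw line bytes verbatim (length prefix plus bytes) and fields are parsed only when accessed. The factory is chosen reflectively from "pig.data.tuple.factory.name", which this sketch reads as a JVM system property; how Pig itself resolves that property is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tuple (NOT Pig's Tuple interface) that keeps the loader's raw
// delimited bytes and parses fields lazily on first access.
class RawTextTuple {
    private byte[] raw;           // line bytes exactly as the loader read them
    private final char delimiter; // field separator, e.g. '\t' for PigStorage-style text
    private List<String> fields;  // parsed cache; null until a field is requested

    RawTextTuple(char delimiter) { this(new byte[0], delimiter); }

    RawTextTuple(byte[] raw, char delimiter) {
        this.raw = raw;
        this.delimiter = delimiter;
    }

    Object get(int i) { return parsed().get(i); }

    int size() { return parsed().size(); }

    // Map side: emit a length prefix plus the raw bytes; no per-field encoding.
    void write(DataOutput out) throws IOException {
        out.writeInt(raw.length);
        out.write(raw);
    }

    // Reduce side: read the raw bytes back and drop any stale parsed fields.
    void readFields(DataInput in) throws IOException {
        raw = new byte[in.readInt()];
        in.readFully(raw);
        fields = null;
    }

    private List<String> parsed() {
        if (fields == null) {
            String line = new String(raw, StandardCharsets.UTF_8);
            fields = new ArrayList<>();
            int start = 0;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == delimiter) {
                    fields.add(line.substring(start, i));
                    start = i + 1;
                }
            }
            fields.add(line.substring(start)); // keeps trailing empty fields
        }
        return fields;
    }
}

// Hypothetical factory abstraction standing in for a custom TupleFactory.
interface RawTupleFactory {
    RawTextTuple newTuple(byte[] raw);
}

class TabRawTupleFactory implements RawTupleFactory {
    public RawTextTuple newTuple(byte[] raw) { return new RawTextTuple(raw, '\t'); }
}

class FactoryLookup {
    // Assumption: the factory class name comes from a JVM system property with
    // the name used in the issue; falls back to the tab-delimited factory.
    static RawTupleFactory getInstance() throws ReflectiveOperationException {
        String name = System.getProperty("pig.data.tuple.factory.name",
                TabRawTupleFactory.class.getName());
        return (RawTupleFactory) Class.forName(name).getDeclaredConstructor().newInstance();
    }
}

public class RawTupleDemo {
    public static void main(String[] args) throws Exception {
        RawTupleFactory factory = FactoryLookup.getInstance();
        RawTextTuple mapSide = factory.newTuple("alice\t42\tNY".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        mapSide.write(new DataOutputStream(buf)); // fields are never parsed here

        RawTextTuple reduceSide = new RawTextTuple('\t');
        reduceSide.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(reduceSide.get(1)); // prints 42
    }
}
```

Because the serialized form is only a length prefix plus the original line, the Reduce side pays the parsing cost only for the fields it actually reads. The trade-off, as the description notes, is that all load functions feeding such tuples must share one serialization format.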

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
