[ https://issues.apache.org/jira/browse/TEZ-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076397#comment-14076397 ]
Siddharth Seth commented on TEZ-1288:
-------------------------------------
+1. Looks good. Thanks [~rajesh.balamohan]. This is an awesome change, especially
the savings in the data sent over the wire for BytesWritable.
> Create FastTezSerialization as an optional feature
> --------------------------------------------------
>
> Key: TEZ-1288
> URL: https://issues.apache.org/jira/browse/TEZ-1288
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.5.0
> Reporter: Gopal V
> Assignee: Rajesh Balamohan
> Attachments: TEZ-1288.1.patch, TEZ-1288.2.patch, TEZ-1288.3.patch,
> TEZ-1288.4.patch
>
>
> Tez inherits the Writable framework from MapReduce.
> This is very flexible, but not particularly memory efficient for small
> data types.
> When deserializing, each value and key has to be allocated afresh for each
> small chunk of data (new IntWritable instead of .set()).
> The BytesWritable serialization operation always has to write a 4-byte
> length prefix for all values and keys, because of requirements around streamed
> .readFields() instead of a custom setter/getter impl.
> Implement a faster serialization mechanism for the inner loop of sort, spill,
> merge, which doesn't trigger the GC and avoids adding simplistic overheads to
> the IFile format.
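
The two overheads described above can be illustrated with a minimal sketch. It uses plain JDK classes only and is not the TEZ-1288 patch or any Hadoop/Tez API: the class PrefixOverheadSketch, its methods, and the ReusableInt holder are hypothetical names introduced for illustration. It assumes the surrounding container already records each record's length, so a second length prefix inside the payload is redundant.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; none of these names come from Tez or Hadoop.
public class PrefixOverheadSketch {

    // Mimics a length-prefixed write: a 4-byte int length followed by the payload.
    static byte[] writeWithLengthPrefix(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Assumed alternative: the container tracks the length, so only the raw
    // payload bytes are written.
    static byte[] writeRaw(byte[] payload) {
        return payload.clone();
    }

    // Minimal reusable holder: set() overwrites the value in place, so the
    // inner loop does not allocate a new object per record.
    static final class ReusableInt {
        private int value;
        void set(int v) { value = v; }
        int get() { return value; }
    }

    // Sums the values through a single holder instance reused for every record.
    static long sumWithReuse(int[] values) {
        ReusableInt holder = new ReusableInt();
        long sum = 0;
        for (int v : values) {
            holder.set(v);
            sum += holder.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        System.out.println(writeWithLengthPrefix(payload).length); // prints 7
        System.out.println(writeRaw(payload).length);              // prints 3
        System.out.println(sumWithReuse(new int[] {1, 2, 3}));     // prints 6
    }
}
```

For a 3-byte payload the prefixed form takes 7 bytes against 3 for the raw form, so the relative overhead is largest for exactly the small keys and values the description is concerned with.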
--
This message was sent by Atlassian JIRA
(v6.2#6252)