[ https://issues.apache.org/jira/browse/TEZ-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-1288:
----------------------------------
    Attachment: TEZ-1288.1.patch

[~gopalv], [~sseth] Can you please review https://reviews.apache.org/r/23724/

> Create FastTezSerialization as an optional feature
> --------------------------------------------------
>
>                 Key: TEZ-1288
>                 URL: https://issues.apache.org/jira/browse/TEZ-1288
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Gopal V
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1288.1.patch
>
>
> Tez inherits the Writable framework from map-reduce.
> This is very flexible, but not particularly memory efficient for small
> data types.
> When deserializing, each key and value has to be allocated afresh for each
> small chunk of data (new IntWritable instead of .set()).
> The BytesWritable serialization operation always has to write a 4-byte
> prefix for all keys and values, because of requirements around streamed
> .readFields() instead of a custom setter/getter impl.
> Implement a faster serialization mechanism for the inner loop of sort, spill,
> merge, which doesn't trigger the GC and avoids adding simplistic overheads to
> the IFile format.
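
To make the two costs named in the description concrete, here is a minimal sketch in plain Java. It is not the code in TEZ-1288.1.patch and it does not use the Hadoop classes: IntBox is a hypothetical stand-in for a Writable-style int holder, and every method name below is an assumption made for illustration. It contrasts allocating a holder per record with reusing one, and shows the size added by a 4-byte length prefix on each byte payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SerializationSketch {

    /** Hypothetical Writable-style holder; NOT Hadoop's IntWritable. */
    static final class IntBox {
        private int value;
        void set(int v) { value = v; }
        int get() { return value; }
        void write(DataOutputStream out) throws IOException { out.writeInt(value); }
        void readFields(DataInputStream in) throws IOException { value = in.readInt(); }
    }

    /** Writes each int as 4 raw bytes, reusing one holder on the write side. */
    static byte[] serialize(int[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            IntBox box = new IntBox();
            for (int v : values) {
                box.set(v);
                box.write(out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Allocation-heavy pattern: one new holder per record read. */
    static long sumAllocating(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long sum = 0;
            for (int i = 0; i < count; i++) {
                IntBox box = new IntBox(); // fresh object for every record
                box.readFields(in);
                sum += box.get();
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reuse pattern: a single holder is overwritten for every record. */
    static long sumReusing(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            IntBox box = new IntBox(); // allocated once, outside the loop
            long sum = 0;
            for (int i = 0; i < count; i++) {
                box.readFields(in);
                sum += box.get();
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Total bytes when every payload carries a 4-byte length prefix. */
    static int lengthPrefixedSize(byte[][] payloads) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (byte[] p : payloads) {
                out.writeInt(p.length); // the 4-byte prefix
                out.write(p);
            }
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new int[] {1, 2, 3, 40});
        System.out.println(sumAllocating(data, 4) + " " + sumReusing(data, 4));
        System.out.println(lengthPrefixedSize(new byte[][] {{1}, {2, 3}}));
    }
}
```

Both read loops produce the same sum; the difference is only in how many holder objects are created, which is what matters to the GC in a sort, spill, or merge inner loop. For very small payloads the length prefix can exceed the payload itself.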