[
https://issues.apache.org/jira/browse/TEZ-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Eagles updated TEZ-3202:
---------------------------------
Attachment: TEZ-3202.2.patch
Further reduced the memory needed by taking [~sseth]'s suggestion: splitting
TezMerger.Segment into an in-memory segment and an on-disk segment cuts the
cost of each in-memory segment by another 36 bytes or so. For 800000
segments this is around 25 MB of heap.
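
As a rough illustration of the split (not the code in the attached patch), the
sketch below separates a segment type into an in-memory variant and an on-disk
variant so that file bookkeeping fields exist only where they are used. The
class names, the fields, and the savedBytes helper are all hypothetical; the
real TezMerger.Segment layout may differ.

```java
public class SegmentSplitSketch {

    /** Hypothetical common base: only what a merge loop needs from any segment. */
    public abstract static class Segment {
        public abstract long length();
    }

    /** Hypothetical in-memory segment: carries a data reference and nothing else. */
    public static final class MemorySegment extends Segment {
        private final byte[] data;

        public MemorySegment(byte[] data) {
            this.data = data;
        }

        @Override
        public long length() {
            return data.length;
        }
    }

    /** Hypothetical on-disk segment: the file bookkeeping fields live only here. */
    public static final class DiskSegment extends Segment {
        private final String path;   // assumed field, for illustration only
        private final long offset;   // assumed field, for illustration only
        private final long length;

        public DiskSegment(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public long length() {
            return length;
        }
    }

    /** Total heap saved when every segment shrinks by the given number of bytes. */
    public static long savedBytes(long segmentCount, long bytesSavedPerSegment) {
        return segmentCount * bytesSavedPerSegment;
    }

    public static void main(String[] args) {
        Segment mem = new MemorySegment(new byte[128]);
        Segment disk = new DiskSegment("/tmp/spill.out", 0L, 4096L);
        System.out.println(mem.length() + " bytes in memory, " + disk.length() + " bytes on disk");
        // 800000 segments at 36 bytes each, using the figures from the comment above
        System.out.println(savedBytes(800_000L, 36L) + " bytes saved");
    }
}
```

With the figures above, 800000 segments at 36 bytes each come to 28.8 million
bytes, the same order as the roughly 25 MB quoted, given that 36 bytes is itself
an approximation.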
> Reduce the memory need for jobs with high number of segments
> ------------------------------------------------------------
>
> Key: TEZ-3202
> URL: https://issues.apache.org/jira/browse/TEZ-3202
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3202.1.patch, TEZ-3202.2.patch
>
>
> Segment has a 'key' member that holds accounting information for the reader's
> current key buffer, position, and length. There is a 384-byte overhead per
> segment because the accounting is done with the DataInputBuffer class, which
> derives from DataInputStream, whose underlying byte[80] and char[80] are among
> the significant pieces. This jira aims to reduce the overhead per segment.
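
One way to picture the reduction described above is a small holder that tracks
only the key's buffer reference, position, and length, without inheriting the
scratch arrays of DataInputStream. This is a minimal sketch under that
assumption, not the approach confirmed by the attached patches; the KeyState
name and its methods are hypothetical.

```java
public class KeyStateSketch {

    /** Hypothetical lightweight replacement for a stream-based key holder. */
    public static final class KeyState {
        private byte[] buffer;  // reference to the reader's key bytes, not a copy
        private int position;   // start of the current key within buffer
        private int length;     // number of bytes in the current key

        /** Points this holder at a new key without allocating anything. */
        public void reset(byte[] buffer, int position, int length) {
            this.buffer = buffer;
            this.position = position;
            this.length = length;
        }

        public byte[] getData() {
            return buffer;
        }

        public int getPosition() {
            return position;
        }

        public int getLength() {
            return length;
        }
    }

    public static void main(String[] args) {
        byte[] readerBuffer = new byte[64];
        KeyState key = new KeyState();
        key.reset(readerBuffer, 8, 16);
        System.out.println("key at " + key.getPosition() + ", length " + key.getLength());
    }
}
```

Per instance, such a holder costs one reference and two ints plus the object
header, so the fixed per-segment cost no longer includes the two 80-element
arrays mentioned in the description.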