[ 
https://issues.apache.org/jira/browse/TEZ-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3732:
---------------------------------
    Attachment: TEZ-3732.2.patch

Added another change for InMemoryMapOutput. Only in-memory merged MapOutputs 
use the BoundedByteArrayOutputStream, and then only once, so that OutputStream 
is now created on demand, saving 32 bytes per InMemoryMapOutput.
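
To illustrate the on-demand idea, here is a minimal sketch; it is not the code in 
the patch. LazyStreamSketch and ArrayBackedStream are hypothetical stand-ins 
for InMemoryMapOutput and BoundedByteArrayOutputStream. The object keeps only 
the byte array and builds the stream wrapper when a caller asks for it, so no 
stream object stays alive per instance.

```java
import java.io.OutputStream;

// Hypothetical stand-in for an in-memory map output; not the Tez class.
final class LazyStreamSketch {
    private final byte[] buffer;

    LazyStreamSketch(int size) {
        this.buffer = new byte[size];
    }

    byte[] getMemory() {
        return buffer;
    }

    // The stream is created on demand; no field holds on to it.
    ArrayBackedStream createStream() {
        return new ArrayBackedStream(buffer);
    }

    // Hypothetical bounded stream that writes into an existing array.
    static final class ArrayBackedStream extends OutputStream {
        private final byte[] target;
        private int pos;

        ArrayBackedStream(byte[] target) {
            this.target = target;
        }

        @Override
        public void write(int b) {
            if (pos >= target.length) {
                throw new IllegalStateException("buffer full");
            }
            target[pos++] = (byte) b;
        }
    }
}
```

Callers that never merge in memory never pay for the wrapper; the one caller 
that does gets a short-lived object.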

Unfortunately, will have to save the unordered case for another JIRA.

> Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs
> -------------------------------------------------------------------------
>
>                 Key: TEZ-3732
>                 URL: https://issues.apache.org/jira/browse/TEZ-3732
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3732.1.patch, TEZ-3732.2.patch
>
>
> Objects in 64-bit Java take 12 bytes of header plus the size of their 
> members, aligned up to 8 bytes.
> InputAttemptIdentifier -> 33 bytes, which gets aligned up to 40 bytes.
> This class is just one byte over the 32-byte alignment boundary. Reducing 
> the object size by one byte saves 8 bytes per object.
> This is ~8 MB savings for 1,000,000 inputs and ~80 MB savings for tasks with 
> 10,000,000 inputs to fetch (yes, this is a real job).
> MapOutput -> 45 bytes, which gets aligned up to 48 bytes.
> This class can be sub-classed so that no sub-class pays the object size 
> cost of fields needed only by the other sub-classes:
> Wait, InMemory, and DiskDirect -> 32 bytes
> Disk -> 40 bytes
> Total savings is harder to account for, but is more than in the above case.
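
The alignment arithmetic in the description can be sketched as follows. This 
only reproduces the numbers quoted above under the assumption of 8-byte object 
alignment; it does not measure a live JVM, and ObjectSizeSketch, alignUp and 
savings are hypothetical names.

```java
// Sketch only: mirrors the arithmetic in the description, not a JVM measurement.
final class ObjectSizeSketch {
    static final int ALIGNMENT = 8; // assumed object alignment in bytes

    // Rounds a raw object size up to the next multiple of ALIGNMENT.
    static long alignUp(long rawBytes) {
        return (rawBytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Bytes saved across `count` objects when the raw size shrinks.
    static long savings(long oldRaw, long newRaw, long count) {
        return (alignUp(oldRaw) - alignUp(newRaw)) * count;
    }

    public static void main(String[] args) {
        System.out.println(alignUp(33));                   // 40 (InputAttemptIdentifier)
        System.out.println(alignUp(32));                   // 32 (one byte smaller)
        System.out.println(alignUp(45));                   // 48 (MapOutput)
        System.out.println(savings(33, 32, 1_000_000L));   // 8000000
        System.out.println(savings(33, 32, 10_000_000L));  // 80000000
    }
}
```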



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
