[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317218#comment-14317218
 ] 

Siddharth Seth commented on TEZ-776:
------------------------------------

I had an offline discussion with [~gopalv] and [~acmurthy].
Another option (Option 5) would be to simply flatten the storage structure 
within the AM. For example, TezEvents for shuffle are currently replicated 
only so that each copy can store its own targetIndex inside a modified 
DataMovementEvent. Instead, the original TezEvent can be stored as is, with 
the targetIndex stored outside it, which brings the reference count per event 
down from 16 to ~2-3. This is likely the simplest option and should improve 
memory usage by ~5x. The tasks / final layer inside the AM can take care of 
setting up the DME properly.
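
A rough sketch of what such a flattened layout could look like. All class and 
method names here (FlattenedEventStore, SharedEvent, RoutedEvent, materialize) 
are hypothetical stand-ins for illustration, not the actual Tez types: the 
event is stored once, the per-destination targetIndex values live in a plain 
int[] next to it, and the per-task event is only built when it is handed out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these are stand-in classes, not the real Tez types.
class FlattenedEventStore {

    /** The event as produced by the source task; stored once, never copied. */
    static final class SharedEvent {
        final int sourceIndex;
        final byte[] userPayload;

        SharedEvent(int sourceIndex, byte[] userPayload) {
            this.sourceIndex = sourceIndex;
            this.userPayload = userPayload;
        }
    }

    /** Short-lived event handed to a single destination task. */
    static final class RoutedEvent {
        final int sourceIndex;
        final int targetIndex;
        final byte[] userPayload; // same array as in SharedEvent, not a copy

        RoutedEvent(SharedEvent event, int targetIndex) {
            this.sourceIndex = event.sourceIndex;
            this.targetIndex = targetIndex;
            this.userPayload = event.userPayload;
        }
    }

    private final List<SharedEvent> events = new ArrayList<>();
    // targetIndices.get(i)[t] is the targetIndex of event i for destination task t.
    private final List<int[]> targetIndices = new ArrayList<>();

    /** Stores the event once and returns its id. */
    int add(SharedEvent event, int[] targetIndexPerTask) {
        events.add(event);
        targetIndices.add(targetIndexPerTask.clone());
        return events.size() - 1;
    }

    /** Builds the per-task event on demand, at the final layer. */
    RoutedEvent materialize(int eventId, int destTask) {
        return new RoutedEvent(events.get(eventId),
                               targetIndices.get(eventId)[destTask]);
    }
}
```

With this shape the long-lived heap holds one event object plus one int per 
destination, and the RoutedEvent instances can be garbage collected as soon as 
they have been sent to the task.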

Within the DMEs themselves, introducing host/port fields can help save on a 
lot of repeated hostnames, and effectively on the ~100 byte userpayload.
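
One possible way to avoid the repeated hostnames, sketched under the 
assumption that host/port become explicit fields: intern each distinct 
host:port pair once and let events carry a small int id instead. HostPortTable 
and its methods are hypothetical names, not an existing Tez API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an intern table for host:port pairs, not a Tez class.
class HostPortTable {
    private final Map<String, Integer> idsByHostPort = new HashMap<>();
    private final List<String> hostPorts = new ArrayList<>();

    /** Returns a stable id for the pair, storing the string only the first time. */
    int intern(String host, int port) {
        String key = host + ":" + port;
        Integer id = idsByHostPort.get(key);
        if (id == null) {
            id = hostPorts.size();
            hostPorts.add(key);
            idsByHostPort.put(key, id);
        }
        return id;
    }

    /** Resolves an id back to its "host:port" string. */
    String lookup(int id) {
        return hostPorts.get(id);
    }

    /** Number of distinct host:port pairs stored. */
    int size() {
        return hostPorts.size();
    }
}
```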

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
