[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317218#comment-14317218 ]
Siddharth Seth commented on TEZ-776:
------------------------------------
I was having an offline discussion with [~gopalv] and [~acmurthy].
Another option (Option 5) would be to flatten the storage structure within
the AM. For example, TezEvents for shuffle are currently replicated only so
that each copy can carry its own targetIndex inside a modified
DataMovementEvent. Instead, the original TezEvent can be stored as is, with the
targetIndex stored outside it; this brings the reference count per event down
from 16 to ~2-3. This is likely the simplest option, and it should improve
memory usage by ~5x. The tasks, or the final layer inside the AM, can take care
of setting up the DME properly.
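A minimal Java sketch of the Option 5 idea, using simplified, hypothetical classes (FlatEventStore, StoredEvent, TargetedEvent) rather than the real Tez TezEvent/DataMovementEvent types: each event is stored once, the routing is kept outside it as packed primitive (event slot, targetIndex) pairs per destination task, and the per-target event is only built when a task's events are served. It does not reproduce the reference counts quoted above; it only illustrates where the targetIndex lives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: store each shuffle event once, keep targetIndex outside it. Not the Tez API. */
public class FlatEventStore {

    /** Hypothetical stand-in for a stored TezEvent wrapping a DataMovementEvent. */
    public record StoredEvent(int sourceIndex, byte[] userPayload) {}

    /** Hypothetical per-task view, built only when events are handed to a task. */
    public record TargetedEvent(int sourceIndex, int targetIndex, byte[] userPayload) {}

    /** Growable primitive array; each entry packs (eventSlot, targetIndex) into one long. */
    private static final class LongList {
        long[] data = new long[4];
        int size;

        void add(long v) {
            if (size == data.length) {
                data = Arrays.copyOf(data, size * 2);
            }
            data[size++] = v;
        }
    }

    private final List<StoredEvent> events = new ArrayList<>();
    private final Map<Integer, LongList> routesByTask = new HashMap<>();

    /** Stores the event once and returns its slot for routing. */
    public int store(StoredEvent event) {
        events.add(event);
        return events.size() - 1;
    }

    /** Records that event `slot` must reach `destTask` at input `targetIndex`; no event copy is made. */
    public void route(int slot, int destTask, int targetIndex) {
        long packed = ((long) slot << 32) | (targetIndex & 0xFFFFFFFFL);
        routesByTask.computeIfAbsent(destTask, k -> new LongList()).add(packed);
    }

    /** Materializes per-target events lazily, in the final layer that serves a task. */
    public List<TargetedEvent> eventsFor(int destTask) {
        List<TargetedEvent> out = new ArrayList<>();
        LongList routes = routesByTask.get(destTask);
        if (routes == null) {
            return out;
        }
        for (int i = 0; i < routes.size; i++) {
            long packed = routes.data[i];
            StoredEvent e = events.get((int) (packed >>> 32)); // high 32 bits: event slot
            out.add(new TargetedEvent(e.sourceIndex(), (int) packed, e.userPayload())); // low 32 bits: targetIndex
        }
        return out;
    }

    public int storedEventCount() {
        return events.size();
    }

    public static void main(String[] args) {
        FlatEventStore store = new FlatEventStore();
        // One broadcast-style event from source task 7, delivered to three destination tasks.
        int slot = store.store(new StoredEvent(7, new byte[] {1, 2, 3}));
        for (int task = 0; task < 3; task++) {
            store.route(slot, task, 7);
        }
        System.out.println(store.storedEventCount());                // 1
        System.out.println(store.eventsFor(2).get(0).targetIndex()); // 7
    }
}
```

The payload and the rest of the event are shared by every destination; each additional destination costs one long in a primitive array instead of a replicated event object.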
Within the DMEs themselves, introducing host/port can avoid much of the
repetition in hostnames and, effectively, the ~100-byte userpayload.
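One possible reading of the host/port suggestion, sketched in Java under the assumption that hostnames are the main repeated content in the shuffle payload: intern each hostname once in an AM-wide table and keep only a small host id and a port per event. HostPortTable and CompactDme are hypothetical names, not Tez classes, and the hosts and port below are arbitrary example values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: intern hostnames so each event keeps two ints instead of a host string. */
public class HostPortTable {

    private final Map<String, Integer> idsByHost = new HashMap<>();
    private final List<String> hosts = new ArrayList<>();

    /** Returns a stable small id for the host, adding it on first use. */
    public int intern(String host) {
        return idsByHost.computeIfAbsent(host, h -> {
            hosts.add(h);
            return hosts.size() - 1;
        });
    }

    public String host(int id) {
        return hosts.get(id);
    }

    public int distinctHosts() {
        return hosts.size();
    }

    /** Hypothetical compact event: host id and port as primitives, no per-event hostname copy. */
    public record CompactDme(int sourceIndex, int hostId, int port) {}

    public static void main(String[] args) {
        HostPortTable table = new HostPortTable();
        List<CompactDme> dmes = new ArrayList<>();
        // 1,000 source tasks spread over 10 example hosts; 13562 is an arbitrary example port.
        for (int src = 0; src < 1000; src++) {
            String host = "node-" + (src % 10) + ".example.com";
            dmes.add(new CompactDme(src, table.intern(host), 13562));
        }
        System.out.println(dmes.size() + " events, " + table.distinctHosts() + " hostnames stored");
        System.out.println(table.host(dmes.get(42).hostId())); // node-2.example.com
    }
}
```

With 1,000 events spread over 10 hosts, only 10 hostname strings are retained instead of one copy per event.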
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: events-problem-solutions.txt
>
>
> This is open-ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents, at 64 bytes per event).
> Depending on the connection pattern, this limits the number of tasks
> that can be processed.
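To put the quoted 64-bytes-per-event figure in context, a back-of-envelope Java sketch, assuming one stored DataMovementEvent per (source task, destination task) pair on a scatter-gather edge and one per task on a one-to-one edge, and ignoring TezEvent wrappers and any other per-event overhead. The task counts are illustrative, not measurements.

```java
/** Back-of-envelope AM heap estimate for stored DataMovementEvents (illustrative assumptions only). */
public class DmeHeapEstimate {

    public static final long BYTES_PER_DME = 64; // figure quoted in the issue description

    /** Scatter-gather (all-to-all): assume one DME per (source task, destination task) pair. */
    public static long scatterGatherEvents(long sourceTasks, long destTasks) {
        return sourceTasks * destTasks;
    }

    /** One-to-one: assume each source task feeds exactly one destination task. */
    public static long oneToOneEvents(long tasks) {
        return tasks;
    }

    public static long heapBytes(long events) {
        return events * BYTES_PER_DME;
    }

    public static void main(String[] args) {
        long sg = scatterGatherEvents(10_000, 1_000);
        long oto = oneToOneEvents(10_000);
        System.out.printf("scatter-gather: %d events, ~%d MB%n", sg, heapBytes(sg) / 1_000_000);
        System.out.printf("one-to-one:     %d events, ~%d KB%n", oto, heapBytes(oto) / 1_000);
    }
}
```

Under these assumptions, a 10,000 x 1,000 scatter-gather edge needs about 640 MB for the events alone, while a 10,000-task one-to-one edge needs about 640 KB, which is why the connection pattern bounds how many tasks the AM can handle.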
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)