[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001987#comment-14001987 ]
Siddharth Seth commented on TEZ-776:
------------------------------------

On pushing logic into plugins to reduce memory utilization: a plugin knows best how its events are routed, so it does not need to store additional state about which tasks an event must be routed to. Broadcast, for example, only needs to track all events and maintain indices; it does not have to apply any checks when deciding whether an event goes to a task. A generic solution, which has to be implemented anyway, will have to store information about which tasks each event goes to, and will likely check this list each time it decides whether an event needs to go to a task. It can obviously include some optimizations for the case where an event is routed to all downstream tasks.

On storing events in the plugins versus the Vertex itself: it is far easier to control obsoletion and transient events, which are required by TEZ-1094, when all relevant events are in a single place rather than mixed together, which would likely be the case when storing them in the Vertex.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
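
The contrast drawn in the comment above, between a broadcast-specific router and a generic one, might look roughly like the following sketch. Everything here is hypothetical and for illustration only: `EventRoutingSketch`, `BroadcastRouter`, `GenericRouter`, `addEvent` and `pollNewEvents` are made-up names, not Tez classes or APIs, and a `String` stands in for an event payload. The broadcast variant keeps one shared event list plus a per-task index; the generic variant additionally stores a target set per event and checks it on every poll.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class EventRoutingSketch {

    /** Broadcast: every task sees every event, so only an index per task is needed. */
    static final class BroadcastRouter {
        private final List<String> events = new ArrayList<>();
        private final int[] nextIndex; // position of the next unseen event, per task

        BroadcastRouter(int numTasks) {
            nextIndex = new int[numTasks];
        }

        void addEvent(String event) {
            events.add(event);
        }

        /** Returns the events the task has not seen yet; no per-event check is applied. */
        List<String> pollNewEvents(int task) {
            List<String> out = new ArrayList<>(events.subList(nextIndex[task], events.size()));
            nextIndex[task] = events.size();
            return out;
        }
    }

    /** Generic: each event carries its own target set, which is consulted on every poll. */
    static final class GenericRouter {
        private final List<String> events = new ArrayList<>();
        private final List<BitSet> targets = new ArrayList<>(); // extra state per event
        private final int[] nextIndex;

        GenericRouter(int numTasks) {
            nextIndex = new int[numTasks];
        }

        void addEvent(String event, int... targetTasks) {
            BitSet t = new BitSet();
            for (int task : targetTasks) {
                t.set(task);
            }
            events.add(event);
            targets.add(t);
        }

        /** Scans the unseen events and keeps only those whose target set contains the task. */
        List<String> pollNewEvents(int task) {
            List<String> out = new ArrayList<>();
            for (int i = nextIndex[task]; i < events.size(); i++) {
                if (targets.get(i).get(task)) {
                    out.add(events.get(i));
                }
            }
            nextIndex[task] = events.size();
            return out;
        }
    }

    public static void main(String[] args) {
        BroadcastRouter broadcast = new BroadcastRouter(2);
        broadcast.addEvent("e0");
        System.out.println(broadcast.pollNewEvents(1));

        GenericRouter generic = new GenericRouter(2);
        generic.addEvent("e0", 0);
        generic.addEvent("e1", 0, 1);
        System.out.println(generic.pollNewEvents(1));
    }
}
```

In this sketch the broadcast router's memory cost is the event list plus one integer per task, while the generic router pays for an additional target set per event. That extra per-event state is the overhead the comment argues a routing-aware plugin can avoid.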