[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002342#comment-14002342 ]
Bikas Saha commented on TEZ-776: -------------------------------- bq. they don't need to store additional state information about which tasks an event needs to be routed to We might be talking about different things here. Event routing vs event storage. Event routing is anyways abstracted to the plugin and the broadcast plugin does exactly that by not storing any state etc. And we are mainly concerned with Data events that need to be stored for recovery purposes. The crucial problem is the fact that the scatter gather pattern can result in an explosion of events and references. That causes the high memory usage in the AM. This is orthogonal to routing and in fact the scatter gather plugin does not have any state/overhead. The explosion happens because we need to create separate objects with distinct input index information so that the LogicalInput on the task can handle them correctly. And the tasks spread those events around because of the way event fetching is done and to handle retries of getEvents() RPC. bq. Storing events in the plugins vs the Vertex itself - it's far easier to control Obsoletion Don't quite get this. If the events are in the same place (which today they arent) then the same rules can be applied whether its Vertex and plugin. The rules depend on the type of the event. e.g. if all the events are in the vertex then it can obsolete/discard the data event when it fails the task that produced the event. The vertex already has access to all this information and where the fundamental state transitions take place. So that would be the quickest and least overhead place to do the book-keeping. In any case, I am kind of hand waving here. Will wait for a clear design/solution description before making more comments so that the discussion is concrete. > Reduce AM mem usage caused by storing TezEvents > ----------------------------------------------- > > Key: TEZ-776 > URL: https://issues.apache.org/jira/browse/TEZ-776 > Project: Apache Tez > Issue Type: Sub-task > Reporter: Siddharth Seth > Assignee: Siddharth Seth > > This is open ended at the moment. > A fair chunk of the AM heap is taken up by TezEvents (specifically > DataMovementEvents - 64 bytes per event). > Depending on the connection pattern - this puts limits on the number of tasks > that can be processed. -- This message was sent by Atlassian JIRA (v6.2#6252)