[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002342#comment-14002342
 ] 

Bikas Saha commented on TEZ-776:
--------------------------------

bq. they don't need to store additional state information about which tasks an 
event needs to be routed to
We might be talking about different things here. Event routing vs event 
storage. Event routing is anyways abstracted to the plugin and the broadcast 
plugin does exactly that by not storing any state etc.
And we are mainly concerned with Data events that need to be stored for 
recovery purposes. The crucial problem is the fact that the scatter gather 
pattern can result in an explosion of events and references. That causes the 
high memory usage in the AM. This is orthogonal to routing and in fact the 
scatter gather plugin does not have any state/overhead. The explosion happens 
because we need to create separate objects with distinct input index 
information so that the LogicalInput on the task can handle them correctly. And 
the tasks spread those events around because of the way event fetching is done 
and to handle retries of getEvents() RPC.

bq. Storing events in the plugins vs the Vertex itself - it's far easier to 
control Obsoletion
Don't quite get this. If the events are in the same place (which today they 
arent) then the same rules can be applied whether its Vertex and plugin. The 
rules depend on the type of the event. e.g. if all the events are in the vertex 
then it can obsolete/discard the data event when it fails the task that 
produced the event. The vertex already has access to all this information and 
where the fundamental state transitions take place. So that would be the 
quickest and least overhead place to do the book-keeping.

In any case, I am kind of hand waving here. Will wait for a clear 
design/solution description before making more comments so that the discussion 
is concrete.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to