[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337266#comment-14337266
 ] 

Siddharth Seth commented on TEZ-776:
------------------------------------

bq. From the simulation, the CPU overhead for routing 10000 composite events 
per heartbeat is 2ms. With our typical 500 events at a time it would be about 
100us - which seems reasonable.
Beyond the time taken, the CPU utilization stats matter. There will be other 
tasks / AMs contending for CPU on the same node. The number of calls can be 
really large - and we'll have #heartbeathandler threads performing this in 
parallel. The time window is currently limited by tasks heartbeating in and 
asking for events; it can get worse when we start pushing events to tasks as 
soon as they're ready.
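
As a rough illustration of why utilization matters more than latency here, the 
quoted figures (2ms per 10000 events) work out to about 200ns per event. The 
sketch below scales that to an aggregate heartbeat rate. The class name, method 
names and the heartbeat rate are hypothetical values chosen for illustration, 
not numbers from the simulation.

```java
final class RoutingCostEstimate {
    // From the quoted simulation: 2 ms of CPU to route 10,000 composite events.
    static final double NANOS_PER_EVENT = 2_000_000.0 / 10_000; // 200 ns per event

    // CPU time, in microseconds, spent routing one heartbeat's worth of events.
    static double microsPerHeartbeat(int events) {
        return events * NANOS_PER_EVENT / 1_000.0;
    }

    // Number of cores kept busy by routing alone, given an aggregate
    // heartbeat rate across all tasks (an assumed, illustrative input).
    static double coresBusy(int eventsPerHeartbeat, int heartbeatsPerSecond) {
        return microsPerHeartbeat(eventsPerHeartbeat) * heartbeatsPerSecond / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("per heartbeat: %.1f us%n", microsPerHeartbeat(500));
        System.out.printf("cores busy at 10000 heartbeats/s: %.2f%n", coresBusy(500, 10_000));
    }
}
```

Under these assumptions a single heartbeat is cheap, but the cost scales 
linearly with the heartbeat rate and is spread across the handler threads.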

bq. The details on 4 mentioned above sounds very similar to what the patch is 
doing (if 4 is not storing the events in TaskImpl).
I'm not sure where TaskImpl comes in with Option 4. The patch does add APIs to 
the edge; the main difference from Option 4 is letting the EdgePlugin handle 
storage, and optimize CPU. Routing the event each and every time is not 
required.

bq. Obsoletion, in theory, suffers from a race condition between when a task 
pulls events and when some other task reports them as failed. Any mitigation in 
the AM is probably superfluous because timings are too small for when it will 
become a problem in the routing. The problem, from what I see, is in the 
windowing of event fetches from the inputs. Typically the DMevent and its 
InputFailed counterpart will be available to the input but across windows of 
fetches. So handling that in the input might be a possible practical mitigation.
These are handled in Inputs, but that isn't an efficient mechanism to handle 
them. Consolidating them as early as possible and minimizing the potential for a 
task to attempt fetching an event is much better. With 100K sources, as an 
example, it's very easy for the FailedEvent to show up much later than the 
event that is being obsoleted.
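
A minimal sketch of what early consolidation could look like, assuming an 
edge-side store that tracks events by (source task, attempt). None of the names 
below are Tez APIs; the class, its methods and the String payload are 
hypothetical stand-ins for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConsolidatingEventStore {
    // Live events keyed by (source task, attempt), kept in arrival order.
    private final Map<Long, String> live = new LinkedHashMap<>();
    // Attempts already reported as failed, so late-arriving events are dropped too.
    private final Set<Long> failed = new HashSet<>();

    private static long key(int sourceTask, int attempt) {
        return ((long) sourceTask << 32) | (attempt & 0xffffffffL);
    }

    // Stand-in for receiving a data movement event from a source attempt.
    void onDataMovement(int sourceTask, int attempt, String payload) {
        long k = key(sourceTask, attempt);
        if (!failed.contains(k)) {
            live.put(k, payload);
        }
    }

    // Stand-in for receiving a failure notice: obsolete the matching event now,
    // before any consumer has a chance to fetch it.
    void onInputFailed(int sourceTask, int attempt) {
        long k = key(sourceTask, attempt);
        failed.add(k);
        live.remove(k);
    }

    // What a consumer would see on its next fetch: only non-obsoleted events.
    List<String> fetch() {
        return new ArrayList<>(live.values());
    }
}
```

With a store like this, a consumer that fetches after the failure notice 
arrives never sees the obsoleted event, regardless of which of the two arrived 
first. It does not help a consumer that fetched before the failure was known; 
that case still has to be handled in the Input.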

bq. InputFailedEvents are well-known to the framework and obsoletion can be 
done by matching up routing indices by the framework itself but its overhead 
(wherever it's done)
InputFailedEvents need to be matched against the generated events. I'm not sure 
the framework is set up very well to do this - not without additional routing 
and lookups. Edges can consolidate these in a smart and efficient manner, with 
far more information available.

bq.  If TransientDataMovement events are arriving and sitting around without 
consumers in the AM then they are probably not correctly done. 
This is another example of an Edge consolidating events for more efficient 
usage.

bq.  Consolidating is essentially routing overhead. In general, IMO, we should 
try to keep event routing plugins as simple and quick as possible so that we 
can route events as soon as possible.
The plugin execution can continue to be quick even if it manages events. There 
is some complexity to management of events and consolidation - but the benefits 
in overall efficiency and execution speed (not doing an M x N route) are a bigger 
win IMO.
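
To put the M x N point in numbers, here is a back-of-the-envelope sketch. It 
uses the 64 bytes per DataMovementEvent figure from the issue description and 
assumes, purely for illustration, that eager routing materializes one event per 
(source, destination) pair while a consolidating edge keeps one event per 
source. The class and method names are hypothetical.

```java
final class RoutedEventFootprint {
    // Per-event size taken from the issue description.
    static final long BYTES_PER_EVENT = 64L;

    // Assumed eager case: one stored event per (source, destination) pair.
    static long eagerBytes(long sources, long destinations) {
        return Math.multiplyExact(Math.multiplyExact(sources, destinations), BYTES_PER_EVENT);
    }

    // Assumed consolidated case: one stored event per source, routed on fetch.
    static long consolidatedBytes(long sources) {
        return Math.multiplyExact(sources, BYTES_PER_EVENT);
    }

    public static void main(String[] args) {
        long m = 10_000, n = 10_000;
        System.out.println("eager bytes:        " + eagerBytes(m, n));
        System.out.println("consolidated bytes: " + consolidatedBytes(m));
    }
}
```

The same M x N factor applies to the number of routing operations, not just to 
the bytes held in the AM heap.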

Leaving the comments on the patch for later discussion.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.patch, 
> events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
