[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337266#comment-14337266 ]
Siddharth Seth commented on TEZ-776:
------------------------------------
bq. From the simulation, the CPU overhead for routing 10000 composite events
per heartbeat is 2ms. With our typical 500 events at a time it would be about
100us - which seems reasonable.
Beyond the time taken, the CPU utilization stats matter. There will be other
tasks / AMs contending for CPU on the same node. The number of calls can be
really large - and we'll have #heartbeathandler threads performing this in
parallel. The time window is currently limited by tasks heartbeating in and
asking for events; it can get worse when we start pushing events to tasks as
soon as they're ready.
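For a rough sense of scale, the quoted simulation numbers work out to about
0.2us per routed event. A minimal sketch of that arithmetic follows. The class
and method names are hypothetical, and the heartbeat rate used in main is a
made-up illustrative value, not a measurement.

```java
// Back-of-envelope estimate only; not part of Tez.
final class RoutingCostEstimate {
    // From the quoted simulation: 10000 composite events routed in 2 ms.
    static final double SIM_EVENTS = 10_000;
    static final double SIM_MICROS = 2_000;

    /** Routing cost of a single event, in microseconds. */
    static double microsPerEvent() {
        return SIM_MICROS / SIM_EVENTS;
    }

    /** Routing cost of one heartbeat that returns the given number of events. */
    static double microsPerHeartbeat(int events) {
        return events * microsPerEvent();
    }

    /**
     * CPU-seconds spent routing per wall-clock second, i.e. the number of
     * cores kept busy. heartbeatsPerSecond is an assumed input.
     */
    static double coresBusy(int eventsPerHeartbeat, double heartbeatsPerSecond) {
        return microsPerHeartbeat(eventsPerHeartbeat) * heartbeatsPerSecond / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println("us per heartbeat (500 events): " + microsPerHeartbeat(500));
        // ASSUMPTION: 10,000 heartbeats/second across all tasks, for illustration.
        System.out.println("cores busy: " + coresBusy(500, 10_000));
    }
}
```

Under that assumed heartbeat rate, 100us per call adds up to one full core
spent on routing, which is why utilization matters in addition to latency.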
bq. The details on 4 mentioned above sounds very similar to what the patch is
doing (if 4 is not storing the events in TaskImpl).
I'm not sure where TaskImpl comes in with Option 4. The patch does add APIs to
the edge; the main difference from Option 4 is letting the EdgePlugin handle
storage and optimize CPU. Routing the event each and every time is not
required.
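To make the idea of an edge owning event storage concrete, here is a minimal
sketch. It assumes a simple all-to-all connection pattern, and every type and
method name in it is hypothetical; none of this is the Tez EdgePlugin API or
the code in the patch. Each source event is stored once, and events for a
destination task are produced only when that task asks for them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not Tez classes.
final class OnDemandEdgeSketch {
    /** One event as reported by a source task; stored once, not per destination. */
    static final class SourceEvent {
        final int sourceTask;
        final String payload;
        SourceEvent(int sourceTask, String payload) {
            this.sourceTask = sourceTask;
            this.payload = payload;
        }
    }

    /** A source event addressed to one destination task. */
    static final class RoutedEvent {
        final int sourceTask;
        final int destTask;
        final String payload;
        RoutedEvent(int sourceTask, int destTask, String payload) {
            this.sourceTask = sourceTask;
            this.destTask = destTask;
            this.payload = payload;
        }
    }

    private final List<SourceEvent> stored = new ArrayList<>();

    void addSourceEvent(int sourceTask, String payload) {
        stored.add(new SourceEvent(sourceTask, payload));
    }

    int storedCount() {
        return stored.size();
    }

    /** Returns up to maxEvents events for destTask, starting at fromIndex in the stored list. */
    List<RoutedEvent> eventsFor(int destTask, int fromIndex, int maxEvents) {
        List<RoutedEvent> out = new ArrayList<>();
        int start = Math.max(0, fromIndex);
        int end = (int) Math.min((long) stored.size(), (long) start + Math.max(0, maxEvents));
        for (int i = start; i < end; i++) {
            SourceEvent e = stored.get(i);
            out.add(new RoutedEvent(e.sourceTask, destTask, e.payload));
        }
        return out;
    }

    public static void main(String[] args) {
        OnDemandEdgeSketch edge = new OnDemandEdgeSketch();
        for (int s = 0; s < 3; s++) {
            edge.addSourceEvent(s, "output-of-" + s);
        }
        List<RoutedEvent> batch = edge.eventsFor(7, 1, 10);
        System.out.println(edge.storedCount() + " stored, " + batch.size() + " routed for task 7");
    }
}
```

With M sources and N destinations this keeps M stored events instead of M x N
routed copies. A real edge could go further, for example by caching routing
results so the same event is not re-routed for every request.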
bq. Obsoletion, in theory, suffers from a race condition between when a task
pulls events and when some other task reports them as failed. Any mitigation in
the AM is probably superfluous because timings are too small for when it will
become a problem in the routing. The problem, from what I see, is in the
windowing of event fetches from the inputs. Typically the DMevent and its
InputFailed counterpart will be available to the input but across windows of
fetches. So handling that in the input might be a possible practical mitigation.
These are handled in Inputs, but that isn't an efficient mechanism to handle
them. Consolidating them as early as possible and minimizing the potential for
a task to attempt fetching an obsoleted event is much better. With 100K
sources, as an example, it's very easy for the FailedEvent to show up much
later than the event that is being obsoleted.
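As an illustration of consolidating early, the sketch below drops a stored
event as soon as the matching failure arrives, so a task that pulls later never
sees it. All names are hypothetical and the keying by (source task, attempt) is
an assumption made for the example; it is not how the patch or Tez matches
events.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these are not Tez classes.
final class ConsolidatingStoreSketch {
    // Events that are still valid, in arrival order.
    private final Map<String, String> live = new LinkedHashMap<>();
    // Source attempts already reported as failed.
    private final Set<String> failed = new HashSet<>();

    // ASSUMPTION: a (source task, attempt) pair identifies the event to obsolete.
    private static String key(int sourceTask, int attempt) {
        return sourceTask + "_" + attempt;
    }

    /** Stores a data movement event unless its source attempt is already known to have failed. */
    void onDataMovement(int sourceTask, int attempt, String payload) {
        String k = key(sourceTask, attempt);
        if (!failed.contains(k)) {
            live.put(k, payload);
        }
    }

    /** Records the failure and drops the obsoleted event if it is still stored. */
    void onInputFailed(int sourceTask, int attempt) {
        String k = key(sourceTask, attempt);
        failed.add(k);
        live.remove(k);
    }

    /** Payloads a task pulling right now would receive. */
    List<String> pendingPayloads() {
        return new ArrayList<>(live.values());
    }

    public static void main(String[] args) {
        ConsolidatingStoreSketch store = new ConsolidatingStoreSketch();
        store.onDataMovement(1, 0, "a");
        store.onDataMovement(2, 0, "b");
        store.onInputFailed(1, 0);
        System.out.println(store.pendingPayloads());
    }
}
```

This only covers tasks that have not yet pulled the obsoleted event. A task
that already fetched it would still need the failure delivered, which the
sketch does not model.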
bq. InputFailedEvents are well-known to the framework and obsoletion can be
done by matching up routing indices by the framework itself but its overhead
(wherever its done)
InputFailedEvents need to be matched against the generated events. I'm not sure
the framework is set up very well to do this - not without additional routing
and lookups. Edges can consolidate these in a smart and efficient manner, with
far more information available.
bq. If TransientDataMovement events are arriving and sitting around without
consumers in the AM then they are probably not correctly done.
This is another example of an Edge consolidating events for more efficient
usage.
bq. Consolidating is essentially routing overhead. In general, IMO, we should
try to keep event routing plugins as simple and quick as possible so that we
can route events as soon as possible.
The plugin execution can continue to be quick even if it manages events. There
is some complexity to managing and consolidating events - but the benefits
in overall efficiency and execution speed (not doing an MxN route) are a bigger
win IMO.
Leaving the comments on the patch for later discussion.
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.patch,
> events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.