[
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349931#comment-14349931
]
Bikas Saha commented on TEZ-776:
--------------------------------
Sorry for the delayed response.
The MxN cost for broadcast (M source events, each routed to N destination tasks)
is a result of not having visibility into the event payload. Without that data,
it's impossible to avoid. If the payload is visible, then relevant events can be
cached where that makes sense. Broadcast is an example where caching helps.
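To make the MxN point concrete, here is a minimal sketch that contrasts payload-blind routing with a per-source cache for a broadcast edge. The record and class names (`TargetedEvent`, `BroadcastEvent`, `BroadcastCache`) are hypothetical and are not Tez classes. The sketch only assumes that, on a broadcast edge, every destination task needs the same events, so a single cached object per source task can be shared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BroadcastRoutingSketch {
    // Hypothetical event shapes for illustration; these are not Tez classes.
    record TargetedEvent(int sourceTask, int targetTask, byte[] payload) {}
    record BroadcastEvent(int sourceTask, byte[] payload) {}

    // Payload-blind routing: one targeted event object per (source, destination)
    // pair, so M source events become M * N objects held in memory.
    static List<TargetedEvent> routeEagerly(List<byte[]> payloads, int numDestTasks) {
        List<TargetedEvent> routed = new ArrayList<>();
        for (int src = 0; src < payloads.size(); src++) {
            for (int dest = 0; dest < numDestTasks; dest++) {
                routed.add(new TargetedEvent(src, dest, payloads.get(src)));
            }
        }
        return routed;
    }

    // Payload-aware routing for broadcast: one cached object per source task,
    // shared by every destination that asks for its events.
    static final class BroadcastCache {
        private final List<BroadcastEvent> events = new ArrayList<>();

        void add(int sourceTask, byte[] payload) {
            events.add(new BroadcastEvent(sourceTask, payload));
        }

        // destTask is unused: on a broadcast edge every destination gets the same
        // events. Returns a read-only view; nothing is copied per destination.
        List<BroadcastEvent> eventsFor(int destTask) {
            return Collections.unmodifiableList(events);
        }
    }

    public static void main(String[] args) {
        List<byte[]> payloads = List.of(new byte[8], new byte[8], new byte[8]); // M = 3
        int numDestTasks = 4;                                                    // N = 4
        System.out.println(routeEagerly(payloads, numDestTasks).size()); // 12

        BroadcastCache cache = new BroadcastCache();
        for (int src = 0; src < payloads.size(); src++) {
            cache.add(src, payloads.get(src));
        }
        // Destinations 0 and 1 see the very same event instance.
        System.out.println(cache.eventsFor(0).get(0) == cache.eventsFor(1).get(0)); // true
    }
}
```

The eager version grows as M * N objects, while the cache grows only with M. That is the saving that payload visibility makes possible.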
I doubt that the CPU overhead of iterating over 1-1 events is going to be
relevant. Routing over a 1-1 edge may not be a single lookup, because attempts
may fail and be retried, and events need to be iterated over to get to the new
versions, unless some dictionary is built to look up all events generated by a
given task's attempts.
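The dictionary alternative mentioned above can be sketched as a map keyed by source task index that keeps only the newest attempt's events. This is a minimal illustration with hypothetical names (`OneToOneLookupSketch`, `DmEvent`). It assumes that a higher attempt number always supersedes a lower one, which may not hold in general (for example, with speculative attempts).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OneToOneLookupSketch {
    // Hypothetical event record: the source task and attempt that produced it.
    record DmEvent(int taskIndex, int attempt, String payload) {}

    // Dictionary keyed by source task index.
    private final Map<Integer, Integer> latestAttempt = new HashMap<>();
    private final Map<Integer, List<DmEvent>> eventsByTask = new HashMap<>();

    void add(DmEvent e) {
        Integer current = latestAttempt.get(e.taskIndex());
        if (current == null || e.attempt() > current) {
            // Assumption: a newer attempt supersedes all older attempts' events.
            latestAttempt.put(e.taskIndex(), e.attempt());
            eventsByTask.put(e.taskIndex(), new ArrayList<>());
        } else if (e.attempt() < current) {
            return; // stale event from a superseded attempt
        }
        eventsByTask.get(e.taskIndex()).add(e);
    }

    // On a 1-1 edge destination task i reads from source task i, so with the
    // dictionary in place routing is a single map lookup, not a scan.
    List<DmEvent> eventsForDestination(int destTaskIndex) {
        return eventsByTask.getOrDefault(destTaskIndex, List.of());
    }

    public static void main(String[] args) {
        OneToOneLookupSketch router = new OneToOneLookupSketch();
        router.add(new DmEvent(0, 0, "first attempt"));
        router.add(new DmEvent(0, 1, "retry"));
        router.add(new DmEvent(0, 0, "late event from first attempt"));
        System.out.println(router.eventsForDestination(0).size()); // 1
    }
}
```

Without such a map, finding the current version of a task's output means iterating over all stored events, which is the cost the comment refers to.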
Past events that haven't yet been routed can be ignored if they come from an
attempt that has been invalidated via an InputFailedEvent. The vertex can do
this, since it has both the DataMovementEvents and the InputFailedEvents, which
can be matched by task attempt ID. There is no need to burden every edge plugin
writer with this.
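A minimal sketch of that vertex-side filtering follows, using hypothetical stand-ins (`AttemptId`, `PendingEvent`) instead of the real Tez `TezTaskAttemptID` and `TezEvent` classes. It only demonstrates matching pending data movement events against failed attempts by attempt ID before any edge plugin would be called.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FailedAttemptFilterSketch {
    // Hypothetical stand-ins; real Tez uses TezTaskAttemptID and TezEvent.
    record AttemptId(int taskIndex, int attempt) {}
    record PendingEvent(AttemptId producer, String payload) {}

    private final Set<AttemptId> failedAttempts = new HashSet<>();
    private final List<PendingEvent> pending = new ArrayList<>();

    // A data movement event reaches the vertex; skip it if its producer is already invalid.
    void onDataMovement(PendingEvent e) {
        if (!failedAttempts.contains(e.producer())) {
            pending.add(e);
        }
    }

    // An input-failed notification: remember the attempt and drop its not-yet-routed
    // events, matching purely on the attempt ID, so edge plugins never see them.
    void onInputFailed(AttemptId attempt) {
        failedAttempts.add(attempt);
        pending.removeIf(e -> e.producer().equals(attempt));
    }

    List<PendingEvent> pendingEvents() {
        return List.copyOf(pending);
    }

    public static void main(String[] args) {
        FailedAttemptFilterSketch vertex = new FailedAttemptFilterSketch();
        vertex.onDataMovement(new PendingEvent(new AttemptId(0, 0), "task0 attempt0"));
        vertex.onDataMovement(new PendingEvent(new AttemptId(1, 0), "task1 attempt0"));
        vertex.onInputFailed(new AttemptId(0, 0));
        vertex.onDataMovement(new PendingEvent(new AttemptId(0, 0), "late event"));
        System.out.println(vertex.pendingEvents().size()); // 1
    }
}
```

Because the filtering lives in one place, each edge plugin only ever sees events from attempts that are still valid.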
Push-based routing needs all of those questions answered, plus more that are
probably orthogonal to this issue.
No. It's the test's CPU usage, and most of it comes from the central dispatcher
under load in the simulation. No periodic spikes were observed in the running
jobs. If there is another way to measure this, I am open to suggestions.
The CPU numbers reinforce that CPU utilization is related to the number of
events in the inner loop. This matters only if the CPU used in routing is a
significant fraction of the total CPU in the first place. It's the same code,
switching between the old and new routing based on the new config.
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch,
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.patch,
> events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern, this puts limits on the number of tasks
> that can be processed.
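To see how the connection pattern drives this limit, the following back-of-the-envelope sketch multiplies an assumed event count per pattern by the 64 bytes per event quoted in the issue description. The count model is an assumption, not something stated in the issue: one event per destination for one-to-one, and one event per (source, destination) pair for broadcast and scatter-gather. It ignores payload bytes and every other AM overhead. The class and enum names are hypothetical.

```java
class EventMemoryEstimate {
    // Per-event size quoted in the issue description.
    static final long BYTES_PER_EVENT = 64;

    enum Pattern { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    // Assumed event counts: 1-1 yields one event per destination task;
    // broadcast and scatter-gather yield one per (source, destination) pair.
    static long eventCount(Pattern p, long sources, long destinations) {
        return switch (p) {
            case ONE_TO_ONE -> destinations;
            case BROADCAST, SCATTER_GATHER -> sources * destinations;
        };
    }

    static long estimatedBytes(Pattern p, long sources, long destinations) {
        return eventCount(p, sources, destinations) * BYTES_PER_EVENT;
    }

    public static void main(String[] args) {
        for (Pattern p : Pattern.values()) {
            System.out.printf("%s: %d bytes%n", p, estimatedBytes(p, 1000, 1000));
        }
    }
}
```

With 1,000 source and 1,000 destination tasks, the one-to-one estimate is 64,000 bytes. The broadcast and scatter-gather estimates are each 64,000,000 bytes (about 61 MiB) of event objects alone, and they grow quadratically as both vertices scale.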
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)