[
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344654#comment-14344654
]
Bikas Saha commented on TEZ-776:
--------------------------------
The edge manager implementations are precisely that. Impls of interfaces so
that routing logic can be captured in code rather than as routing tables in
memory. And thats what all impls to date do. They have logic in code to
generate the routing data on the fly instead of encoding it in some memory
efficient manner. We are probably unnecessarily going around in circles about
this point of efficiently storing routing tables by smart edge manager plugins.
Thats already the case.
Lets look at option 4 and on demand routing (ODR in the attached patch).
Neither stores the exploded data movement events. In option 4. edge plugins are
made responsible for event storage and implement an API (say. getNextEvents())
to get exploded events for a task while maintaining an index for that tasks
last read index into the event array. In ODR, the vertex stores the events and
gets routing data from the edge plugins when a task asks for events. In both
cases, the exploded events are not stored (to reduce memory pressure) and so
they must be routed to the requesting task. Each of the MxN DM events have
unique source+target indices and thus there is no caching/reuse possible. The
CPU needed would be similar in both cases because the inner loop (that consumes
CPU) consists of creating these events and setting the routing info on them.
You can see that from the yourkit profiles. The simulation in the patch
essentially captures the essence of 4 since there is only 1 edge and hence all
events are iterated through and exploded by the edge plugin. So the difference
essentially lies in whether the burden of event storage etc. lies in the
framework or if pushed to the user. ODR is keeping it within the framework
because that is essentially the frameworks responsibility.
Another important correctness issue came up during the review of the tez paper
submitted to sigmod. User payloads capture user metadata (and potentially data)
and thus from a security correctness point of view its essential to limit who
has access to the payloads. Inputs and outputs communicate via the payloads and
need access for correctness. But nothing else should. Edge plugins are user
code. Some written by Tez developers, some by others. Using third party plugins
to route events should not mean that they have access to the user payloads of
those events. That would be a potential security hole.
InputFailedEvents need routing info from the edge plugin mainly for the case
when DM events have been sent to the consumers and they need to be told which
one of them may now be bad. However, plugins are not needed to not send dm
events to future consumers after the receipt of an input failed event. An input
failed event essentially captures the lost task-attempt. The framework can skip
dm events generated by failed task-attempts (for future consumers) after
receiving failed task-attempt info via an input failed event. We can make that
change independently today. Again, its best to make the framework support this
for all instead of making it a burden for every edge plugin implementation.
Routing events as soon as they arrive would be covered by on-demand-routing
since essentially it is on demand routing of events upon arrival. So the APIs
in the edge plugins support that use case. How the AM would end up actually
sending the event onward based on the routing info would depend on the
implementation of push based routing and is probably orthogonal to being able
to capture the routing info on a fine grained basis.
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.patch,
> events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)