[
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326315#comment-14326315
]
Siddharth Seth commented on TEZ-776:
------------------------------------
bq. From what I understand 3 is figuring out a compact way to encode the
routing table. So I am not sure how it is going to reduce the overhead of
storing the exploded events (reference overhead) or the events themselves
(after exploding them). Seemed like it depends on composite event routing and
on-demand routing. So it's like a mix of subsets of 1 and 2 with the encoding
used to reduce the routing CPU overhead?
Yes, 3 is trying to set up a more efficient storage model for events, at the
expense of some additional CPU to lookup from this storage instead of from a
simple list. There'll still be a fairly large memory overhead given that we're
trying to model a very generic routing table.
For both approaches 1 and 3, having the Edge try to infer information about the
event routing - e.g. 'to all tasks' / 'to all indices' - could be used to
improve efficiency.
bq. Moving routing into vertex plugins instead of edge plugins is shifting
responsibilities architecturally. Not sure how it solves the event overhead or
reference overhead. Making a hack in the shuffle plugin is probably not a good
idea. Users can be extending it or using their own for shuffles when they want
their own version of auto-reduce.
If this is referencing option 4, the VertexPlugin isn't really involved. Some
more details.
The EdgePlugin becomes responsible for storing the routing table, as well as
serving requests of the kind getNextNEvents(int targetTaskIndex, int
numEvents). The edge plugin essentially gets to choose how to store the routing
table - which it can do in a very efficient manner, especially for
'predictable' routing tables.
In the case of ScatterGather, for instance, the M CompositeDataMovementEvents
need to be stored, along with N integers for the event indices which have been
consumed by the N consumer tasks. A lookup by task J involves taking the Jth
index (K) and looking up 'numEvents' elements starting from the Kth index of
the CompositeDataMovementEvent list.
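A minimal sketch of that storage and lookup scheme (all class and method names
here are hypothetical stand-ins, not actual Tez APIs; a String stands in for a
CompositeDataMovementEvent):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the edge keeps the M composite events once, plus one
// consumed-index cursor per consumer, instead of an exploded M x N event list.
class ScatterGatherEventStore {
    private final List<String> compositeEvents = new ArrayList<>();
    private final int[] consumedIndex; // per-consumer cursor into compositeEvents

    ScatterGatherEventStore(int numConsumers) {
        this.consumedIndex = new int[numConsumers];
    }

    void addCompositeEvent(String event) {
        compositeEvents.add(event);
    }

    // Take the Jth cursor (K), serve up to numEvents elements starting at
    // index K of the composite event list, then advance the cursor.
    List<String> getNextNEvents(int targetTaskIndex, int numEvents) {
        int k = consumedIndex[targetTaskIndex];
        int end = Math.min(k + numEvents, compositeEvents.size());
        List<String> out = new ArrayList<>(compositeEvents.subList(k, end));
        consumedIndex[targetTaskIndex] = end;
        return out;
    }
}
```

Storage here is O(M + N) rather than O(M x N), at the cost of generating the
per-task view on each lookup.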
The base EdgeManager can implement generic storage and routing for all
non-standard EdgeManagerPlugins - which users can choose to override for
efficiency. This does make EdgeManagerPlugins more complicated to write if
users go down this route - but it will be the most memory / CPU efficient
approach, since EMPlugins know exactly what they're dealing with.
As a variant of this approach, this could be achieved by changing the
construct returned by the routeEvent method call - making that an interface
which provides lookup capabilities based on taskIndex. (Related to TEZ-1927)
EventRoutingInfo routeCompositeEvent(CompositeEvent compositeEvent);
M instances of EventRoutingInfo stored.
Runtime lookup of EventRoutingInfo.getEventsForTask(int taskIndex) - which
generates the actual events.
Shuffle can implement its own variant of EventRoutingInfo which is memory
efficient. It has a CPU overhead though - possibly the same as dynamic routing?
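The interface-returning variant might look roughly like this (names modeled on
the routeCompositeEvent signature sketched above; these are illustrative
assumptions, not the actual TEZ-1927 API, and a String stands in for the event
payload):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: routeEvent returns a compact, lookup-capable object;
// per-task events are generated on demand instead of being stored exploded.
interface EventRoutingInfo {
    // Generates the actual events for one consumer task.
    List<String> getEventsForTask(int taskIndex);
}

// A one-to-all routing example: nothing per-task is stored, so memory stays
// O(1) per composite event regardless of the number of consumers.
class BroadcastRoutingInfo implements EventRoutingInfo {
    private final String payload;

    BroadcastRoutingInfo(String payload) {
        this.payload = payload;
    }

    @Override
    public List<String> getEventsForTask(int taskIndex) {
        List<String> events = new ArrayList<>();
        events.add(payload + "->task" + taskIndex);
        return events;
    }
}
```

With M composite events, only M such EventRoutingInfo instances are stored;
the CPU cost moves to getEventsForTask at lookup time, as noted above.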
bq. TezEvents are wrappers around data movement events. They are replicated
only for composite dm events since they are exploded. tezevent store identical
src/dest metainfo for the exploded events. Not sure about the math of reducing
the ref count from 16 to 2-3. Not replicating them and sharing them across dm
events may cause race conditions during rpc serialization.
This is more like 8 down to 2 - since entities like srcMeta, destMeta and the
ByteBuffer are already shared. That should still give a good gain - however,
the MxN reference matrix remains, which in itself can be huge.
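To make the reference-matrix concern concrete, a back-of-the-envelope
calculation (the task counts below are made up for scale, and 8-byte references
are assumed):

```java
// Illustrative arithmetic only: even after deduplicating the per-event
// objects (roughly 8 shared entities down to 2-3), an M x N matrix of
// references still dominates at large fan-out.
public class RefMatrixEstimate {
    public static long referenceBytes(long m, long n, long bytesPerRef) {
        return m * n * bytesPerRef;
    }

    public static void main(String[] args) {
        long m = 10_000, n = 10_000;          // hypothetical source/consumer counts
        long bytes = referenceBytes(m, n, 8); // 8-byte references assumed
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints "762 MB"
    }
}
```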
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)