[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326315#comment-14326315 ]

Siddharth Seth commented on TEZ-776:
------------------------------------

bq. From what I understand 3 is figuring out a compact way to encode the 
routing table. So I am not sure how it is going to reduce the overhead of 
storing the exploded events (reference overhead) or the events themselves 
(after exploding them). Seemed like it depends on composite event routing and 
ondemand routing. So its like a mix of subsets of 1 and 2 with the encoding 
used to reduce the routing cpu overhead?
Yes, 3 is trying to set up a more efficient storage model for events, at the 
expense of some additional CPU to look up from this storage instead of from a 
simple list. There'll still be a fairly large memory overhead, given that we're 
trying to model a very generic routing table.
For both approach 1 and approach 3, having the Edge try to infer information 
about the event routing - e.g. 'to all tasks' / 'to all indices' - could be 
used to improve efficiency.

bq. Moving routing into vertex plugins instead of edge plugins is shifting 
responsibilities architecturally. Not sure how it solves the event overhead or 
reference overhead. Making a hack in the shuffle plugin is probably not a good 
idea. Users can be extending it or using their own for shuffles when they want 
their own version of auto-reduce.
If this is referencing option 4, the VertexPlugin isn't really involved. Some 
more details:
The EdgePlugin becomes responsible for storing the routing table, as well as 
serving requests of the form getNextNEvents(int targetTaskIndex, int 
numEvents). The edge plugin essentially gets the option of storing the routing 
table itself - which it can do very efficiently, especially for 'predictable' 
routing tables.
In the ScatterGather case, for instance, only the M CompositeDataMovementEvents 
need to be stored, plus N integers tracking the event index consumed so far by 
each of the N consumer tasks. A lookup by task J takes the Jth index (K) and 
reads 'numEvents' elements starting from the Kth index of the 
CompositeDataMovementEvent list.
The base EdgeManager can implement generic storage and routing for all 
non-standard EdgeManagerPlugins, which users can choose to override for 
efficiency. This does make EdgeManagerPlugins more complicated to write if 
users go down this route - but it will be the most memory / CPU efficient 
option, since EMPlugins know exactly what they're dealing with.
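
As a minimal sketch of the ScatterGather storage described above - assuming 
hypothetical names (getNextNEvents, and String standing in for 
CompositeDataMovementEvent), not the actual Tez API - the M source events are 
stored once, with one integer cursor per consumer task:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compact per-edge storage for a ScatterGather edge.
// "String" stands in for CompositeDataMovementEvent; getNextNEvents is the
// proposed (not actual) lookup call.
class ScatterGatherRouting {
    // The M composite source events, stored once - never exploded per consumer.
    private final List<String> compositeEvents = new ArrayList<>();
    // One cursor per consumer task: index of the next unconsumed source event.
    private final int[] nextIndex;

    ScatterGatherRouting(int numConsumers) {
        this.nextIndex = new int[numConsumers]; // all cursors start at 0
    }

    void addSourceEvent(String compositeEvent) {
        compositeEvents.add(compositeEvent);
    }

    // Lookup by task J: take its cursor K, return up to numEvents elements
    // starting at K from the composite event list, and advance the cursor.
    List<String> getNextNEvents(int targetTaskIndex, int numEvents) {
        int k = nextIndex[targetTaskIndex];
        int end = Math.min(k + numEvents, compositeEvents.size());
        List<String> out = new ArrayList<>(compositeEvents.subList(k, end));
        nextIndex[targetTaskIndex] = end;
        return out;
    }
}
```

Memory here is O(M + N) rather than O(M * N) exploded events, at the cost of 
materializing events on each lookup.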

Thinking out loud - in a variant of this approach, this could be achieved by 
changing the construct returned by the routeEvent method call and making it an 
interface which provides lookup capabilities based on taskIndex (related to 
TEZ-1927):
EventRoutingInfo routeCompositeEvent(CompositeEvent compositeEvent);
M instances of EventRoutingInfo are stored.
At runtime, EventRoutingInfo.getEventsForTask(int taskIndex) is invoked - which 
generates the actual events.
Shuffle can implement its own variant of EventRoutingInfo which is memory 
efficient. This has a CPU overhead, though - possibly the same as dynamic 
routing?
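
A sketch of that interface-based variant, under the same caveat that the names 
(EventRoutingInfo, getEventsForTask, and the broadcast implementation) are 
illustrative proposals rather than the actual Tez API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: routeCompositeEvent returns a lookup object instead of
// exploded per-destination events. Names are illustrative, not the Tez API.
interface EventRoutingInfo {
    // Generate the concrete events for one consumer task, on demand.
    List<String> getEventsForTask(int taskIndex);
}

// Example implementation for a broadcast-style edge: the payload is stored
// once, and per-task events are materialized only when a task asks for them.
class BroadcastRoutingInfo implements EventRoutingInfo {
    private final String payload; // shared across all N consumers

    BroadcastRoutingInfo(String payload) {
        this.payload = payload;
    }

    @Override
    public List<String> getEventsForTask(int taskIndex) {
        List<String> events = new ArrayList<>();
        events.add("task-" + taskIndex + ":" + payload);
        return events;
    }
}
```

Only M such objects are stored; the M x N per-destination events never exist 
at rest, which is where the memory saving comes from and the per-lookup CPU 
cost goes.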

bq. TezEvents are wrappers around data movement events. They are replicated 
only for composite dm events since they are exploded. tezevent store identical 
src/dest metainfo for the exploded events. Not sure about the math of reducing 
the ref count from 16 to 2-3. Not replicating them and sharing them across dm 
events may cause race conditions during rpc serialization.
This is more like 8 down to 2 - since entities like srcMeta, destMeta and the 
ByteBuffer are already shared. That should still give a good gain - however, 
the M x N reference matrix remains, which in itself can be huge.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
