[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328668#comment-14328668
 ] 

Bikas Saha commented on TEZ-776:
--------------------------------

Attaching a POC patch that shows that event routing can be done on demand, per 
task heartbeat request, instead of ahead of time in bulk. There are 
some changes to the Vertex/Edge code for efficiency because this code path will 
be hot. In addition, moving from bulk routing to per-task routing 
necessitates a change in the EdgeManager routing API for efficiency. A similar 
change is needed to expand Composite events on demand instead of in bulk. These 
API changes in the patch are illustrative and need to be discussed, 
though I would expect the final form to be similar in spirit.
The patch adds a mock app master simulation that exercises all code paths 
related to event routing and fetching except for the RPC, so it presents a 
lower bound on the amount of memory used by the app master. The results are 
very promising and suggest that on-demand routing practically solves the memory 
overhead problem in the AM as described in the problem statement document 
attached to the jira. The simulations run in almost identical time with or 
without the change, which indicates minimal performance impact.
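
As a rough illustration of the on-demand idea (this is not the code in the 
patch), the sketch below keeps one shared event list per edge and decides 
routing only when a destination task asks for events. All names here 
(OnDemandRoutingSketch, PerTaskRouter, SourceEvent, Fetch) are hypothetical 
and do not correspond to the Tez EdgeManager API; the cursor-based fetch is 
an assumption about how a heartbeat could resume where it left off.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: route events lazily, per destination task heartbeat. */
final class OnDemandRoutingSketch {

    /** Hypothetical stand-in for an event produced by a source task. */
    static final class SourceEvent {
        final int sourceTaskIndex;
        SourceEvent(int sourceTaskIndex) { this.sourceTaskIndex = sourceTaskIndex; }
    }

    /** Hypothetical per-task routing decision (not the real EdgeManager API). */
    interface PerTaskRouter {
        boolean routesTo(int sourceTaskIndex, int destTaskIndex);
    }

    /** Result of one heartbeat: matching events plus the cursor to send next time. */
    static final class Fetch {
        final List<SourceEvent> events;
        final int nextIndex;
        Fetch(List<SourceEvent> events, int nextIndex) {
            this.events = events;
            this.nextIndex = nextIndex;
        }
    }

    // One shared, append-only list per edge; no per-destination copies are stored.
    private final List<SourceEvent> edgeEvents = new ArrayList<>();
    private final PerTaskRouter router;

    OnDemandRoutingSketch(PerTaskRouter router) { this.router = router; }

    /** Called when a source task reports an event; nothing is routed yet. */
    synchronized void addEvent(SourceEvent event) { edgeEvents.add(event); }

    /** Called per heartbeat: scan from the task's cursor and route as we go. */
    synchronized Fetch fetch(int destTaskIndex, int fromIndex, int maxEvents) {
        List<SourceEvent> out = new ArrayList<>();
        int i = fromIndex;
        while (i < edgeEvents.size() && out.size() < maxEvents) {
            SourceEvent e = edgeEvents.get(i);
            if (router.routesTo(e.sourceTaskIndex, destTaskIndex)) {
                out.add(e);
            }
            i++;
        }
        return new Fetch(out, i);
    }

    public static void main(String[] args) {
        // Broadcast-style routing: every source event goes to every destination.
        OnDemandRoutingSketch edge = new OnDemandRoutingSketch((src, dest) -> true);
        for (int s = 0; s < 3; s++) {
            edge.addEvent(new SourceEvent(s));
        }
        Fetch first = edge.fetch(0, 0, 2);
        Fetch second = edge.fetch(0, first.nextIndex, 2);
        System.out.println(first.events.size() + " then " + second.events.size());
    }
}
```

In this sketch the AM holds each source event once, and the only 
per-destination state is an integer cursor that the task sends back on its 
next heartbeat. The cost is that the routing check runs on every fetch, which 
is why that code path becomes hot.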

Here are the results from the simulations:
{noformat}
X = 10K Map x 1K Reduce
10X = 10K Map x 10K Reduce
100X = 100K Map x 10K Reduce
Minimal = No DM events being sent
Reference = Broadcast DM events being sent to create event reference load
ReferenceAndEvent = Composite Scatter-Gather events being sent to create 
reference and event load
Events have no payload

Original
##### Heap utilization statistics [MB] for testMinimal
##### Used Memory:45
##### Free Memory:141
##### Total Memory:187
##### Heap utilization statistics [MB] for testMinimal (10x)
##### Used Memory:76
##### Free Memory:173
##### Total Memory:250
##### Heap utilization statistics [MB] for testReference
##### Used Memory:98
##### Free Memory:673
##### Total Memory:772
##### Heap utilization statistics [MB] for testReference (10x)
##### Used Memory:613
##### Free Memory:1006
##### Total Memory:1620
##### Heap utilization statistics [MB] for testReferenceAndEvent
##### Used Memory:708
##### Free Memory:597
##### Total Memory:1305

OnDemand
##### Heap utilization statistics [MB] for testReference
##### Used Memory:44
##### Free Memory:177
##### Total Memory:222
##### Heap utilization statistics [MB] for testReference (10x)
##### Used Memory:77
##### Free Memory:211
##### Total Memory:289
##### Heap utilization statistics [MB] for testReferenceAndEvent
##### Used Memory:44
##### Free Memory:180
##### Total Memory:225
##### Heap utilization statistics [MB] for testReferenceAndEvent (10x)
##### Used Memory:77
##### Free Memory:408
##### Total Memory:486
##### Heap utilization statistics [MB] for testReferenceAndEvent (100x)
##### Used Memory:414
##### Free Memory:765
##### Total Memory:1180
{noformat}
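
For readers who want to collect comparable numbers, here is a minimal sketch 
of printing heap statistics in the same shape as the output above. It assumes 
the figures come from java.lang.Runtime, with used memory computed as total 
minus free and each value truncated to whole MB; the comment does not say how 
the simulation actually measures them, and the class name HeapStatsSketch is 
made up for this example.

```java
/** Hypothetical helper that prints heap statistics in the format shown above. */
final class HeapStatsSketch {

    private static final long MB = 1024L * 1024L;

    /** Returns {used, free, total} in whole MB, as reported by the JVM. */
    static long[] snapshotMb() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();
        long free = rt.freeMemory();
        // Each value is truncated separately, so used + free may be total - 1.
        return new long[] {(total - free) / MB, free / MB, total / MB};
    }

    /** Formats one report block for the given test name. */
    static String format(String testName, long usedMb, long freeMb, long totalMb) {
        return "##### Heap utilization statistics [MB] for " + testName + "\n"
            + "##### Used Memory:" + usedMb + "\n"
            + "##### Free Memory:" + freeMb + "\n"
            + "##### Total Memory:" + totalMb + "\n";
    }

    public static void main(String[] args) {
        // System.gc() is only a hint to the JVM, so readings can still include garbage.
        System.gc();
        long[] s = snapshotMb();
        System.out.print(format("exampleRun", s[0], s[1], s[2]));
    }
}
```

Truncating each value separately would explain why Used plus Free is 
sometimes one less than Total in the results above, assuming they were 
produced this way.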

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.ondemand.patch, events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
