[
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328668#comment-14328668
]
Bikas Saha commented on TEZ-776:
--------------------------------
Attaching a POC patch that shows that event routing can be done on-demand per
task heartbeat request instead of doing it ahead of time in bulk. There are
some changes to the Vertex/Edge code for efficiency because this code path will
be hot. In addition, a departure from bulk routing to per task routing
necessitates a change in the EdgeManager routing API for efficiency. A similar
change is needed for on-demand Composite event expansion instead of bulk. These
API changes in the patch are illustrative in nature and need to be discussed,
though I would expect the final form to be similar in spirit.
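To make the difference between bulk and on-demand routing concrete, here is a minimal sketch. It is not the code in the patch and not the Tez API: `OnDemandRouter`, `SourceEvent`, `Routing`, `Batch` and `fetch` are hypothetical names chosen for illustration. It assumes the AM keeps one shared, append-only list of source events, and that each task sends a cursor with its heartbeat saying how far into that list it has already read.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of on-demand routing; not the Tez API or the patch. */
final class OnDemandRouter {

    /** Hypothetical stand-in for an event produced by a source task. */
    static final class SourceEvent {
        final int sourceTaskIndex;
        SourceEvent(int sourceTaskIndex) { this.sourceTaskIndex = sourceTaskIndex; }
    }

    /** Hypothetical per-task routing question: does this source feed this destination? */
    interface Routing {
        boolean routesTo(int sourceTaskIndex, int destTaskIndex);
    }

    /** Events returned for one heartbeat, plus the cursor to send with the next one. */
    static final class Batch {
        final List<SourceEvent> events;
        final int nextCursor;
        Batch(List<SourceEvent> events, int nextCursor) {
            this.events = events;
            this.nextCursor = nextCursor;
        }
    }

    // One shared list for the whole edge. Bulk routing would instead keep a
    // routed copy (or reference) per destination task.
    private final List<SourceEvent> events = new ArrayList<>();
    private final Routing routing;

    OnDemandRouter(Routing routing) { this.routing = routing; }

    void addEvent(SourceEvent e) { events.add(e); }

    /** Routes only for the requesting task, starting at its cursor, up to maxEvents. */
    Batch fetch(int destTaskIndex, int cursor, int maxEvents) {
        List<SourceEvent> out = new ArrayList<>();
        int i = cursor;
        while (i < events.size() && out.size() < maxEvents) {
            SourceEvent e = events.get(i++);
            if (routing.routesTo(e.sourceTaskIndex, destTaskIndex)) {
                out.add(e);
            }
        }
        return new Batch(out, i);
    }
}
```

In this sketch the per-destination state is just the integer cursor, so nothing is stored per (source, destination) pair. The cost is that `fetch` runs on every heartbeat, which is why the routing call needs to be cheap.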
The patch adds a mock app master simulation that exercises all code paths
related to event routing and fetching except for the RPC. So it presents a
lower bound on the amount of memory used by the app master. The results are
very promising and suggest that on-demand routing practically solves the memory
overhead problem in the AM as described in the problem statement document
attached to the jira. The simulations run in almost identical time with or
without the change, thus showing minimal performance impact.
Here are the results from the simulations:
{noformat}
X = 10K Map x 1K Reduce
10X = 10K Map x 10K Reduce
100X = 100K Map x 10K Reduce
Minimal = No DM events being sent
Reference = Broadcast DM events being sent to create event reference load
ReferenceAndEvent = Composite Scatter-Gather events being sent to create
reference and event load
Events have no payload
Original
##### Heap utilization statistics [MB] for testMinimal
##### Used Memory:45
##### Free Memory:141
##### Total Memory:187
##### Heap utilization statistics [MB] for testMinimal (10x)
##### Used Memory:76
##### Free Memory:173
##### Total Memory:250
##### Heap utilization statistics [MB] for testReference
##### Used Memory:98
##### Free Memory:673
##### Total Memory:772
##### Heap utilization statistics [MB] for testReference (10x)
##### Used Memory:613
##### Free Memory:1006
##### Total Memory:1620
##### Heap utilization statistics [MB] for testReferenceAndEvent
##### Used Memory:708
##### Free Memory:597
##### Total Memory:1305
OnDemand
##### Heap utilization statistics [MB] for testReference
##### Used Memory:44
##### Free Memory:177
##### Total Memory:222
##### Heap utilization statistics [MB] for testReference (10x)
##### Used Memory:77
##### Free Memory:211
##### Total Memory:289
##### Heap utilization statistics [MB] for testReferenceAndEvent
##### Used Memory:44
##### Free Memory:180
##### Total Memory:225
##### Heap utilization statistics [MB] for testReferenceAndEvent (10x)
##### Used Memory:77
##### Free Memory:408
##### Total Memory:486
##### Heap utilization statistics [MB] for testReferenceAndEvent (100x)
##### Used Memory:414
##### Free Memory:765
##### Total Memory:1180
{noformat}
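For readers who want to reproduce figures in the same shape, here is a minimal sketch of how such a report can be produced with `java.lang.Runtime`. That the simulation measures heap this way is an assumption, and `HeapStats`, `format` and `snapshot` are hypothetical names, not code from the patch. Each value is truncated to whole MB independently, which would be consistent with Used + Free coming out one less than Total in most of the rows above.

```java
/** Hypothetical helper; assumes the report is derived from java.lang.Runtime. */
final class HeapStats {
    private static final long MB = 1024L * 1024L;

    /** Formats one report from raw byte counts; each value is truncated to whole MB. */
    static String format(String label, long totalBytes, long freeBytes) {
        long used = (totalBytes - freeBytes) / MB;
        long free = freeBytes / MB;
        long total = totalBytes / MB;
        return "##### Heap utilization statistics [MB] for " + label + "\n"
            + "##### Used Memory:" + used + "\n"
            + "##### Free Memory:" + free + "\n"
            + "##### Total Memory:" + total;
    }

    /** Samples the current JVM heap. System.gc() is only a hint to the JVM. */
    static String snapshot(String label) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long total = rt.totalMemory();
        long free = rt.freeMemory();
        return format(label, total, free);
    }
}
```

Calling `System.out.println(HeapStats.snapshot("testMinimal"))` at the end of a test prints one block in the layout shown above. The numbers depend on JVM heap settings and GC timing, so they are only comparable between runs made with the same configuration.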
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.patch, events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.