[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520833#comment-14520833 ]

Siddharth Seth commented on TEZ-776:
------------------------------------

Some numbers from various runs. All of these are on a 19 node cluster. Logging 
for the AM and tasks is at WARN level to remove unnecessary noise (we still 
need to reduce log verbosity). Each AM ran on a dedicated node (no tasks), 
with a 4GB heap.
Run1, Run2 and Run3 all used the same AppMaster.
The benchmark was an example which does no processing, but calls 
waitForInputReady to ensure events are consumed.
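
For reference, a minimal sketch of what such a processor looks like against the Tez runtime API. The class name is hypothetical and this is not the actual benchmark code; it assumes the standard {{AbstractLogicalIOProcessor}} / {{ProcessorContext}} interfaces:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.tez.runtime.api.AbstractLogicalIOProcessor;
import org.apache.tez.runtime.api.Event;
import org.apache.tez.runtime.api.Input;
import org.apache.tez.runtime.api.LogicalInput;
import org.apache.tez.runtime.api.LogicalOutput;
import org.apache.tez.runtime.api.ProcessorContext;

// Hypothetical no-op processor: blocks until all inputs are ready (which
// forces the AM to route and deliver the DataMovementEvents) but reads no
// records and produces no output.
public class NoOpWaitProcessor extends AbstractLogicalIOProcessor {

  public NoOpWaitProcessor(ProcessorContext context) {
    super(context);
  }

  @Override
  public void initialize() throws Exception {
  }

  @Override
  public void run(Map<String, LogicalInput> inputs,
      Map<String, LogicalOutput> outputs) throws Exception {
    // Waiting here is what ensures every event is actually consumed.
    getContext().waitForAllInputsReady(new ArrayList<Input>(inputs.values()));
  }

  @Override
  public void handleEvents(List<Event> processorEvents) {
  }

  @Override
  public void close() throws Exception {
  }
}
{code}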

h5. OneToOne routing. 50K X 50K tasks.
||Type|Run1 time|Run1 CPU time|Run2 time|Run2 CPU time|Run3 time|Run3 CPU time||
|ODR A|90000|415590|68316|383230|64607|378620|
|ODR B|87898|411000|63020|372700|63339|353190|
|CurrentRouting|84971|285050|59421|231810|57730|227650|
The CPU utilization is quite a bit lower in current trunk - roughly 30-40% 
less than either ODR variant. The runtime is also consistently lower with 
current routing. I'd expect the difference in the runtimes to go up if the AM 
were running on a loaded box.
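
As a rough intuition for the memory/CPU trade-off, here's an illustrative sketch (hypothetical names, not the TEZ-776 / TEZ-2255 code) of stored vs. on-demand routing for a OneToOne edge:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Illustrative sketch only (not the Tez implementation): contrast between
// storing a routed entry per consumer and computing the route on demand.
public class OneToOneRoutingSketch {
  public static void main(String[] args) {
    int numTasks = 50_000; // matches the 50K x 50K runs above

    // Stored routing: one entry per consumer, held in the AM heap for the
    // lifetime of the DAG.
    List<Integer> routedSourcePerConsumer = new ArrayList<>(numTasks);
    for (int consumer = 0; consumer < numTasks; consumer++) {
      routedSourcePerConsumer.add(consumer); // OneToOne: consumer i <- producer i
    }

    // On-demand routing: the mapping is a pure function; nothing is stored,
    // at the cost of recomputing it each time a consumer asks.
    IntUnaryOperator sourceFor = consumer -> consumer;

    System.out.println("stored entries: " + routedSourcePerConsumer.size());
    System.out.println("on-demand source for task 123: " + sourceFor.applyAsInt(123));
  }
}
{code}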

h5. ScatterGather routing. 20K X 10K tasks.
||Type|Run1 time|Run1 CPU time|Run2 time|Run2 CPU time|Run3 time|Run3 CPU time||
|ODR A|202017|543170|185898|501720|186318|499630|
|ODR B|201838|554100|185261|508980|185466|501720|
|TEZ-2255|197751|394190|182082|349920|180081|349350|

h5. ScatterGather routing. 50K X 20K tasks.
||Type|Run1 time|Run1 CPU time|Run2 time|Run2 CPU time|Run3 time|Run3 CPU time||
|ODR A|497783|1391380|490622|1369060|500854|1374190|
|ODR B|498495|1428270|490864|1399850|497942|1400840|
|TEZ-2255|476431|992110|460101|963830|456279|960580|
Similar observations for both ScatterGather runs. With 20K consumers, runtime 
is ~4-9% better with 2255, and CPU time is close to 30% lower (the ODR 
variants use ~40-45% more CPU). My guess is that the CPU gap widens as the 
number of reducers increases. The ScatterGather cases cannot be run against 
current master.
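
Back-of-the-envelope arithmetic on why the stored approach breaks down at this scale, using the ~64 bytes per DataMovementEvent figure from the issue description below (illustrative only; actual per-event overhead will vary):

{code:java}
// Illustrative arithmetic only: compares materializing one event object per
// (source, consumer) pair against keeping one event per source and deriving
// the routing on demand.
public class EventMemorySketch {
  public static void main(String[] args) {
    long sources = 50_000L, consumers = 20_000L, bytesPerEvent = 64L;

    long perPairBytes = sources * consumers * bytesPerEvent;   // stored routing
    long perSourceBytes = sources * bytesPerEvent;             // on-demand

    System.out.printf("per-pair:   %.1f GiB%n", perPairBytes / Math.pow(1024, 3));
    System.out.printf("per-source: %.1f MiB%n", perSourceBytes / Math.pow(1024, 2));
    // Prints roughly: per-pair 59.6 GiB, per-source 3.1 MiB - which is why
    // per-pair storage cannot fit in a 4GB AM heap at this scale.
  }
}
{code}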

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
> TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, 
> TEZ-776.7.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
> With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
> Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
> with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern, this limits the number of tasks that 
> can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
