[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522686#comment-14522686
 ] 

Bikas Saha commented on TEZ-776:
--------------------------------

Thanks for the numbers and for trying it out on a cluster. However, the 
comparison is not apples to apples, for the following reasons:
1) The patch in TEZ-2255 is doing e2e Composite event routing (design 1 in the 
original design document), so it's not creating new DataMovement event objects 
in the AM. My profiling shows that new object creation is the biggest CPU 
culprit in this code path.
2) The patch in TEZ-2255 is a POC patch, while the patch here takes care of 
all cases. A quick look at TEZ-2255 shows potential shortcuts. Even though the 
design in TEZ-2255 envisages the creation of a RoutedEvent, the patch currently 
just modifies the CompositeEvent in place with the target index (which may not 
be theoretically correct). Creating those new objects would cost CPU. 
Similarly, the target index is being set in the task by using the task's id, 
which is not a real solution (apart from other things, it breaks auto-reduce). 
It is likely that a full implementation will use more CPU than the patch 
currently attached to TEZ-2255.
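
To make the allocation difference in the two points above concrete, here is a 
minimal sketch. All class and method names are hypothetical and are not the Tez 
API or the code in either patch. It assumes a composite event covers a 
contiguous range of source indices, and contrasts expanding it into one object 
per index with wrapping the shared composite in a single small routed object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not the Tez API.
final class CompositeRoutingSketch {

    /** One producer-side event covering `count` consecutive source indices. */
    static final class CompositeEvent {
        final int sourceIndexStart;
        final int count;
        final byte[] payload;

        CompositeEvent(int sourceIndexStart, int count, byte[] payload) {
            this.sourceIndexStart = sourceIndexStart;
            this.count = count;
            this.payload = payload;
        }
    }

    /** Per-index event, as created when a composite is expanded in the AM. */
    static final class ExpandedEvent {
        final int sourceIndex;
        final int targetIndex;
        final byte[] payload;

        ExpandedEvent(int sourceIndex, int targetIndex, byte[] payload) {
            this.sourceIndex = sourceIndex;
            this.targetIndex = targetIndex;
            this.payload = payload;
        }
    }

    /** Small wrapper that shares the composite and records only the target index. */
    static final class RoutedEvent {
        final CompositeEvent composite;
        final int targetIndex;

        RoutedEvent(CompositeEvent composite, int targetIndex) {
            this.composite = composite;
            this.targetIndex = targetIndex;
        }
    }

    /** Expansion: allocates `count` new objects for a single composite event. */
    static List<ExpandedEvent> expand(CompositeEvent ce, int targetIndex) {
        List<ExpandedEvent> out = new ArrayList<>(ce.count);
        for (int i = 0; i < ce.count; i++) {
            out.add(new ExpandedEvent(ce.sourceIndexStart + i, targetIndex, ce.payload));
        }
        return out;
    }

    /** End-to-end routing: one wrapper per delivery; the composite is never mutated. */
    static RoutedEvent route(CompositeEvent ce, int targetIndex) {
        return new RoutedEvent(ce, targetIndex);
    }
}
```

Writing the target index directly into the shared CompositeEvent would avoid 
even the wrapper allocation, but then every consumer of that object sees 
whichever index was written last, which is the correctness concern raised 
above.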

However, the numbers are useful because they show how much gain can be expected 
from doing e2e composite event routing. I have not done that in this patch 
since it increases the scope of work, but I will do it as a follow-up since 
the API allows for it.

Pragmatically, for the 1-1 case, it cannot be denied that the ODR is doing 
unnecessary iterations. Clearly, the difference will increase with job size, 
but so will the real work done by a real job of that size, as opposed to an 
empty job running 100K tasks.
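
As a rough illustration of why the 1-1 gap grows with job size, the sketch 
below counts route checks under an assumed model: each of n consumer tasks 
scans all n producer events and keeps only the one whose index matches its 
own. That scanning model and all names are assumptions made for illustration, 
not a description of the actual routing code in the patch.

```java
// Hypothetical model of per-task scanning on a 1-1 edge; not the Tez API.
final class OneToOneScanSketch {

    /** Checks performed when every consumer scans every producer event. */
    static long checksWithScan(int n) {
        long checks = 0;
        long matches = 0;
        for (int consumer = 0; consumer < n; consumer++) {
            for (int producer = 0; producer < n; producer++) {
                checks++; // "does this event route to me?"
                if (producer == consumer) {
                    matches++; // on a 1-1 edge exactly one event matches
                }
            }
        }
        if (matches != n) {
            throw new IllegalStateException("each consumer should match exactly once");
        }
        return checks;
    }

    /** Checks performed when each consumer looks up its single producer directly. */
    static long checksWithDirectLookup(int n) {
        return n;
    }
}
```

Under this model the scan does n * n checks against n for a direct lookup, so 
the ratio between the two is n and keeps growing with the number of tasks.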

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
> TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, 
> TEZ-776.7.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
> With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
> Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
> with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
