[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523656#comment-14523656
 ] 

Siddharth Seth commented on TEZ-776:
------------------------------------

bq. 1) The patch in TEZ-2255 is doing e2e Composite event routing (design 1 in 
the original design document). So its not creating new DataMovement event 
objects in the AM. My profiling shows that new object creation is the biggest 
CPU culprit in this code path.
2) The patch in TEZ-2255 is a POC patch while the patch here is taking care of 
all cases. A quick look shows at TEZ-2255 shows potential short cuts. Even 
though the design in TEZ-2255 envisages the creation of a RoutedEvent the patch 
is currently just modifying the CompositeEvent in place with the target index 
(which may not be theoretically correct). New object creation eats CPU. 
Similarly, the target index is being set in the task by using the tasks id 
which is not a real solution (apart from other things it breaks auto-reduce). 
It is likely that a full implementation will use more CPU than the currently 
attached patch on TEZ-2255.

The entire intent of Option2/3 in the first few comments on this jira is to 
allow for Edges to handle events more efficiently. Part of that is storage, the 
other aspect is avoiding unnecessary explosions of events.
Setting indices on CompositeEvents in TEZ-2255 makes no difference - the 
alternate is to set them in RoutedEvent which is just as efficient. In fact, 
the original patch there did exactly this, and avoided duplicate serialization 
/ deserialization of events. That gives an even bigger gain - but would not be 
an equivalent comparison since ODR does not do that - which is why I removed it.

OneToOne clearly shows that event creation is not the only factor which 
contributes to performance. There's just 50K DMEs being created there - the 
cost comes from the additional invocations. The same applies to ScatterGather 
as well - routing the same event multiple times over contributes to the high 
utilization. Avoiding DME creation will help - not to the extent of 40% though.
Running the tasks through in the OneToOne case will make it worse. This runs 
with a UnorderedInput - which will complete the moment an event is available. 
For a large number of tasks - 50K events aren't scanned in the AM for the 
specific benchmark. For a longer running task - all events will be scanned and 
will contributed more to the CPU numbers.

bq. However, the numbers are useful because they show how much gain can be 
expected to be made after doing e2e composite event routing. I have not done 
that in this patch since it increases the scope of work but I will do that as a 
follow up since the API allows for it.
Explained above why EtoE composite routing is not what is affecting CPU. Please 
hold off on writing this patch - it'll likely be different based on TEZ-2255.



> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
> TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, 
> TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, 
> TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
> TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
> TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to