[
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522686#comment-14522686
]
Bikas Saha commented on TEZ-776:
--------------------------------
Thanks for the numbers and for trying it out on a cluster. However, the comparison
is not apples to apples, for the following reasons:
1) The patch in TEZ-2255 is doing e2e Composite event routing (design 1 in the
original design document). So it's not creating new DataMovement event objects
in the AM. My profiling shows that new object creation is the biggest CPU
culprit in this code path.
2) The patch in TEZ-2255 is a POC patch while the patch here is taking care of
all cases. A quick look at TEZ-2255 shows potential shortcuts. Even
though the design in TEZ-2255 envisages the creation of a RoutedEvent the patch
is currently just modifying the CompositeEvent in place with the target index
(which may not be theoretically correct). New object creation eats CPU.
Similarly, the target index is being set in the task by using the task's id,
which is not a real solution (apart from other things, it breaks auto-reduce).
It is likely that a full implementation will use more CPU than the currently
attached patch on TEZ-2255.
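To make the trade-off in point 2 concrete, here is a minimal Java sketch. The
class and method names (CompositeEvent, RoutedEvent, routeInPlace,
routeWithWrapper) are hypothetical stand-ins, not the actual Tez classes or the
code in either patch. It assumes that one reason in-place modification "may not
be theoretically correct" is that every consumer ends up sharing the same
mutable object.

```java
public class CompositeRoutingSketch {
    // Hypothetical stand-in for a composite event covering a range of source indices.
    static final class CompositeEvent {
        final int sourceIndexStart;
        final int count;
        int targetIndex = -1; // overwritten by the in-place shortcut

        CompositeEvent(int sourceIndexStart, int count) {
            this.sourceIndexStart = sourceIndexStart;
            this.count = count;
        }
    }

    // Hypothetical immutable wrapper pairing a shared composite event with one target index.
    static final class RoutedEvent {
        final CompositeEvent payload;
        final int targetIndex;

        RoutedEvent(CompositeEvent payload, int targetIndex) {
            this.payload = payload;
            this.targetIndex = targetIndex;
        }
    }

    // Shortcut: no allocation, but all callers see the last target index written.
    static CompositeEvent routeInPlace(CompositeEvent event, int targetIndex) {
        event.targetIndex = targetIndex;
        return event;
    }

    // Wrapper approach: one small allocation per routed event, no shared mutable state.
    static RoutedEvent routeWithWrapper(CompositeEvent event, int targetIndex) {
        return new RoutedEvent(event, targetIndex);
    }

    public static void main(String[] args) {
        CompositeEvent event = new CompositeEvent(0, 4);

        CompositeEvent first = routeInPlace(event, 1);
        CompositeEvent second = routeInPlace(event, 2);
        // Both references point at the same object, so the first target index is lost.
        System.out.println(first.targetIndex + " " + second.targetIndex);

        RoutedEvent r1 = routeWithWrapper(event, 1);
        RoutedEvent r2 = routeWithWrapper(event, 2);
        // Each wrapper keeps its own target index.
        System.out.println(r1.targetIndex + " " + r2.targetIndex);
    }
}
```

The wrapper version is the one that costs an extra allocation per routed event,
which is why a full implementation could plausibly use more CPU than the POC.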
However, the numbers are useful because they show how much gain can be expected
to be made after doing e2e composite event routing. I have not done that in
this patch since it increases the scope of work, but I will do that as a
follow-up since the API allows for it.
Pragmatically, for the 1-1 case, it cannot be denied that the ODR is doing
unnecessary iterations. Clearly, the difference will grow with job size, but
so will the real work done by a real job of that size, as opposed to an empty
job running 100K tasks.
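As a rough illustration of why the gap grows with job size, the Java sketch
below counts routing checks for a 1-1 edge under two strategies. It is only a
model: it assumes the unnecessary iterations come from scanning every source
event for each destination task, and the names (genericScanChecks,
directLookupChecks) are hypothetical, not Tez APIs.

```java
public class OneToOneRoutingSketch {
    // Generic scan: each destination task walks all source events,
    // although only the event from the matching source index applies.
    static long genericScanChecks(int numTasks) {
        long checks = 0;
        for (int dest = 0; dest < numTasks; dest++) {
            for (int src = 0; src < numTasks; src++) {
                checks++; // one routing check per (destination, source) pair
            }
        }
        return checks;
    }

    // Direct lookup: in a 1-1 edge, destination i only needs source i.
    static long directLookupChecks(int numTasks) {
        long checks = 0;
        for (int dest = 0; dest < numTasks; dest++) {
            checks++; // one check per destination task
        }
        return checks;
    }

    public static void main(String[] args) {
        int numTasks = 1000;
        System.out.println("generic scan checks:  " + genericScanChecks(numTasks));
        System.out.println("direct lookup checks: " + directLookupChecks(numTasks));
    }
}
```

Under this model the scan cost grows quadratically with the number of tasks
while the direct lookup grows linearly, which matches the observation that the
difference widens as the job gets larger.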
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch,
> TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch,
> TEZ-776.7.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch,
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch,
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch,
> With_Patch_AM_hotspots.png, With_Patch_AM_profile.png,
> Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt,
> with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)