[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360078#comment-14360078
 ] 

Siddharth Seth commented on TEZ-776:
------------------------------------

Please see my first comment on the document posted, questioning the CPU 
efficiency of the ODR approach. *This converts what is primarily an MxN 
memory problem into an MxN CPU problem.* That’s an approach I wouldn’t 
even consider, except that we already have an (unnecessary) MxN CPU issue 
for ScatterGather edges, which I didn’t realize earlier, and that single 
case becomes better in terms of memory. Other edge types in fact move from 
a < MxN memory/CPU issue to a guaranteed MxN CPU issue. This forces CPU 
inefficiency on ALL edge types.
Introducing an N^2 algorithm (where N is non-trivial), when a more optimal 
approach exists, is not the right way to go. The fact that routing is a 
fraction of AM CPU, to me, says that we have other avenues to improve CPU 
utilization along with memory, rather than using this as justification to put 
in an inefficient algorithm. There are numbers posted previously which show 
CPU efficiency improving marginally or remaining roughly the same for 
ScatterGather, but degrading quite a bit for OneToOne.
If there were no API changes involved, this could be iterated upon more 
easily, since it does improve things for the most commonly used case and 
users wouldn't know the difference. However, API changes are involved here, 
which are avoidable, and are also required in the approach of moving events 
into the edge. Hence my previous comment and suggestion.
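
To make the OneToOne concern concrete, here is a minimal sketch of the 
counting argument. It is an illustration only and not Tez code: the class 
and method names are hypothetical, and it assumes that on-demand routing 
asks, for every destination task, whether each source event applies to it.

```java
// Hypothetical illustration of the cost argument; none of these names exist in Tez.
final class RoutingCostSketch {

    // Eager routing on a one-to-one edge: each source event is routed
    // exactly once, straight to the destination with the same index.
    static long eagerOneToOneRoutes(int tasks) {
        long routed = 0;
        for (int src = 0; src < tasks; src++) {
            routed++; // event from task src goes to destination src
        }
        return routed;
    }

    // Assumed on-demand model: every destination task scans every source
    // event and checks whether that event applies to it.
    static long onDemandOneToOneChecks(int tasks) {
        long checks = 0;
        for (int dst = 0; dst < tasks; dst++) {
            for (int src = 0; src < tasks; src++) {
                checks++; // one routing check per (src, dst) pair
            }
        }
        return checks;
    }
}
```

Under these assumptions, 1,000 tasks on a one-to-one edge need 1,000 
routing steps when events are routed eagerly, and 1,000,000 checks when 
every destination scans every source event.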

bq. some yet to be built concept. Other approaches could be implemented in 
full, tested, profiled and verified
I’m at a loss here. Are you suggesting that we discuss options based on 
patches? Surely we can reason about and discuss alternative approaches without 
code changes being in place? I'm sure it makes sense for you to go ahead and 
iterate on the approach, test it etc. However, if there are alternatives that 
have been under discussion from day 1 and haven't been fully discussed, there 
is a chance that the final approach and patch will need to change.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
