[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361273#comment-14361273
 ] 

Bikas Saha commented on TEZ-776:
--------------------------------

All options have been sufficiently discussed on this thread and offline. The 
option of moving all of event handling to edge plugins is a much larger change 
that shifts a lot of framework responsibility to the user. Secondly, it's not 
clear how future changes or feature additions around dynamic graph 
reconfiguration, like changing edges and vertices at runtime, may or may not be 
affected by having given control of event management to user code. Things like 
event obsoletion which can be done easily by the framework for all edges and 
IO's would need to be done by every plugin. Every plugin would need to have 
additional metadata tracking objects which are currently provided by the 
framework. Each plugin would have to handle versioning of events and 
speculation-like conditions that break the time-sequential nature of version 
numbers. And probably other stuff. Firstly, that is a much larger change that 
is related but orthogonal to the memory issue and must be discussed separately 
in its own right. Secondly, while at a high level it may seem likely that in 
some cases edge plugins might do better at CPU, I suspect that after handling 
event versioning, obsoletion, etc. the argument that plugins can avoid 
iterating over events may turn out to be specious for CPU efficiency. My 
suggestion to follow up on that approach separately is based on the above 
arguments. It's not been effectively established that moving essential 
framework responsibilities to the user is the right approach long term. Neither 
is it clear that the CPU efficiency of the final implementation that does more 
than the sunny day scenario is going to be significantly better at the cost of 
adding complexity in user code. That can only be measured. Hence, I suggested 
that it be evaluated before including that change in the project. This is the 
case with any change or feature, right? My only objection was to tying the 
progress of this jira to pre-accepting the other changes without going through 
due process, especially when this jira does not mandate any user code changes.

In the meanwhile, the current patch does not mandate any API changes for users. 
Unless users want to make the API change they can continue to use the existing 
API, even across releases. If they do want to make the change, it's much simpler 
because it follows the existing pattern. But for users who are running large 
jobs and using framework built-in components, they can be unblocked on their 
scalability issues. Hence, my suggestion to complete the reviews of this patch 
and resolve it so that there is forward progress without requiring any user to 
make any code changes.

In order to make progress, what I can try to do is limit the on-demand routing 
to only composite event expansion and not change the flow for any other event. 
Add a new optional API for composite event expansion that will be implemented 
by the internal scatter-gather edge and is optional, so that users don't need to 
change their code. This will solve the memory scalability issue without 
increasing any CPU cost compared to any scenario as it exists today.
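
To make the shape of that proposal concrete, here is a minimal sketch of an 
optional interface for composite event expansion. Everything in it is 
hypothetical and is not the actual Tez API or the attached patch: the names 
CompositeEventRouter, EventStore, ScatterGatherEdge and LegacyUserEdge are 
invented for illustration, and it assumes the simplest scatter-gather pattern 
in which partition i of every source task goes to destination task i. An edge 
that implements the optional interface has its composite events stored once and 
expanded only when a destination task asks for them; an edge that does not 
implement it keeps the existing eager expansion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are the actual Tez API.
final class OnDemandRoutingSketch {

    /** One compact event from a source task that covers all of its output partitions. */
    static final class CompositeEvent {
        final int sourceTask;
        final int partitionCount;
        CompositeEvent(int sourceTask, int partitionCount) {
            this.sourceTask = sourceTask;
            this.partitionCount = partitionCount;
        }
    }

    /** The per-destination event that a consumer task eventually receives. */
    static final class RoutedEvent {
        final int sourceTask;
        final int destTask;
        RoutedEvent(int sourceTask, int destTask) {
            this.sourceTask = sourceTask;
            this.destTask = destTask;
        }
    }

    /** Stand-in for the existing edge plugin contract; unchanged for users. */
    interface EdgePlugin { }

    /** Optional extension: edges that implement it get composite events expanded on demand. */
    interface CompositeEventRouter {
        /** Returns the event destTask should see, or null if the composite has nothing for it. */
        RoutedEvent route(CompositeEvent event, int destTask);
    }

    /** Built-in edge that opts in. Assumes partition i goes to destination task i. */
    static final class ScatterGatherEdge implements EdgePlugin, CompositeEventRouter {
        @Override
        public RoutedEvent route(CompositeEvent event, int destTask) {
            if (destTask < 0 || destTask >= event.partitionCount) {
                return null;
            }
            return new RoutedEvent(event.sourceTask, destTask);
        }
    }

    /** A user edge that does not implement the new interface keeps the eager flow. */
    static final class LegacyUserEdge implements EdgePlugin { }

    /** Framework-side store that picks eager or on-demand expansion per edge. */
    static final class EventStore {
        private final EdgePlugin edge;
        private final List<CompositeEvent> composites = new ArrayList<>();
        private final List<RoutedEvent> expanded = new ArrayList<>();

        EventStore(EdgePlugin edge) {
            this.edge = edge;
        }

        void add(CompositeEvent event) {
            if (edge instanceof CompositeEventRouter) {
                composites.add(event); // one stored object per source task
            } else {
                for (int p = 0; p < event.partitionCount; p++) {
                    expanded.add(new RoutedEvent(event.sourceTask, p)); // eager path
                }
            }
        }

        int storedObjects() {
            return composites.size() + expanded.size();
        }

        List<RoutedEvent> eventsFor(int destTask) {
            List<RoutedEvent> out = new ArrayList<>();
            for (RoutedEvent e : expanded) {
                if (e.destTask == destTask) {
                    out.add(e);
                }
            }
            if (edge instanceof CompositeEventRouter) {
                CompositeEventRouter router = (CompositeEventRouter) edge;
                for (CompositeEvent c : composites) {
                    RoutedEvent e = router.route(c, destTask);
                    if (e != null) {
                        out.add(e);
                    }
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        EventStore onDemand = new EventStore(new ScatterGatherEdge());
        EventStore eager = new EventStore(new LegacyUserEdge());
        for (int src = 0; src < 3; src++) {
            onDemand.add(new CompositeEvent(src, 4));
            eager.add(new CompositeEvent(src, 4));
        }
        System.out.println("stored objects, on-demand: " + onDemand.storedObjects());
        System.out.println("stored objects, eager: " + eager.storedObjects());
        System.out.println("events for task 2: " + onDemand.eventsFor(2).size()
                + " vs " + eager.eventsFor(2).size());
    }
}
```

With 3 source tasks and 4 partitions each, the on-demand store holds 3 objects 
where the eager store holds 12, and both hand the same events to a given 
destination task. The sketch only illustrates the memory side and the fact that 
the interface is opt-in; it does not model CPU cost, event versioning, or 
obsoletion.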

I hope that clarifies and we can make progress on this jira.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
