[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370582#comment-14370582
 ] 

Siddharth Seth commented on TEZ-776:
------------------------------------

Apologies for the late reply. Options have been discussed - I won't go into 
them extensively on the jira though. The offline discussion happened well 
after the patches had been posted.
There's a handful of EdgeManagers that exist, and it isn't likely that a whole 
lot of them will be written. There'll definitely be some - and I think it's 
acceptable to push a little additional logic into them for better efficiency. 
It's not clear that the framework can actually handle obsoletion very well - 
especially when indices on InputFailed events are considered - specific inputs 
failing. Today we assume that a task generates all data to the node on which it 
was running. In terms of expense and having to walk through all events - that 
will not be the case. Changes may be required to facilitate that, but it'll 
definitely be more efficient.
That said, EMPlugins being easy to write and test is important - in case 
someone does want to write one. Default implementations for more complicated 
tracking functions can easily be provided - similar to the current storage 
model. If an EMPlugin writer wants to deploy for larger jobs - they'd have to 
deal with storage.

I like the suggestion of limiting on demand routing to where it is really 
needed. That seems like a good middle ground to make progress. One suggestion 
there would be to introduce an Interface private to Tez which can be 
implemented by ScatterGather and ShuffleEdgeManager (for ARP). The Edge can 
easily choose which methods to use for routing based on this interface. There's 
no need for a flag to enable / disable this - and a lower memory overhead comes 
out of the box, without affecting the OneToOne and broadcast case. Also, the 
API doesn't change at all.
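
A minimal sketch of that idea, not taken from any of the attached patches. 
Every name in it (EdgeManagerSketch, OnDemandRouter, EdgeSketch and the 
method signatures) is hypothetical and only stands in for the real Tez 
classes. The point it illustrates is that the Edge can check for the private 
interface instead of reading a flag, so managers that don't implement it keep 
their current behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class OnDemandRoutingSketch {

    /** Hypothetical stand-in for the existing public plugin API (unchanged). */
    public interface EdgeManagerSketch {
        /** Returns every destination task index the given source routes to. */
        List<Integer> routeAll(int sourceTaskIndex, int numDestinationTasks);
    }

    /** Hypothetical Tez-private interface: answers one routing question at a time. */
    public interface OnDemandRouter {
        boolean routesTo(int sourceTaskIndex, int destinationTaskIndex);
    }

    /** One-to-one style manager: only implements the existing API. */
    public static class OneToOneSketch implements EdgeManagerSketch {
        @Override
        public List<Integer> routeAll(int sourceTaskIndex, int numDestinationTasks) {
            List<Integer> destinations = new ArrayList<>();
            destinations.add(sourceTaskIndex);
            return destinations;
        }
    }

    /** Scatter-gather style manager: additionally opts in to on-demand routing. */
    public static class ScatterGatherSketch implements EdgeManagerSketch, OnDemandRouter {
        @Override
        public List<Integer> routeAll(int sourceTaskIndex, int numDestinationTasks) {
            List<Integer> destinations = new ArrayList<>();
            for (int i = 0; i < numDestinationTasks; i++) {
                destinations.add(i);
            }
            return destinations;
        }

        @Override
        public boolean routesTo(int sourceTaskIndex, int destinationTaskIndex) {
            // Assumption for this sketch: every source feeds every destination.
            return true;
        }
    }

    /** The Edge picks the routing path from the interface; no enable/disable flag. */
    public static class EdgeSketch {
        private final EdgeManagerSketch manager;
        private final int numDestinationTasks;

        public EdgeSketch(EdgeManagerSketch manager, int numDestinationTasks) {
            this.manager = manager;
            this.numDestinationTasks = numDestinationTasks;
        }

        public boolean usesOnDemandRouting() {
            return manager instanceof OnDemandRouter;
        }

        public boolean shouldDeliver(int sourceTaskIndex, int destinationTaskIndex) {
            if (manager instanceof OnDemandRouter) {
                // Computed when asked; no per-destination list is built.
                return ((OnDemandRouter) manager)
                        .routesTo(sourceTaskIndex, destinationTaskIndex);
            }
            // Existing path: expand the full routing list, then look it up.
            return manager.routeAll(sourceTaskIndex, numDestinationTasks)
                    .contains(destinationTaskIndex);
        }
    }

    public static void main(String[] args) {
        EdgeSketch shuffle = new EdgeSketch(new ScatterGatherSketch(), 4);
        EdgeSketch oneToOne = new EdgeSketch(new OneToOneSketch(), 4);
        System.out.println(shuffle.usesOnDemandRouting());
        System.out.println(oneToOne.usesOnDemandRouting());
    }
}
```

In this sketch the one-to-one manager is untouched and still goes through the 
existing path, while the scatter-gather manager is routed on demand purely 
because it implements the extra interface.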

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
> With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
> Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
> with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
