[
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523740#comment-14523740
]
Siddharth Seth commented on TEZ-776:
------------------------------------
bq. The code is being optimistic here and it's explained in the comments. Nothing is deleted from this list and there is only one place for additions, so there are no concurrent modification issues. It's OK to read the size outside the lock since we will get a snapshot of the number of events to look at. Taking a lock is not going to give any better guarantees about seeing the latest event being added.
I'm not very sure about this. Adding an element from another thread changes the underlying structure of the ArrayList: an add() that triggers a resize replaces the backing array, so reading size() and the elements without synchronization can be problematic.
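To illustrate the concern, here is a minimal sketch; the class and method names are hypothetical, not the actual Tez code. Without a common lock there is no happens-before edge between the writer and the reader, so the reader may see a stale size or a partially copied backing array.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pattern under discussion; not the actual Tez code.
class TaskEventBuffer {
  private final List<Object> taskEvents = new ArrayList<>();

  // Writer side: the single place where events are appended.
  synchronized void addEvent(Object event) {
    taskEvents.add(event); // may grow, i.e. replace, the backing array
  }

  // Reader side as proposed: size() is read without holding the lock.
  // This is the risky part: add() can be mid-resize, so a subsequent get(i)
  // may hit a stale or partially copied backing array, and without the lock
  // there is no happens-before edge guaranteeing visibility of the elements.
  int optimisticSize() {
    return taskEvents.size();
  }

  // Safer alternative: copy a snapshot out under the same lock the writer uses.
  synchronized List<Object> snapshot(int fromIndex) {
    return new ArrayList<>(taskEvents.subList(fromIndex, taskEvents.size()));
  }
}
{code}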
bq. This would require changing the index-based lookup from a task to multiple indices instead of a single index. That is a change I plan to make in a follow-up jira. It would enable separating edge routing from non-edge routing, or separate routing for different edges. That said, I don't want to support the existing legacy routing in tandem with ODR: it is unnecessary complexity, and ODR offloads event processing from the central dispatcher, which is a good benefit by itself.
bq. That is not optimal. In the average small-job case, the CPU is not very relevant. However, for a large vertex with mixed edges we should not fall back to legacy routing just because the broadcast edge is legacy. In addition, ODR takes routing pressure off the central dispatcher, which helps get more useful work done faster there. In various scenarios the central dispatcher has often shown up as a bottleneck. A lot of the CPU overhead will go away once we stop creating new event objects in the AM. That's another follow-up jira to this one.
We shouldn't fall back to regular routing just because a single plugin does not
implement ODR. That's the intent of allowing a vertex to work with both. I
maintain that it's a simple change to allow both to run at the same time. Just
add events to the taskEvents list depending on whether the edge uses ODR. The
index lookups should remain the same.
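A minimal sketch of what that mixed mode could look like, assuming a per-edge capability check; all names here are hypothetical, not the actual patch:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical types standing in for the real event and edge classes.
interface TezEventStub {}

interface EdgeStub {
  boolean usesOnDemandRouting();
  void holdForOnDemandRouting(TezEventStub event);
}

class MixedEventRouter {
  // Legacy buffer; only events from non-ODR edges ever land here, so the
  // existing per-task index lookups are unchanged.
  private final List<TezEventStub> taskEvents = new ArrayList<>();

  void handle(TezEventStub event, EdgeStub sourceEdge) {
    if (sourceEdge.usesOnDemandRouting()) {
      // ODR edge: do not buffer centrally; the destination task pulls and
      // routes the event on demand, off the central dispatcher.
      sourceEdge.holdForOnDemandRouting(event);
    } else {
      // Non-ODR edge: keep the existing legacy behavior as-is.
      taskEvents.add(event);
    }
  }
}
{code}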
One of the big reasons to do this is to avoid the unnecessary performance regression for OneToOne edges that the fallback causes. Broadcast will regress as well - but to a smaller extent. There's no reason to introduce a regression when it can be avoided easily.
bq. Partially implementing the API will not work since this enables on-demand routing for all types of events. Like I said, the point of this check is not to identify ODR plugins but to identify legacy plugins. This is only there to allow older versions of Hive to run with newer versions of Tez transparently. This check is practically sufficient.
Sufficient only if everyone implements all methods. A separate interface makes this a lot cleaner and allows for better evolution in the future. I don't think ODR is set in stone - it will likely evolve. It helps the ScatterGather case, so it should go in now, but it will evolve in future releases, and keeping it as a separate interface helps with this.
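As an illustration of the separate-interface approach (hypothetical names and signatures, not the real API), detecting legacy plugins becomes a plain type check instead of guessing from which methods a plugin happens to implement, and the ODR contract can evolve independently of the legacy one:

{code:java}
// Illustrative only; names and signatures are hypothetical, not the real API.
interface EdgeManagerStub {
  void routeEvent(Object event); // legacy routing entry point
}

// A plugin opts into ODR explicitly by implementing the separate interface,
// which can then evolve without touching the legacy contract.
interface OnDemandRoutingStub extends EdgeManagerStub {
  int routeEventOnDemand(Object event, int destinationTaskIndex);
}

class RoutingSelector {
  // Legacy plugins compiled against the old interface simply fail this check.
  static boolean supportsOnDemand(EdgeManagerStub plugin) {
    return plugin instanceof OnDemandRoutingStub;
  }
}
{code}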
On the B patch, I will take a look to see what's changed - but it seems like a last-minute change after the offline discussion on TEZ-2255 and definitely needs additional review. One quick note: expose the actual event, in case the user does want to do something with it (not advised).
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch,
> TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch,
> TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch,
> TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch,
> TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch,
> TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png,
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png,
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png,
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)