[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527845#comment-14527845 ]
Bikas Saha commented on TEZ-776:
--------------------------------
prepareForRouting is guarded by synchronized in Edge, which creates a
read/write barrier.
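As a rough illustration of the barrier mentioned above (a minimal sketch, not
the actual Edge code; the class GuardedRoutingState and its fields are
hypothetical): in the Java memory model, releasing a monitor happens-before
any later acquisition of the same monitor, so state written inside one
synchronized method is visible to a thread that subsequently enters another
synchronized method on the same object.

```java
// Hypothetical sketch; this is not a Tez class.
final class GuardedRoutingState {
    // Both fields are only touched while holding this object's monitor.
    private int[] routingTable;
    private boolean prepared;

    // Writer side: builds the routing state once, under the lock.
    synchronized void prepareForRouting(int numTasks) {
        if (prepared) {
            return;
        }
        routingTable = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            routingTable[i] = i; // identity mapping, for illustration only
        }
        prepared = true;
    }

    // Reader side: acquiring the same lock guarantees that the writes made
    // in prepareForRouting are visible here.
    synchronized int route(int sourceIndex) {
        if (!prepared) {
            throw new IllegalStateException("prepareForRouting not called");
        }
        return routingTable[sourceIndex];
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedRoutingState state = new GuardedRoutingState();
        Thread writer = new Thread(() -> state.prepareForRouting(4));
        writer.start();
        writer.join();
        System.out.println(state.route(3));
    }
}
```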
Agreed about the duplication, but each case differs slightly in which indices
to use or which events to create, and is hence hard to merge. Once we move away
from event creation in the AM, there will be more scope to reduce duplication.
I am trying to keep the new abstract class for ODR complete in itself, with
the eventual goal of not deriving from the legacy class.
The array list size read is thread safe. There is only one writer, which
prevents concurrent modification. The size of an array list or linked list is
an int that is modified atomically. There have been no issues in numerous
stress simulations and large jobs.
The broadcast edge manager cannot continue to use legacy routing, since every
consumer task needs events from every producer task. That leads to a memory
reference overhead proportional to MxN (producer tasks times consumer tasks),
which is large for large jobs.
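To make the MxN growth concrete, here is a minimal sketch. It assumes that
legacy routing keeps one routed reference per (producer, consumer) pair, while
an on-demand scheme keeps one event per producer and works out each consumer's
view only when asked. The class and method names are hypothetical and are not
part of Tez.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Tez classes.
final class BroadcastRoutingSketch {

    // Assumed legacy behavior: one stored reference per
    // (producer, consumer) pair, so storage grows as M x N.
    static long legacyReferenceCount(int producers, int consumers) {
        return (long) producers * consumers;
    }

    // Assumed on-demand behavior: one stored event per producer.
    static long onDemandStoredCount(int producers) {
        return producers;
    }

    // On a broadcast edge every consumer reads from every producer, so the
    // source list can be computed on request instead of being stored.
    static List<Integer> producersFor(int consumerIndex, int producers) {
        List<Integer> sources = new ArrayList<>(producers);
        for (int p = 0; p < producers; p++) {
            sources.add(p);
        }
        return sources;
    }

    public static void main(String[] args) {
        int m = 10_000; // producer tasks
        int n = 10_000; // consumer tasks
        System.out.println("legacy references: " + legacyReferenceCount(m, n));
        System.out.println("on-demand stored:  " + onDemandStoredCount(m));
    }
}
```

With 10,000 producers and 10,000 consumers, the first count is 100,000,000
references and the second is 10,000 stored events.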
I wish I could share your optimism on TEZ-2409 being 10 lines of code, but I am
afraid I have tried to do it and found it to be a little more involved than
that. Besides, 10 lines of code would need many more lines of new tests. This
does not have to be a blocker for 0.7.0, since it's an internal framework
change and can be done in 0.7.1.
Uploaded new patch with fixes.
[~hitesh] [~rajesh.balamohan] Fixes for your review comments have been made in
subsequent patches. Do you want to look at them?
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch,
> TEZ-776.12.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch,
> TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch,
> TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch,
> TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch,
> TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch,
> TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png,
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png,
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png,
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.