[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384866#comment-15384866
 ] 

Sangjin Lee commented on MAPREDUCE-6735:
----------------------------------------

It is certainly surprising that HADOOP-12107 is making a difference on 
terasort. Which version of Hadoop are you using for your test? Which Java 
version? Is it repeatable (i.e. does the gap show up consistently)?

FYI, HADOOP-12107 is about *when* certain data ({{allData}}) inside the 
{{FileSystem.Statistics}} objects gets cleaned up. Before this change, it 
would get cleaned up only when the owner thread had been garbage collected 
*and* a read operation was done on the {{Statistics}} object. By read 
operations I mean methods such as {{getBytesRead()}} and so on.

After this change, the timing of this clean-up no longer depends on the read 
operations, and it will be done promptly when the thread is garbage collected. 
So in a sense, the change first ensures there is clean-up no matter what, and 
also moves up the timing of the clean-up.
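
To make the before/after difference concrete, here is a minimal sketch of the two clean-up strategies. It is a simplified model and not the actual Hadoop code: the names {{StatsCleanupSketch}}, {{Entry}}, {{getBytesReadLazy()}} and {{drainQueue()}} are made up for illustration, and a plain {{Object}} stands in for the owner thread.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Simplified model of per-thread statistics clean-up (hypothetical names). */
final class StatsCleanupSketch {

    /** One per-owner counter, tracked through a weak reference to its owner. */
    static final class Entry extends WeakReference<Object> {
        long bytesRead;

        Entry(Object owner, ReferenceQueue<Object> queue) {
            super(owner, queue);
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final List<Entry> allData = new ArrayList<>();
    private long rootBytesRead; // totals folded in from owners that are gone

    synchronized Entry register(Object owner) {
        Entry e = new Entry(owner, queue);
        allData.add(e);
        return e;
    }

    /** Lazy strategy: dead entries are pruned only as a side effect of a read. */
    synchronized long getBytesReadLazy() {
        long total = rootBytesRead;
        for (Iterator<Entry> it = allData.iterator(); it.hasNext();) {
            Entry e = it.next();
            total += e.bytesRead;
            if (e.get() == null) {       // owner has been collected
                rootBytesRead += e.bytesRead;
                it.remove();
            }
        }
        return total;
    }

    /**
     * Eager strategy: prune entries whose owners were enqueued by the GC.
     * Assumption: a real implementation would run this from a background
     * thread blocking on queue.remove(); polling keeps the sketch simple.
     */
    synchronized int drainQueue() {
        int cleaned = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Entry e = (Entry) ref;
            if (allData.remove(e)) {
                rootBytesRead += e.bytesRead;
                cleaned++;
            }
        }
        return cleaned;
    }

    synchronized int trackedEntries() {
        return allData.size();
    }
}
```

In this model, a workload that never calls {{getBytesReadLazy()}} pays no clean-up cost under the lazy strategy (the entries simply pile up), whereas the eager strategy does a small amount of work for every owner that goes away.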

The worst-case scenario in which this can have a negative impact on performance 
is a use case that *never* reads the statistics. Prior to the change, as long 
as the heap could hold these objects, no clean-up would be done at all. With 
the change, we now perform additional clean-up whenever threads are garbage 
collected.

It follows that the impact of the clean-up is greater if there is a *high 
degree of thread churn* within the JVM. If we're talking about only a handful 
of threads or long-lived threads, there should really be no difference.
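
One low-cost way to check whether thread churn is a factor is to sample the JVM's own thread counters. Below is a minimal sketch using the standard {{java.lang.management.ThreadMXBean}}; the class name {{ThreadChurnProbe}} and its methods are hypothetical, and where to call it from (for example, a periodic log line in the task) is left as an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that reports how many threads the JVM has started. */
final class ThreadChurnProbe {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    private long lastStarted = threads.getTotalStartedThreadCount();

    /** Number of threads started since the previous call (or construction). */
    synchronized long startedSinceLastSample() {
        long now = threads.getTotalStartedThreadCount();
        long delta = now - lastStarted;
        lastStarted = now;
        return delta;
    }

    /** Number of threads currently alive, for comparison with the churn. */
    int liveThreads() {
        return threads.getThreadCount();
    }
}
```

If {{startedSinceLastSample()}} stays large while {{liveThreads()}} stays roughly flat, threads are being created and discarded at a high rate, which is the situation where the extra clean-up would be most visible.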

I would greatly appreciate it if you could dig a little deeper via logging or 
low-overhead profiling to pinpoint the correlation. Thanks.

> Performance degradation caused by MAPREDUCE-5465 and HADOOP-12107
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6735
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6735
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Alexandr Balitsky
>
> Two commits, MAPREDUCE-5465 and HADOOP-12107, are making Terasort on YARN 10% 
> slower.
> Reduce phase with those commits: ~5 mins
> Reduce phase without: ~3.5 mins
> Average Reduce is taking 4 min 16 sec with those commits, compared to 
> 3 min 48 sec without.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
