[
https://issues.apache.org/jira/browse/MAPREDUCE-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384866#comment-15384866
]
Sangjin Lee commented on MAPREDUCE-6735:
----------------------------------------
It is certainly surprising that HADOOP-12107 is making a difference on
terasort. What version of hadoop are you using for your test? Java version? Is
it repeatable (i.e. the gap shows up consistently)?
FYI, HADOOP-12107 changes *when* certain data ({{allData}}) inside the
{{FileSystem.Statistics}} objects gets cleaned up. Before this
change, it would get cleaned up when the owner thread gets garbage collected
*and* a read operation is done on the {{Statistics}} object. By read operations
I mean methods such as {{getBytesRead()}} and so on.
After this change, the timing of this clean-up no longer depends on the read
operations, and it will be done promptly when the thread is garbage collected.
So in a sense, the change first ensures there is clean-up no matter what, and
also moves up the timing of the clean-up.
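To make the before/after difference concrete, here is a minimal sketch of the general technique: per-thread counters tied to the owner thread through a phantom reference, with a background cleaner that folds a dead thread's counters into a root total. This is an illustration under assumptions, not the Hadoop source; {{StatsSketch}}, {{Data}}, {{DataRef}}, {{fold}} and {{startCleaner}} are hypothetical names, and only {{allData}} and {{getBytesRead()}} echo names mentioned above.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for FileSystem.Statistics; NOT the Hadoop implementation. */
public class StatsSketch {
    /** Per-thread counter, one per thread that touched the statistics. */
    static final class Data {
        final AtomicLong bytesRead = new AtomicLong();
    }

    /** Ties a Data entry to the lifetime of its owner thread. */
    static final class DataRef extends PhantomReference<Thread> {
        final Data data;
        DataRef(Data data, Thread owner, ReferenceQueue<Thread> queue) {
            super(owner, queue);
            this.data = data;
        }
    }

    private final ReferenceQueue<Thread> queue = new ReferenceQueue<>();
    private final Set<DataRef> allData = ConcurrentHashMap.newKeySet();
    // Totals inherited from threads that have already been collected.
    private final AtomicLong rootBytesRead = new AtomicLong();
    private final ThreadLocal<Data> local = ThreadLocal.withInitial(() -> {
        Data d = new Data();
        allData.add(new DataRef(d, Thread.currentThread(), queue));
        return d;
    });

    /** Write path: touches only the calling thread's own counter. */
    public void incrementBytesRead(long n) {
        local.get().bytesRead.addAndGet(n);
    }

    /** Fold one collected thread's counter into the root total and drop its entry. */
    private synchronized void fold(DataRef ref) {
        if (allData.remove(ref)) {
            rootBytesRead.addAndGet(ref.data.bytesRead.get());
        }
    }

    /** Read path: root total plus every entry that is still tracked. */
    public synchronized long getBytesRead() {
        long sum = rootBytesRead.get();
        for (DataRef ref : allData) {
            sum += ref.data.bytesRead.get();
        }
        return sum;
    }

    /** Number of per-thread entries currently held. */
    public int trackedEntries() {
        return allData.size();
    }

    /** Daemon cleaner: clean-up is driven by thread GC, not by read calls. */
    public Thread startCleaner() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    // Blocks until the GC enqueues a reference to a collected thread.
                    fold((DataRef) queue.remove());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stats-cleaner");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

In this sketch the "before" behavior corresponds to never calling {{startCleaner()}} and draining the queue only from the read path; the "after" behavior corresponds to running the cleaner, so each collected thread costs one {{fold}} call whether or not anyone ever reads the totals.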
The worst-case scenario in which this can have a negative impact on performance
is a use case that *never* reads the statistics. Prior to the change, no clean-up
was done in that case as long as the heap could hold these objects. With the
change, we now perform additional clean-up whenever a thread is garbage collected.
A corollary is that the impact of the clean-up is greater if there
is a *high degree of thread churn* within the JVM. If we're talking about only
a handful of threads or long-lived threads, there should really be no
difference.
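One low-overhead way to check whether thread churn is a factor is to count how many threads the JVM starts while a piece of work runs. The sketch below uses the standard {{ThreadMXBean}} counter; {{ThreadChurnProbe}}, {{startedDuring}} and {{churn}} are hypothetical helper names, and how to hook this into a reduce task is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical probe for thread churn inside one JVM. */
public class ThreadChurnProbe {
    /** Returns how many threads were started in this JVM while work ran. */
    public static long startedDuring(Runnable work) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long before = mx.getTotalStartedThreadCount();
        work.run();
        return mx.getTotalStartedThreadCount() - before;
    }

    /** Starts n short-lived threads one after another, to simulate churn. */
    public static void churn(int n) {
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> { /* no-op body */ });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        long started = startedDuring(() -> churn(100));
        System.out.println("threads started during work: " + started);
    }
}
```

The count covers every thread the JVM started in that window, so it is an upper bound on the threads created by the work itself. A large count relative to the number of live threads would suggest the high-churn case described above.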
I would greatly appreciate it if you could dig a little deeper via logging or
low overhead profiling to pinpoint the correlation. Thanks.
> Performance degradation caused by MAPREDUCE-5465 and HADOOP-12107
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-6735
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6735
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Alexandr Balitsky
>
> Two commits, MAPREDUCE-5465 and HADOOP-12107 are making Terasort on YARN 10%
> slower.
> Reduce phase with those commits ~5 mins
> Reduce phase without ~3.5 mins
> Average Reduce is taking 4mins, 16sec with those commits compared to 3mins,
> 48sec without.