[
https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599861#comment-14599861
]
Colin Patrick McCabe commented on HADOOP-12107:
-----------------------------------------------
bq. 003 patch is good. I just don't understand why 001 patch creates one
additional thread per FileSystem object? I look at
FileSystem#getStaticstics(..). I think it creates one Staticstics object per
FileSystem class, so 001 patch creates one additional thread per FileSystem
class? I'll be grateful if somebody can guide me through this.
D'oh. You're absolutely right... there is only one thread per FileSystem
class. Forget what I said earlier.
bq. One other point of discussion: I'm removing the StatisticsData constructor
that takes a Thread as the argument (along with the owner member variable) as
it no longer needs the thread inside StatisticsData. But since StatisticsData
is technically a public class, it could be considered an API incompatible
change. Thoughts on this? No one outside FileSystem is using the constructor or
the member variable, and I cannot think of why anyone would. But if we need to
absolutely maintain the API compatibility, I cannot remove them. Let me know
what you think.
StatisticsData is a public class, but its constructor is not public. So there
is no possible API breakage here. (As a side note, even if StatisticsData were
public, it doesn't have an interface annotation specifying that it is safe to
use outside of Hadoop, so we're doubly safe here.)
{code}
2921 /**
2922 * This constructor is deprecated and is no longer used. One
should remove
2923 * any use of this constructor.
2924 */
2925 @Deprecated
2926 StatisticsData(WeakReference<Thread> owner) {
{code}
Like I explained above, we don't need this. Let's get rid of it.
StatisticsDataReference: let's put an {{\@Override}} annotation on the
functions which implement {{PhantomReference}} APIs.
{code}
142 final int maxSeconds = 10;
143 for (int i = 0; i < maxSeconds; i++) {
144 Thread.sleep(1000L);
145 allDataSize = stats.getAllThreadLocalDataSize();
146 if (allDataSize == 0) {
147 LOG.info("cleaned up after " + (i+1) + " seconds");
148 break;
149 } else {
150 LOG.info("not cleaned up after " + (i+1) + " seconds;
waiting...");
151 }
152 }
{code}
Let's use {{GenericTestUtils#waitFor}} here.
+1 once those are addressed. Thanks, [~sjlee0].
> long running apps may have a huge number of StatisticsData instances under
> FileSystem
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-12107
> URL: https://issues.apache.org/jira/browse/HADOOP-12107
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.7.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Priority: Minor
> Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch,
> HADOOP-12107.003.patch, HADOOP-12107.004.patch
>
>
> We observed with some of our apps (non-mapreduce apps that use filesystems)
> that they end up accumulating a huge memory footprint coming from
> {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of
> {{Statistics}}).
> Although the thread reference from {{StatisticsData}} is a weak reference,
> and thus can get cleared once a thread goes away, the actual
> {{StatisticsData}} instances in the list won't get cleared until any of these
> following methods is called on {{Statistics}}:
> - {{getBytesRead()}}
> - {{getBytesWritten()}}
> - {{getReadOps()}}
> - {{getLargeReadOps()}}
> - {{getWriteOps()}}
> - {{toString()}}
> It is quite possible to have an application that interacts with a filesystem
> but does not call any of these methods on the {{Statistics}}. If such an
> application runs for a long time and has a large amount of thread churn, the
> memory footprint will grow significantly.
> The current workaround is either to limit the thread churn or to invoke these
> operations occasionally to pare down the memory. However, this is still a
> deficiency with {{FileSystem$Statistics}} itself in that the memory is
> controlled only as a side effect of those operations.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)