[
https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603433#comment-14603433
]
Colin Patrick McCabe commented on HADOOP-12107:
-----------------------------------------------
OK, at the risk of being pedantic, here is my rundown. While the
{{StatisticsData}} class itself is public, the {{StatisticsData}} constructor
is not. It is "package-private" (the access level a member gets in Java when
it has no public, private, or protected keyword). This means that a
{{StatisticsData}} object can only be created by code in the
{{org.apache.hadoop.fs}} package. You can try this for yourself-- write a
program external to hadoop that tries to create a {{StatisticsData}} object via
this constructor. It will not compile. This constructor is safe to remove, so
let's do that.
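To make the access rule concrete, here is a minimal sketch. It uses a hypothetical stand-in class, {{PackagePrivateDemo.Data}}, not the real Hadoop class, and checks through reflection that a constructor declared without an access keyword is neither public, protected, nor private, even though the class itself is public.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class PackagePrivateDemo {
    // Hypothetical stand-in: a public class whose constructor is package-private.
    public static class Data {
        long bytesRead;

        // No access keyword here, so this constructor is package-private.
        Data() {
        }
    }

    // Returns true if Data's no-arg constructor carries no access modifier.
    public static boolean isPackagePrivateConstructor() {
        try {
            Constructor<Data> ctor = Data.class.getDeclaredConstructor();
            int mods = ctor.getModifiers();
            return !Modifier.isPublic(mods)
                && !Modifier.isProtected(mods)
                && !Modifier.isPrivate(mods);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Legal only because this code lives in the same package as Data.
        Data d = new Data();
        System.out.println("class is public: "
            + Modifier.isPublic(Data.class.getModifiers()));
        System.out.println("constructor is package-private: "
            + isPackagePrivateConstructor());
    }
}
```

If {{new PackagePrivateDemo.Data()}} were written in a class belonging to a different package, javac would reject it because the constructor is not visible there.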
bq. Colin Patrick McCabe, good point on the one hand but on the other hand this
constructor is package-scope, and technically usable if someone creates a class with
the same package name, regardless of how unlikely or illegal (in terms of
specified audience) it is. How about we defensively keep that constructor for
branch-2 at least?
No. Users simply can't add code to the {{org.apache.hadoop.fs}} package. If
they do, things are not going to work-- there are going to be naming conflicts,
class resolution issues, etc. etc. There is no possible way we can support
users doing this and no reason to support it. If we tried, we would have to
essentially freeze the API of every single class in Hadoop-- we would have to
have this discussion again each time we changed some package-private variable or
function. Private and package-private stuff is private. It's even enforced by
the compiler; you can't get much more private than that.
> long running apps may have a huge number of StatisticsData instances under
> FileSystem
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-12107
> URL: https://issues.apache.org/jira/browse/HADOOP-12107
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.7.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Priority: Critical
> Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch,
> HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch
>
>
> We observed with some of our apps (non-mapreduce apps that use filesystems)
> that they end up accumulating a huge memory footprint coming from
> {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of
> {{Statistics}}).
> Although the thread reference from {{StatisticsData}} is a weak reference,
> and thus can get cleared once a thread goes away, the actual
> {{StatisticsData}} instances in the list won't get cleared until any of the
> following methods is called on {{Statistics}}:
> - {{getBytesRead()}}
> - {{getBytesWritten()}}
> - {{getReadOps()}}
> - {{getLargeReadOps()}}
> - {{getWriteOps()}}
> - {{toString()}}
> It is quite possible to have an application that interacts with a filesystem
> but does not call any of these methods on the {{Statistics}}. If such an
> application runs for a long time and has a large amount of thread churn, the
> memory footprint will grow significantly.
> The current workaround is either to limit the thread churn or to invoke these
> operations occasionally to pare down the memory. However, this is still a
> deficiency with {{FileSystem$Statistics}} itself in that the memory is
> controlled only as a side effect of those operations.
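The behaviour described in the issue can be modelled with a simplified sketch. The class and member names below ({{StatsLeakSketch}}, {{Entry}}, {{register}}, {{entryCount}}) are hypothetical and are not Hadoop's actual code; the sketch only assumes that per-thread entries sit in a list, hold their thread weakly, and are pruned solely while an aggregate getter runs. Thread exit is simulated by clearing the weak reference, so the result does not depend on the garbage collector.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StatsLeakSketch {
    // Hypothetical per-thread counter entry; the owning thread is held weakly.
    public static final class Entry {
        public final WeakReference<Thread> owner;
        public long bytesRead;

        Entry(Thread t) {
            owner = new WeakReference<>(t);
        }
    }

    private final List<Entry> allData = new ArrayList<>();
    // Totals carried over from entries whose threads are gone (an assumption of this model).
    private long rootBytesRead;

    // Called when a thread first uses the statistics; it never prunes the list.
    public synchronized Entry register(Thread t) {
        Entry e = new Entry(t);
        allData.add(e);
        return e;
    }

    // Aggregation is the only place where stale entries are removed.
    public synchronized long getBytesRead() {
        long total = rootBytesRead;
        for (Iterator<Entry> it = allData.iterator(); it.hasNext();) {
            Entry e = it.next();
            total += e.bytesRead;
            if (e.owner.get() == null) {
                // Owner is gone: keep its count in the root total, drop the entry.
                rootBytesRead += e.bytesRead;
                it.remove();
            }
        }
        return total;
    }

    public synchronized int entryCount() {
        return allData.size();
    }

    public static void main(String[] args) {
        StatsLeakSketch stats = new StatsLeakSketch();
        for (int i = 0; i < 100; i++) {
            Entry e = stats.register(new Thread());
            e.bytesRead = 1;
            e.owner.clear(); // simulate the owning thread having gone away
        }
        System.out.println("entries before any getter: " + stats.entryCount());
        System.out.println("bytes read: " + stats.getBytesRead());
        System.out.println("entries after getter: " + stats.entryCount());
    }
}
```

In this model the list keeps growing for as long as threads come and go without any getter being called, which matches the footprint growth reported above.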