[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem

Colin Patrick McCabe (JIRA) Fri, 26 Jun 2015 12:04:52 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603433#comment-14603433
 ]


Colin Patrick McCabe commented on HADOOP-12107:
-----------------------------------------------

OK, at the risk of being pedantic, here is my rundown.  While the 
{{StatisticsData}} class itself is public, the {{StatisticsData}} constructor 
is not.  It is "package-private" (the access level that Java assigns when a 
member has no public, private, or protected keyword).  This means that a 
{{StatisticsData}} object can only be created by code in the 
{{org.apache.hadoop.fs}} package.  You can try this for yourself-- write a 
program outside Hadoop that tries to create a {{StatisticsData}} object via 
this constructor.  It will not compile.  This constructor is safe to remove, so 
let's do that.
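
To make the access rule above concrete, here is a minimal single-file sketch.  
{{PackagePrivateCtorDemo}} and {{StatsData}} are hypothetical stand-ins, not 
Hadoop classes.  Since everything in one file shares a package, the sketch 
cannot reproduce the compile error itself; instead it uses reflection to show 
that a public class can carry a constructor that is not public.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// Hypothetical demo class; none of these names come from Hadoop.
final class PackagePrivateCtorDemo {

    // Stand-in for StatisticsData: the class is public, but the constructor
    // has no access keyword, which makes it package-private.
    public static class StatsData {
        long bytesRead;

        StatsData() { }
    }

    // True when the constructor is neither public, protected, nor private.
    static boolean isPackagePrivate(Constructor<?> ctor) {
        int m = ctor.getModifiers();
        return !Modifier.isPublic(m)
            && !Modifier.isProtected(m)
            && !Modifier.isPrivate(m);
    }

    public static void main(String[] args) {
        Class<StatsData> cls = StatsData.class;
        // StatsData declares exactly one constructor.
        Constructor<?> ctor = cls.getDeclaredConstructors()[0];

        System.out.println("class is public: "
            + Modifier.isPublic(cls.getModifiers()));
        System.out.println("constructor is package-private: "
            + isPackagePrivate(ctor));
        // getConstructors() returns only public constructors.
        System.out.println("public constructors: "
            + cls.getConstructors().length);
    }
}
```

If {{StatsData}} lived in its own package, a {{new StatsData()}} expression 
in any class outside that package would be rejected by javac.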

bq. Colin Patrick McCabe, good point on the one hand but on the other hand this 
constructor is package-scope, and technically usable if anyone creates a class 
with the same package name, regardless of how unlikely or illegal (in terms of 
specified audience) it is. How about we defensively keep that constructor for 
branch-2 at least?

No.  Users simply can't add code to the {{org.apache.hadoop.fs}} package.  If 
they do, things are not going to work-- there are going to be naming conflicts, 
class resolution issues, etc.  There is no possible way we can support users 
doing this and no reason to support it.  If we tried, we would have to 
essentially freeze the API of every single class in Hadoop-- we would have to 
have this discussion again each time we changed some package-private variable 
or function.  Private and package-private stuff is private-- it's even enforced 
by the compiler; you can't get much more private than that.

> long running apps may have a huge number of StatisticsData instances under 
> FileSystem
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12107
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, 
> HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch
>
>
> We observed with some of our apps (non-mapreduce apps that use filesystems) 
> that they end up accumulating a huge memory footprint coming from 
> {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of 
> {{Statistics}}).
> Although the thread reference from {{StatisticsData}} is a weak reference, 
> and thus can get cleared once a thread goes away, the actual 
> {{StatisticsData}} instances in the list won't get cleared until one of the 
> following methods is called on {{Statistics}}:
> - {{getBytesRead()}}
> - {{getBytesWritten()}}
> - {{getReadOps()}}
> - {{getLargeReadOps()}}
> - {{getWriteOps()}}
> - {{toString()}}
> It is quite possible to have an application that interacts with a filesystem 
> but does not call any of these methods on the {{Statistics}}. If such an 
> application runs for a long time and has a large amount of thread churn, the 
> memory footprint will grow significantly.
> The current workaround is either to limit the thread churn or to invoke these 
> operations occasionally to pare down the memory. However, this is still a 
> deficiency with {{FileSystem$Statistics}} itself in that the memory is 
> controlled only as a side effect of those operations.
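
The accumulation pattern described in the issue can be modeled with a small 
sketch.  This is a simplified illustration built only from the description 
above, not the actual {{FileSystem$Statistics}} code: {{StatsLeakSketch}}, 
{{Data}}, {{register}}, and {{trackedRecords}} are hypothetical names, and 
{{WeakReference.clear()}} is called by hand to stand in for a thread that has 
exited and been garbage collected.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the pattern in the issue; not Hadoop code.
final class StatsLeakSketch {

    // Per-thread counter record; the owning thread is held only weakly.
    static final class Data {
        final WeakReference<Thread> owner;
        long bytesRead;

        Data(Thread t) {
            owner = new WeakReference<>(t);
        }
    }

    private final List<Data> allData = new ArrayList<>();
    private long rootBytesRead; // totals folded in from pruned records

    synchronized Data register(Thread t) {
        Data d = new Data(t);
        allData.add(d);
        return d;
    }

    synchronized int trackedRecords() {
        return allData.size();
    }

    // Records of dead threads are pruned only as a side effect of this read.
    synchronized long getBytesRead() {
        long total = rootBytesRead;
        for (Iterator<Data> it = allData.iterator(); it.hasNext();) {
            Data d = it.next();
            total += d.bytesRead;
            if (d.owner.get() == null) {
                rootBytesRead += d.bytesRead; // keep the count, drop the record
                it.remove();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        StatsLeakSketch stats = new StatsLeakSketch();
        for (int i = 0; i < 1000; i++) {
            Data d = stats.register(new Thread());
            d.bytesRead = 1;
            d.owner.clear(); // simulate the thread exiting and being collected
        }
        System.out.println(stats.trackedRecords()); // 1000: nothing pruned yet
        System.out.println(stats.getBytesRead());   // 1000
        System.out.println(stats.trackedRecords()); // 0: pruned by the read
    }
}
```

In this model, an application that never calls {{getBytesRead()}} keeps one 
record per thread it has ever run, which is the growth the issue reports.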



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
