[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem

Colin Patrick McCabe (JIRA) Mon, 29 Jun 2015 12:56:33 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606271#comment-14606271 ]


Colin Patrick McCabe commented on HADOOP-12107:
-----------------------------------------------

Guys, we clearly define the API contract for the project.  See 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html

You have to remember that:
1. The function that you are talking about changing (the constructor) is not 
public from Java's point of view.  It is package-private.
2. The function that you are talking about changing is not public from Hadoop's 
point of view (there is no @Public or @LimitedPrivate annotation on it).

There is simply no reason to treat this as public.
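
To make the two checks above concrete, here is a minimal, self-contained sketch. 
It does not use Hadoop itself: the nested Public annotation and the Counters 
class are hypothetical stand-ins for Hadoop's audience annotations and for the 
class under discussion, and the helper names are made up for illustration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.AnnotatedElement;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class VisibilitySketch {
    /** Hypothetical stand-in for an "this is public API" audience annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Public {}

    /** Hypothetical stand-in for a class whose constructor is internal. */
    public static class Counters {
        Counters() {}            // no modifier: package-private, no annotation

        @Public
        public long get() { return 0L; }
    }

    /** Check 1: Java visibility. True when no access modifier is present. */
    public static boolean isPackagePrivate(Constructor<?> c) {
        int m = c.getModifiers();
        return !Modifier.isPublic(m) && !Modifier.isProtected(m) && !Modifier.isPrivate(m);
    }

    /** Check 2: project-level contract. True when the element carries @Public. */
    public static boolean isDeclaredPublicApi(AnnotatedElement e) {
        return e.isAnnotationPresent(Public.class);
    }

    /** True when the no-arg constructor fails both "public" tests. */
    public static boolean constructorIsInternal(Class<?> cls) {
        try {
            Constructor<?> c = cls.getDeclaredConstructor();
            return isPackagePrivate(c) && !isDeclaredPublicApi(c);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("no no-arg constructor on " + cls, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("constructor is internal: "
                + constructorIsInternal(Counters.class));
    }
}
```

Code outside the declaring package cannot call such a constructor without 
reflection, so the only callers a change could break are ones that placed their 
own classes inside that package.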

bq. However, at the expense of being too defensive, the only test I apply here: 
is there hypothetically a scenario where an API user can be broken? My answer 
is yes if you have some org.apache.hadoop.fs.Foo calling the constructor even 
though the user absolutely should not do it. 

You could make the same argument to stop development on almost any patch.  
Almost every patch changes things which are private or package-private inside 
Hadoop.  It's simply unreasonable to try to support users who are putting their 
code inside the org.apache.hadoop.fs namespace (or any other internal project 
namespace).

> long running apps may have a huge number of StatisticsData instances under 
> FileSystem
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12107
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, 
> HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch
>
>
> We observed with some of our apps (non-mapreduce apps that use filesystems) 
> that they end up accumulating a huge memory footprint coming from 
> {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of 
> {{Statistics}}).
> Although the thread reference from {{StatisticsData}} is a weak reference, 
> and thus can get cleared once a thread goes away, the actual 
> {{StatisticsData}} instances in the list won't get cleared until one of the 
> following methods is called on {{Statistics}}:
> - {{getBytesRead()}}
> - {{getBytesWritten()}}
> - {{getReadOps()}}
> - {{getLargeReadOps()}}
> - {{getWriteOps()}}
> - {{toString()}}
> It is quite possible to have an application that interacts with a filesystem 
> but does not call any of these methods on the {{Statistics}}. If such an 
> application runs for a long time and has a large amount of thread churn, the 
> memory footprint will grow significantly.
> The current workaround is either to limit the thread churn or to invoke these 
> operations occasionally to pare down the memory. However, this is still a 
> deficiency with {{FileSystem$Statistics}} itself in that the memory is 
> controlled only as a side effect of those operations.
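
The accumulation pattern described in the issue can be sketched as follows. This 
is a simplified stand-in, not the Hadoop class: StatsLeakSketch, Data, register, 
and trackedEntries are hypothetical names, and the weak reference is cleared by 
hand to simulate a thread that has gone away (a real run would depend on the 
garbage collector).

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StatsLeakSketch {
    /** Per-thread counters; holds only a weak reference to the owning thread. */
    public static final class Data {
        public final WeakReference<Thread> owner;
        public long bytesRead;

        Data(Thread t) { owner = new WeakReference<>(t); }
    }

    private final List<Data> allData = new ArrayList<>();
    private long rootBytesRead;   // totals folded in from entries of dead threads

    /** Called once per thread; the entry stays in the list until pruned. */
    public synchronized Data register(Thread t) {
        Data d = new Data(t);
        allData.add(d);
        return d;
    }

    /** Aggregates all counters and, only as a side effect, prunes dead entries. */
    public synchronized long getBytesRead() {
        long total = rootBytesRead;
        for (Iterator<Data> it = allData.iterator(); it.hasNext();) {
            Data d = it.next();
            total += d.bytesRead;
            if (d.owner.get() == null) {      // owner thread is gone
                rootBytesRead += d.bytesRead; // keep its contribution
                it.remove();                  // release the entry
            }
        }
        return total;
    }

    public synchronized int trackedEntries() { return allData.size(); }

    public static void main(String[] args) {
        StatsLeakSketch stats = new StatsLeakSketch();
        for (int i = 0; i < 1000; i++) {
            Data d = stats.register(Thread.currentThread());
            d.bytesRead = 1;
            d.owner.clear();                  // simulate the thread going away
        }
        System.out.println("entries before any getter: " + stats.trackedEntries());
        System.out.println("bytes read: " + stats.getBytesRead());
        System.out.println("entries after the getter: " + stats.trackedEntries());
    }
}
```

In this sketch nothing shrinks the list unless a caller happens to ask for a 
total, which is the deficiency the description points out.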



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
