[ 
https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596806#comment-14596806
 ] 

Sangjin Lee commented on HADOOP-12107:
--------------------------------------

Thanks [~mingma] and [~cmccabe] for your comments!

bq. Is there a good way to write any unit test for this?
I was relying on existing unit tests on {{FileSystem.Statistics}} (there are 
several). I'm still hoping that is good enough for this, but let me know if you 
want me to look into creating more unit tests around this.

Regarding the number of threads for the cleaner, yes, it definitely occurred to 
me that this might create an issue in terms of potentially a large number of 
additional threads. The only reason I shied away from a global cleaner thread 
initially is I wasn't entirely sure whether it would be completely safe and 
perform well in terms of locking (note that the cleaner needs to acquire a lock 
for each {{Statistics}} instance). Let me explore that still.

bq. The null check isn't needed any more since you changed allData to be 
initialized in the constructor.
Thanks for pointing that out. I remembered it at one point, but forgot to 
include that change in the patch.

One other point of discussion: I'm removing the {{StatisticsData}} constructor 
that takes a Thread as the argument (along with the {{owner}} member variable) 
as it no longer needs the thread inside {{StatisticsData}}. But since 
{{StatisticsData}} is technically a public class, it could be considered an API 
incompatible change. Thoughts on this? No one outside {{FileSystem}} is using 
the constructor or the member variable, and I cannot think of why anyone would. 
But if we need to absolutely maintain the API compatibility, I cannot remove 
them. Let me know what you think.

> long running apps may have a huge number of StatisticsData instances under 
> FileSystem
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12107
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Minor
>         Attachments: HADOOP-12107.001.patch
>
>
> We observed with some of our apps (non-mapreduce apps that use filesystems) 
> that they end up accumulating a huge memory footprint coming from 
> {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of 
> {{Statistics}}).
> Although the thread reference from {{StatisticsData}} is a weak reference, 
> and thus can get cleared once a thread goes away, the actual 
> {{StatisticsData}} instances in the list won't get cleared until any of these 
> following methods is called on {{Statistics}}:
> - {{getBytesRead()}}
> - {{getBytesWritten()}}
> - {{getReadOps()}}
> - {{getLargeReadOps()}}
> - {{getWriteOps()}}
> - {{toString()}}
> It is quite possible to have an application that interacts with a filesystem 
> but does not call any of these methods on the {{Statistics}}. If such an 
> application runs for a long time and has a large amount of thread churn, the 
> memory footprint will grow significantly.
> The current workaround is either to limit the thread churn or to invoke these 
> operations occasionally to pare down the memory. However, this is still a 
> deficiency with {{FileSystem$Statistics}} itself in that the memory is 
> controlled only as a side effect of those operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to