Sangjin Lee created HADOOP-12107:
------------------------------------
Summary: long running apps may have a huge number of
StatisticsData instances under FileSystem
Key: HADOOP-12107
URL: https://issues.apache.org/jira/browse/HADOOP-12107
Project: Hadoop Common
Issue Type: Bug
Components: fs
Affects Versions: 2.7.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor
We observed that some of our apps (non-MapReduce apps that use filesystems)
accumulate a huge memory footprint from
{{FileSystem$Statistics$StatisticsData}} instances (held in the {{allData}}
list of {{Statistics}}).
Although the thread reference from {{StatisticsData}} is a weak reference, and
thus can get cleared once a thread goes away, the actual {{StatisticsData}}
instances in the list are not removed until one of the following methods is
called on {{Statistics}}:
- {{getBytesRead()}}
- {{getBytesWritten()}}
- {{getReadOps()}}
- {{getLargeReadOps()}}
- {{getWriteOps()}}
- {{toString()}}
It is quite possible to have an application that interacts with a filesystem
but does not call any of these methods on the {{Statistics}}. If such an
application runs for a long time and has a large amount of thread churn, the
memory footprint will grow significantly.
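
To make the growth pattern concrete, here is a minimal, self-contained sketch
of the behavior described above. It is a simplified model, not the Hadoop
source: the class {{StatisticsModel}}, its {{Data}} holder, and the
{{churn()}} and {{trackedEntries()}} helpers are all hypothetical names made up
for illustration. The write path adds one entry per thread, and only the read
path removes entries whose owner thread has been collected.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Simplified stand-in for FileSystem.Statistics (assumption, not Hadoop code). */
class StatisticsModel {
    /** Per-thread counters; the owning thread is referenced only weakly. */
    static final class Data {
        final WeakReference<Thread> owner;
        long bytesRead;
        Data(Thread t) { owner = new WeakReference<>(t); }
    }

    private final ThreadLocal<Data> threadData = new ThreadLocal<>();
    private final List<Data> allData = new ArrayList<>();
    private long rootBytesRead; // totals folded in from threads that are gone

    /** Write path: registers one Data per thread and never prunes the list. */
    void incrementBytesRead(long n) {
        Data d = threadData.get();
        if (d == null) {
            d = new Data(Thread.currentThread());
            threadData.set(d);
            synchronized (this) { allData.add(d); }
        }
        d.bytesRead += n;
    }

    /** Read path: the only place where stale entries are removed. */
    synchronized long getBytesRead() {
        long total = rootBytesRead;
        for (Iterator<Data> it = allData.iterator(); it.hasNext(); ) {
            Data d = it.next();
            total += d.bytesRead;
            if (d.owner.get() == null) {      // owner thread was collected
                rootBytesRead += d.bytesRead; // keep its contribution
                it.remove();
            }
        }
        return total;
    }

    synchronized int trackedEntries() { return allData.size(); }

    /** Runs short-lived threads one after another, each "reading" some bytes. */
    static void churn(StatisticsModel stats, int threads, long bytesEach) {
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> stats.incrementBytesRead(bytesEach));
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        StatisticsModel stats = new StatisticsModel();
        churn(stats, 1000, 10);
        System.out.println("entries before read: " + stats.trackedEntries());
        System.gc(); // only a hint; how many owners get cleared is up to the JVM
        System.out.println("bytes read: " + stats.getBytesRead());
        System.out.println("entries after read: " + stats.trackedEntries());
    }
}
```

In this model, every thread that ever touched the statistics leaves an entry
behind until {{getBytesRead()}} is called. How many entries that call removes
depends on which weak references the garbage collector has already cleared.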
The current workaround is either to limit the thread churn or to invoke one of
these methods occasionally to pare down the memory. However, this is still a
deficiency in {{FileSystem$Statistics}} itself, in that the memory is
reclaimed only as a side effect of those operations.
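
The second workaround could be sketched as a small background task that calls
one of the methods on a fixed schedule. This is an illustration under stated
assumptions, not a proposed patch: {{StatisticsPruner}} and its
{{touchStatistics}} parameter are hypothetical names, and the task is passed in
as a plain {{Runnable}} so that the sketch compiles without Hadoop on the
classpath.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Hadoop) that runs a task periodically. */
class StatisticsPruner implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "fs-statistics-pruner");
            t.setDaemon(true); // do not keep the JVM alive just for pruning
            return t;
        });

    /**
     * @param touchStatistics task that calls one of the Statistics getters.
     *        With Hadoop 2.x on the classpath this could be, for example:
     *          () -> { for (FileSystem.Statistics s : FileSystem.getAllStatistics())
     *                      s.getBytesRead(); }
     */
    StatisticsPruner(Runnable touchStatistics, long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                touchStatistics.run();
            } catch (RuntimeException e) {
                // Swallowed so that one failure does not cancel later runs.
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The pruning task still only removes entries whose weak thread reference has
already been cleared, so the period bounds how long such entries linger rather
than guaranteeing an upper limit on the list size.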