[ 
https://issues.apache.org/jira/browse/HADOOP-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452326#comment-16452326
 ] 

Sean Mackrory edited comment on HADOOP-15392 at 4/25/18 2:20 PM:
-----------------------------------------------------------------

{quote}MapReduce job, but by hbase ExportSnapshot utility{quote}
{quote}It might, however, be related to 
https://issues.apache.org/jira/browse/HBASE-20433 {quote}

Yeah, that's what I meant - ExportSnapshot is essentially a MapReduce job. I do 
see it closing the filesystem instances towards the end of doWork(), FWIW. It's 
reasonable to expect those FS instances to stay open for the whole duration of 
the job - so the fix most likely lives at the FS level here, even if it's not 
just disabling metrics by default.

{quote}Yes, we need to fix this{quote}

Well, let's make sure we're fixing the right problem first. 53,000 
S3AInstrumentation instances means S3AFileSystem.initialize is getting called 
once for every single file - that's also a lot of overhead that doesn't seem 
right to me. Has filesystem caching been disabled for some reason? And can you 
clarify what's configured in hadoop-metrics2.properties? I was testing with a 
much smaller number of large files, but the threads I saw growing without bound 
only show up if you explicitly configure sinks for the s3a-file-system metrics. 
I'll retry with a large number of files and verify whether this accumulation 
also happens in threads that exist without those sinks explicitly enabled.
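For reference, the only way I could get those metrics threads to appear at all was by explicitly attaching a sink to the s3a-file-system prefix in hadoop-metrics2.properties, along these lines (a sketch only - the FileSink choice, filename, and period are illustrative, not a recommended configuration):

```properties
# Illustrative hadoop-metrics2.properties fragment: attach a file sink to the
# s3a-file-system metrics prefix. Property names follow the standard metrics2
# <prefix>.sink.<instance>.* pattern; values here are examples.
s3a-file-system.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
s3a-file-system.sink.file.filename=s3a-metrics.out
s3a-file-system.period=10
```

If nothing like this is configured on your cluster and the threads still accumulate, that narrows the problem down considerably.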


was (Author: mackrorysd):
{quote}MapReduce job, but by hbase ExportSnapshot utility{quote}
{quote}It might, however, be related to 
https://issues.apache.org/jira/browse/HBASE-20433 {quote}

Yeah that's what I meant - ExportSnapshot is essentially a MapReduce job. I do 
see it closing the filesystem instances towards the end of doWork()

{quote}Yes, we need to fix this{quote}

Well let's make sure we're fixing the right problem first. 53,000 
S3AInstrumentation instances means S3AFileSystem.initialize is getting called 
once for every single file - that's also a lot of overhead that doesn't seem 
right to me. Has filesystem caching been disabled for some reason? And can you 
clarify what's configured in hadoop-metrics2.properties? I was testing with a 
much lower number of large files - but the threads I saw growing unbounded 
already only show up if you explicitly configure sinks for the s3a-file-system 
metrics. I'll try with a large number of files and verify that this 
accumulation is happening in threads that do exist without explicitly enabling 
them.

> S3A Metrics in S3AInstrumentation Cause Memory Leaks in HBase Export
> --------------------------------------------------------------------
>
>                 Key: HADOOP-15392
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15392
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Voyta
>            Priority: Blocker
>
> While using the HBase S3A Export Snapshot utility we started to experience 
> memory leaks in the process after a version upgrade.
> By running code analysis we traced the cause to revision 
> 6555af81a26b0b72ec3bee7034e01f5bd84b1564 that added the following static 
> reference (singleton):
> private static MetricsSystem metricsSystem = null;
> When an application uses an S3AFileSystem instance that is not closed 
> immediately, metrics accumulate in this singleton and memory grows without 
> any limit.
>  
> Expectation:
>  * It would be nice to have an option to disable metrics completely, as they 
> are not needed for the Export Snapshot utility.
>  * Usage of S3AFileSystem should not involve any static object that can grow 
> indefinitely.
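To make the failure mode described above concrete, here is a minimal, self-contained sketch of the pattern. The class and method names (LeakSketch, initializeFilesystem, METRICS_SOURCES) are invented for illustration and are not Hadoop's actual classes; the point is only that a static, add-only registry grows with every filesystem instance that is created but never closed:

```java
import java.util.HashMap;
import java.util.Map;

public class LeakSketch {
    // Static singleton registry, analogous to the static MetricsSystem field:
    // it outlives every individual filesystem instance.
    static final Map<String, Object> METRICS_SOURCES = new HashMap<>();
    static int counter = 0;

    // Simulates a filesystem's initialize() registering a per-instance
    // metrics source; nothing ever unregisters it.
    static void initializeFilesystem() {
        METRICS_SOURCES.put("S3AMetrics" + (++counter), new Object());
    }

    public static void main(String[] args) {
        // One initialize() per file, as in the 53,000-instance report.
        for (int i = 0; i < 53000; i++) {
            initializeFilesystem();
        }
        // No entry is ever removed, so the static map retains all sources.
        System.out.println(METRICS_SOURCES.size()); // prints 53000
    }
}
```

With caching enabled a single cached filesystem instance would register once; the unbounded growth only appears when a fresh instance is initialized per file and the static registry keeps every registration alive.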



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
