[ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064274#comment-14064274
 ] 

Chris Nauroth commented on HADOOP-10840:
----------------------------------------

[~shanyu], nice find.  I think your theory was correct.  The last patch mostly 
fixed things, but I still see a few test failures.  With an Azure storage key 
configured for testing against the live service, I get a failure in 
{{TestAzureConcurrentOutOfBandIo}}:

{code}
Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.81 sec <<< FAILURE! - in org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
testReadOOBWrites(org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo)  Time elapsed: 0.765 sec  <<< ERROR!
org.apache.hadoop.metrics2.MetricsException: Metrics source AzureFileSystemMetrics already exists!
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:143)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:120)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
        at org.apache.hadoop.fs.azure.metrics.AzureFileSystemMetricsSystem.registerSource(AzureFileSystemMetricsSystem.java:58)
        at org.apache.hadoop.fs.azure.AzureBlobStorageTestAccount.createOutOfBandStore(AzureBlobStorageTestAccount.java:331)
        at org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo.setUp(TestAzureConcurrentOutOfBandIo.java:51)
{code}
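The "already exists" error above comes from the metrics system refusing to register a second source under a name that is still in use. As a minimal, self-contained sketch (these class and method names mirror {{DefaultMetricsSystem}}'s bookkeeping but are simplified stand-ins, not the real Hadoop classes), the pattern looks like this: the second registration throws unless the earlier source was removed first.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for the metrics system's source-name bookkeeping.
// Registering the same source name twice throws, which is what the test
// setUp hits when AzureFileSystemMetrics is registered again without the
// previous registration having been removed.
public class SourceNameRegistry {
    private final Set<String> names = new HashSet<>();

    public synchronized String newSourceName(String name) {
        if (!names.add(name)) {
            throw new IllegalStateException(
                "Metrics source " + name + " already exists!");
        }
        return name;
    }

    public synchronized void removeSourceName(String name) {
        names.remove(name);
    }

    public static void main(String[] args) {
        SourceNameRegistry reg = new SourceNameRegistry();
        reg.newSourceName("AzureFileSystemMetrics");
        reg.removeSourceName("AzureFileSystemMetrics"); // unregister between uses
        reg.newSourceName("AzureFileSystemMetrics");    // fine after removal
        try {
            reg.newSourceName("AzureFileSystemMetrics"); // duplicate -> throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This suggests the test failure is a cleanup problem: the source registered by an earlier test (or an earlier file-system instance) was never unregistered before setUp ran again.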

I tested on both Mac and Windows.  On the Windows VM only, I also get failures 
in {{TestRollingWindowAverage}} and {{TestNativeAzureFileSystemMocked}}:

{code}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.149 sec <<< FAILURE! - in org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage
testBasicFunctionality(org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage)  Time elapsed: 0.112 sec  <<< FAILURE!
java.lang.AssertionError: expected:<15> but was:<10>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.junit.Assert.assertEquals(Assert.java:542)
        at org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage.testBasicFunctionality(TestRollingWindowAverage.java:38)
{code}

{code}
Tests run: 27, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.431 sec <<< FAILURE! - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
testFolderLastModifiedTime(org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked)  Time elapsed: 3.24 sec  <<< FAILURE!
java.lang.AssertionError: null
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertFalse(Assert.java:64)
        at org.junit.Assert.assertFalse(Assert.java:74)
        at org.apache.hadoop.fs.azure.NativeAzureFileSystemBaseTest.testFolderLastModifiedTime(NativeAzureFileSystemBaseTest.java:479)
{code}

Can you explain why the following code was removed from 
{{AzureFileSystemMetricsSystem#fileSystemClosed}}?  My understanding is that 
this code is important to guarantee timely publishing of metrics when an 
instance gets closed.  I'd also expect your new checks against double close 
to be sufficient to protect against extraneous publishing of metrics, so the 
call could be kept safely.

{code}
-    if (instance != null) {
-      instance.publishMetricsNow();
-    }
{code}
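To illustrate the point: a close path can keep the {{publishMetricsNow()}} call and still be safe against double close by guarding it with a flip-once flag. This is a hypothetical sketch, not the real {{AzureFileSystemMetricsSystem}}; the class name, the counter, and the {{AtomicBoolean}} guard are all illustrative stand-ins.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: publish metrics exactly once, on the first close.
// Later close() calls are no-ops, so keeping publishMetricsNow() in the
// close path cannot cause extraneous publishing.
public class CloseOncePublisher {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int publishCount = 0;

    // Stand-in for the real metrics publisher.
    private void publishMetricsNow() {
        publishCount++;
    }

    public void fileSystemClosed() {
        // compareAndSet flips the flag atomically: only the first caller
        // sees false -> true and publishes.
        if (closed.compareAndSet(false, true)) {
            publishMetricsNow();
        }
    }

    public int getPublishCount() {
        return publishCount;
    }

    public static void main(String[] args) {
        CloseOncePublisher p = new CloseOncePublisher();
        p.fileSystemClosed();
        p.fileSystemClosed(); // double close: no second publish
        System.out.println(p.getPublishCount()); // prints 1
    }
}
```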


> Fix OutOfMemoryError caused by metrics system in Azure File System
> ------------------------------------------------------------------
>
>                 Key: HADOOP-10840
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10840
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.4.1
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>         Attachments: HADOOP-10840.1.patch, HADOOP-10840.patch
>
>
> In Hadoop 2.x the Hadoop File System framework changed and no cache is 
> implemented (refer to HADOOP-6356). This means that for every WASB access, a 
> new NativeAzureFileSystem is created, along with a Metrics source that is 
> registered with MetricsSystemImpl. Over time the sources accumulate, eating 
> memory and eventually causing a Java OutOfMemoryError.
> The fix is to utilize the unregisterSource() method added to MetricsSystem in 
> HADOOP-10839.
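The leak described in the issue summary above reduces to a register-without-unregister pattern. A minimal sketch (simplified stand-in, not the real {{MetricsSystemImpl}}; the class and names below are illustrative): each file-system instance adds one entry to the source map, and only a matching {{unregisterSource()}} on close keeps the map bounded.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the leak and the fix. Without unregisterSource(),
// the map grows by one entry per file-system instance and is never pruned,
// which over time exhausts the heap.
public class MetricsSystemSketch {
    private final Map<String, Object> sources = new HashMap<>();

    public void registerSource(String name, Object source) {
        sources.put(name, source);
    }

    // Counterpart added by HADOOP-10839: calling it when an instance is
    // closed keeps the number of live sources bounded.
    public void unregisterSource(String name) {
        sources.remove(name);
    }

    public int sourceCount() {
        return sources.size();
    }

    public static void main(String[] args) {
        MetricsSystemSketch ms = new MetricsSystemSketch();
        for (int i = 0; i < 10_000; i++) {
            String name = "AzureFileSystemMetrics-" + i;
            ms.registerSource(name, new Object()); // one source per instance
            ms.unregisterSource(name);             // fix: remove on close
        }
        System.out.println(ms.sourceCount()); // prints 0: no accumulation
    }
}
```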



--
This message was sent by Atlassian JIRA
(v6.2#6252)
