[
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064274#comment-14064274
]
Chris Nauroth commented on HADOOP-10840:
----------------------------------------
[~shanyu], nice find. I think your theory was correct. The last patch mostly
fixed things, but I still see a few test failures. With an Azure storage key
configured for testing against the live service, I get a failure in
{{TestAzureConcurrentOutOfBandIo}}:
{code}
Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.81 sec <<< FAILURE! - in org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
testReadOOBWrites(org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo)  Time elapsed: 0.765 sec  <<< ERROR!
org.apache.hadoop.metrics2.MetricsException: Metrics source AzureFileSystemMetrics already exists!
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:143)
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:120)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
	at org.apache.hadoop.fs.azure.metrics.AzureFileSystemMetricsSystem.registerSource(AzureFileSystemMetricsSystem.java:58)
	at org.apache.hadoop.fs.azure.AzureBlobStorageTestAccount.createOutOfBandStore(AzureBlobStorageTestAccount.java:331)
	at org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo.setUp(TestAzureConcurrentOutOfBandIo.java:51)
{code}
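The failure is a name collision in the shared metrics registry: a source named {{AzureFileSystemMetrics}} is still registered from an earlier test run when {{setUp()}} tries to register it again. A minimal sketch of that behavior, and of the unregister-based remedy from HADOOP-10839 (this is a simplified stand-in with hypothetical names, not the Hadoop implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a metrics registry that, like
// DefaultMetricsSystem, rejects a second source registered under the same
// name. If a prior test leaves "AzureFileSystemMetrics" registered, the
// next registration attempt hits this exception.
class MetricsRegistrySketch {
    private final Map<String, Object> sources = new HashMap<>();

    public synchronized void register(String name, Object source) {
        if (sources.containsKey(name)) {
            throw new IllegalStateException(
                "Metrics source " + name + " already exists!");
        }
        sources.put(name, source);
    }

    // The HADOOP-10839 remedy: remove the old source so the name can be reused.
    public synchronized void unregister(String name) {
        sources.remove(name);
    }

    public synchronized boolean contains(String name) {
        return sources.containsKey(name);
    }
}
```

Registering, unregistering, then registering again under the same name succeeds; registering twice without the unregister throws.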
I tested on both Mac and Windows. On the Windows VM only, I also get failures
in {{TestRollingWindowAverage}} and {{TestNativeAzureFileSystemMocked}}:
{code}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.149 sec <<< FAILURE! - in org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage
testBasicFunctionality(org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage)  Time elapsed: 0.112 sec  <<< FAILURE!
java.lang.AssertionError: expected:<15> but was:<10>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage.testBasicFunctionality(TestRollingWindowAverage.java:38)
{code}
{code}
Tests run: 27, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.431 sec <<< FAILURE! - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
testFolderLastModifiedTime(org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked)  Time elapsed: 3.24 sec  <<< FAILURE!
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertFalse(Assert.java:64)
	at org.junit.Assert.assertFalse(Assert.java:74)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystemBaseTest.testFolderLastModifiedTime(NativeAzureFileSystemBaseTest.java:479)
{code}
Can you explain why the following code was removed from
{{AzureFileSystemMetricsSystem#fileSystemClosed}}? My understanding is that
this code is important to guarantee timely publishing of metrics for an
instance when it gets closed. I would expect your new checks against double
close to still be sufficient to protect against extraneous publishing of
metrics even with this call restored.
{code}
- if (instance != null) {
- instance.publishMetricsNow();
- }
{code}
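For reference, here is a hypothetical sketch (not the actual patch, and with simplified stand-in types) of how {{fileSystemClosed}} could keep the publish-on-close behavior while still guarding against double close: publish once per close while the metrics system is live, and shut it down only when the last file system instance goes away.

```java
// Simplified stand-in for AzureFileSystemMetricsSystem: reference-counted
// lifecycle, with publish-on-close retained and double close tolerated.
class MetricsSystemSketch {
    private Object instance;     // stands in for the MetricsSystemImpl
    private int numFileSystems;  // live file system instances
    private int publishCount;    // counts publishMetricsNow() calls, for illustration

    public synchronized void fileSystemStarted() {
        if (instance == null) {
            instance = new Object();  // lazily create the metrics system
        }
        numFileSystems++;
    }

    public synchronized void fileSystemClosed() {
        if (instance != null) {
            publishCount++;           // instance.publishMetricsNow() in the real code
        }
        if (numFileSystems > 0) {     // guard: extra close() calls are no-ops
            numFileSystems--;
            if (numFileSystems == 0) {
                instance = null;      // shut down with the last instance
            }
        }
    }

    public synchronized int getPublishCount() { return publishCount; }
    public synchronized boolean isRunning() { return instance != null; }
}
```

With this shape, each live instance gets exactly one publish on close, and a redundant close neither publishes again nor decrements past zero.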
> Fix OutOfMemoryError caused by metrics system in Azure File System
> ------------------------------------------------------------------
>
> Key: HADOOP-10840
> URL: https://issues.apache.org/jira/browse/HADOOP-10840
> Project: Hadoop Common
> Issue Type: Bug
> Components: metrics
> Affects Versions: 2.4.1
> Reporter: shanyu zhao
> Assignee: shanyu zhao
> Attachments: HADOOP-10840.1.patch, HADOOP-10840.patch
>
>
> In Hadoop 2.x the Hadoop File System framework changed and no cache is
> implemented (refer to HADOOP-6356). This means that for every WASB access, a
> new NativeAzureFileSystem is created, along with a Metrics source that is
> created and added to MetricsSystemImpl. Over time the sources accumulate,
> consuming memory and eventually causing a Java OutOfMemoryError.
> The fix is to utilize the unregisterSource() method added to MetricsSystem in
> HADOOP-10839.
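The accumulation described in the summary above can be sketched as follows (a simplified stand-in with hypothetical names, not the WASB implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the leak: with no FileSystem cache, each WASB
// access registers a fresh metrics source, and without unregisterSource()
// the registry grows without bound.
class SourceLeakSketch {
    private final Map<String, Object> sources = new HashMap<>();
    private long nextId = 0;

    // Each new file system instance registers a uniquely named source.
    public String openFileSystem() {
        String name = "AzureFileSystemMetrics" + (++nextId);
        sources.put(name, new Object());
        return name;
    }

    // Before the fix: close() left the source registered, so it leaked.
    public void closeWithoutUnregister(String name) {
        // source stays behind
    }

    // After the fix: close() removes the source, keeping the registry bounded.
    public void closeWithUnregister(String name) {
        sources.remove(name);  // unregisterSource() in the real code
    }

    public int registeredSources() { return sources.size(); }
}
```

Opening and closing many file systems leaves the registry empty with the unregister-on-close path, but grows it one source per access without it.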
--
This message was sent by Atlassian JIRA
(v6.2#6252)