[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700823#comment-16700823
 ] 

Adam Antal commented on HADOOP-15819:
-------------------------------------

I looked through the affected tests, and maybe this comprehensive list can help 
to escalate this.
 I used [~gabor.bota]'s patch and the S3ACloseEnforcedFileSystem in it.
 Only the AbstractS3ATestBase's subtests are affected, and can be grouped by 
the following (also grouped further by common superclass):
 # Failing tests due to "Using closed FS!" with enabled cache
 ** ITestS3AInconsistency
 ** ITestS3AMiscOperations
 ** ITestS3AMiniYarnCluster
 ** ITestS3AEmptyDirectory
 ** ITestS3GuardEmptyDirs
 ** ITestS3ADelayedFNF
 ** ITestS3ACopyFromLocalFile
 ** ITestS3GuardListConsistency
 ** ITestS3AMetrics
 ** S3AScaleTestBase
 *** AbstractITestS3AMetadataStoreScale
 **** ITestLocalMetadataStoreScale
 **** ITestDynamoDBMetadataStoreScale
 *** ITestS3ADeleteManyFiles
 **** ITestS3ADeleteFilesOneByOne
 *** ITestS3AInputStreamPerformance
 *** ITestS3AConcurrentOps
 *** ITestS3ADirectoryPerformance
 *** ITestS3ACreatePerformance
 ** ITestS3GuardCreate
 ** ITestS3GuardWriteBack
 ** AbstractS3GuardToolTestBase
 *** ITestS3GuardToolDynamoDB
 *** ITestS3GuardToolLocal
 ** ITestAssumeRole
 ** ITestS3GuardConcurrentOps
 ** ITestS3AFileOperationCost
 ** ITestS3ABlocksize
 ** ITestS3AClosedFS
 ** AbstractCommitITest
 *** ITestS3ACommitterFactory
 # Tests not failing and having explicitly disabled cache
 ** ITestS3AMultipartUtils
 ** ITestS3GuardTtl
 ** ITestS3AFailureHandling
 ** AbstractSTestS3AHugeFiles
 *** ITestS3AHugeFilesDiskBlocks
 **** ITestS3AHugeFilesSSECDiskBlocks
 *** ITestS3AHugeMagicCommits
 *** ITestS3AHugeFilesArrayBlocks
 *** ITestS3AHugeFilesByteBufferBlocks
 ** ITestS3ABlockOutputArray
 *** ITestS3ABlockOutputByteBuffer
 *** ITestS3ABlockOutputDisk
 ** AbstractCommitITest
 *** AbstractITCommitProtocol
 **** ITestStagingCommitProtocol
 ***** ITestDirectoryCommitProtocol
 ***** ITestPartitionedCommitProtocol
 **** ITestMagicCommitProtocol
 # Already ignored tests in the testsuite
 ** ITestS3AEncryptionAlgorithmValidation
 # Ignored tests by me due to another stuff
 ** ITestS3ATemporaryCredentials: Bug in test
 ** AbstractCommitITest: Bug: BouncyCastle not found exception
 *** AbstractITCommitMRJob
 **** ITStagingCommitMRJobBadDest
 **** ITMagicCommitMRJob
 **** ITDirectoryCommitMRJob
 **** ITStagingCommitMRJob
 **** ITPartitionCommitMRJob
 ** AbstractTestS3AEncryption: Auth tests were not set up

Apart from the ignored tests either by me or by default, the strange thing is 
that *tests have either failed or had explicitly disabled cache but not both*. 
In the list above you can see the list of tests, where the first category is 
where tests have failed but in the configuration the disabling cache option was 
not added (checked every level in the class hierarchy), and the second category 
where cache is explicitly disabled, but those tests have never failed during my 
examination.

It has taken some time looking through them, but I checked every one of the 
tests (still I might have made a mistake) and every config having (indirect) 
superclass AbstractS3ATestBase.

What's more interesting is thatĀ if I manually add a plusĀ 
{{S3ATestUtils.disableFilesystemCaching}} call to the {{createConfiguration}} 
method of AbstractS3ATestBase (so not to the xml) I get only a single exception 
instead of the dozens above: only in {{ITestS3ACommitterFactory}}.
I also add the output of the single failed test in that case:

{code:java}
2018-11-27 18:55:08,036 [JUnit-testEverything] ERROR 
s3a.S3ACloseEnforcedFileSystem 
(S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
java.lang.RuntimeException: Using closed FS!.
        at 
org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
        at 
org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.makeQualified(S3ACloseEnforcedFileSystem.java:115)
        at 
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.initOutput(AbstractS3ACommitter.java:149)
        at 
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.<init>(AbstractS3ACommitter.java:129)
        at 
org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter.<init>(MagicS3GuardCommitter.java:73)
        at 
org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitterFactory.createTaskCommitter(MagicS3GuardCommitterFactory.java:44)
        at 
org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory.createTaskCommitter(S3ACommitterFactory.java:81)
        at 
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(AbstractS3ACommitterFactory.java:48)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory.createCommitter(ITestS3ACommitterFactory.java:198)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory.assertFactoryCreatesExpectedCommitter(ITestS3ACommitterFactory.java:189)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory.testImplicitFileBinding(ITestS3ACommitterFactory.java:127)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory.testEverything(ITestS3ACommitterFactory.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
        at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:748)
2018-11-27 18:55:08,036 [JUnit-testEverything] WARN  
s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(61)) 
- FS was closed 1 times by:
2018-11-27 18:55:08,036 [JUnit-testEverything] WARN  
s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(64)) 
- ----------------- close() #1 ---------------
2018-11-27 18:55:08,036 [JUnit-testEverything] WARN  
s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(65)) 
- java.lang.RuntimeException: close() called here
        at 
org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83)
        at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
        at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
        at 
org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55)
        at 
org.apache.hadoop.fs.s3a.commit.AbstractCommitITest.teardown(AbstractCommitITest.java:208)
        at 
org.apache.hadoop.fs.s3a.commit.AbstractITCommitProtocol.teardown(AbstractITCommitProtocol.java:180)
        at 
org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocol.teardown(ITestMagicCommitProtocol.java:213)
        at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
        at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:748)
{code}

Please also note that I didn't encounter the error messages similar to 
HADOOP-15796.

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-15819
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15819
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.1.1
>            Reporter: Gabor Bota
>            Assignee: Gabor Bota
>            Priority: Critical
>         Attachments: S3ACloseEnforcedFileSystem.java, 
> closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of the other - so we should not see that the tests are failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B 
> test will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> that the failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a sightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>       at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>       at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474)
>       at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
>       at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193)
>       at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.setup(ITestS3AClosedFS.java:40)
>       at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:logClosers(61)) - FS was closed 6 times by:
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:logClosers(64)) - ----------------- close() 
> #1 ---------------
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:logClosers(65)) - 
> java.lang.RuntimeException: close() called here
>       at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83)
>       at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
>       at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
>       at 
> org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55)
>       at 
> org.apache.hadoop.fs.s3a.commit.AbstractCommitITest.teardown(AbstractCommitITest.java:206)
>       at 
> org.apache.hadoop.fs.s3a.auth.ITestAssumedRoleCommitOperations.teardown(ITestAssumedRoleCommitOperations.java:90)
>       at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:logClosers(64)) - ----------------- close() 
> #2 ---------------
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:logClosers(65)) - 
> java.lang.RuntimeException: close() called here
>       at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83)
>       at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
>       at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
>       at 
> org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55)
>       at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2018-10-04 00:52:25,597 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:logClosers(64)) - ----------------- close() 
> #3 ---------------
> 2018-10-04 00:52:25,598 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:logClosers(65)) - 
> java.lang.RuntimeException: close() called here
>       at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83)
>       at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
>       at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
>       at 
> org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55)
>       at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> (...)
> {noformat}
> From the logs it seems that the closers are the teardown methods of the 
> abstract test classes, so the same FileSystem object gets reused between 
> tests.
> To run a test with the modified fs the following should be in the config:
> {noformat}
> <property>
>   <name>fs.s3a.impl</name>
>   <value>org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem</value>
>   <description>The implementation class of the S3A Filesystem</description>
> </property>
> {noformat}
> I file this jira to solve this issue, and start a conversation if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to