DomGarguilo opened a new issue #1916:
URL: https://github.com/apache/accumulo/issues/1916


   **Describe the bug**
   While investigating a flaky test 
([SuspendedTabletsIT](https://github.com/apache/accumulo/pull/1888)), we found 
that the garbage collector deleted a metadata file that the test still needed. 
The test then failed when it attempted to read the metadata. The error was 
reported in [this 
comment](https://github.com/apache/accumulo/pull/1888#issuecomment-768674007). 
I cannot find anything suggesting that SuspendedTabletsIT directly caused this 
error; it appears to lie in the garbage collector's behavior. The error has 
occurred only once while running this test, and I was unable to reproduce it.
   
   **Logs**
   
   _Originally posted by @ctubbsii in 
https://github.com/apache/accumulo/issues/1888#issuecomment-768674007_
   
   <details>
   
   ```java
   [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 605.616 s <<< FAILURE! - in org.apache.accumulo.test.master.SuspendedTabletsIT
   [ERROR] crashAndResumeTserver(org.apache.accumulo.test.master.SuspendedTabletsIT)  Time elapsed: 347.964 s  <<< FAILURE!
   ```
   
   
   ```java
   java.lang.AssertionError: Scanning of metadata failed, aborting
        at org.junit.Assert.fail(Assert.java:89)
        at org.apache.accumulo.test.master.SuspendedTabletsIT$TabletLocations.retrieve(SuspendedTabletsIT.java:306)
        at org.apache.accumulo.test.master.SuspendedTabletsIT.suspensionTestBody(SuspendedTabletsIT.java:208)
        at org.apache.accumulo.test.master.SuspendedTabletsIT.crashAndResumeTserver(SuspendedTabletsIT.java:101)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.lang.Thread.run(Thread.java:834)
   ```
   
   </details>
   
   <details>
   
   
   ```java
   2021-01-27T17:59:14,161 [gc.SimpleGarbageCollector] DEBUG: Deleting file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/
   ```
   
   ```java
   2021-01-27T18:00:10,691 [tserver.FileManager] ERROR: Failed to open file file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf java.io.FileNotFoundException: File file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf does not exist
   2021-01-27T18:00:10,693 [tserver.FileManager] ERROR: Failed to open file file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf java.io.FileNotFoundException: File file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf does not exist
   2021-01-27T18:00:10,693 [tserver.FileManager] ERROR: Failed to open file file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf java.io.FileNotFoundException: File file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf does not exist
   2021-01-27T18:00:10,694 [problems.ProblemReports] DEBUG: Filing problem report !0 FILE_READ file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf
   2021-01-27T18:00:10,694 [scan.LookupTask] WARN : lookup failed for tablet !0;~<
   java.io.IOException: Failed to open file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf
     at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:331) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:492) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:501) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.tablet.ScanDataSource.createIterator(ScanDataSource.java:164) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.tablet.ScanDataSource.iterator(ScanDataSource.java:120) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.iteratorsImpl.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:228) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.tablet.Tablet.lookup(Tablet.java:493) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.tablet.Tablet.lookup(Tablet.java:646) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.scan.LookupTask.run(LookupTask.java:117) [accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.session.ScanSession$ScanMeasurer.run(ScanSession.java:54) [accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) [htrace-core-3.2.0-incubating.jar:3.2.0-incubating]
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
     at java.lang.Thread.run(Thread.java:834) [?:?]
   Caused by: java.io.UncheckedIOException: java.io.FileNotFoundException: File file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf does not exist
     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$BCFileLoader.load(CachableBlockFile.java:227) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.cache.lru.SynchronousLoadingBlockCache.getBlock(SynchronousLoadingBlockCache.java:127) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.cache.lru.SynchronousLoadingBlockCache.resolveDependencies(SynchronousLoadingBlockCache.java:64) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.cache.lru.SynchronousLoadingBlockCache.getBlock(SynchronousLoadingBlockCache.java:109) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:381) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1164) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1256) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:55) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:70) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:85) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.FileOperations$ReaderBuilder.build(FileOperations.java:449) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:309) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     ... 13 more
   Caused by: java.io.FileNotFoundException: File file:/home/christopher/git/apache/accumulo/accumulo/test/target/mini-tests/org.apache.accumulo.test.master.SuspendedTabletsIT_crashAndResumeTserver/accumulo/tables/!0/table_info/F0000038.rf does not exist
     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:668) ~[hadoop-client-api-3.3.0.jar:?]
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:989) ~[hadoop-client-api-3.3.0.jar:?]
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:658) ~[hadoop-client-api-3.3.0.jar:?]
     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:460) ~[hadoop-client-api-3.3.0.jar:?]
     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:155) ~[hadoop-client-api-3.3.0.jar:?]
     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:356) ~[hadoop-client-api-3.3.0.jar:?]
     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:945) ~[hadoop-client-api-3.3.0.jar:?]
     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$CachableBuilder.lambda$fsPath$0(CachableBlockFile.java:92) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:167) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$BCFileLoader.load(CachableBlockFile.java:225) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.cache.lru.SynchronousLoadingBlockCache.getBlock(SynchronousLoadingBlockCache.java:127) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.cache.lru.SynchronousLoadingBlockCache.resolveDependencies(SynchronousLoadingBlockCache.java:64) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.cache.lru.SynchronousLoadingBlockCache.getBlock(SynchronousLoadingBlockCache.java:109) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:381) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1164) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1256) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:55) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:70) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:85) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.core.file.FileOperations$ReaderBuilder.build(FileOperations.java:449) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:309) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
     ... 13 more
   ```
   
   </details>
   
   **Additional context**
   SuspendedTabletsIT sometimes hangs while running and times out. This error 
occurred during a run in which the timeout had been extended. The metadata file 
was deleted at minute 9 of the test, which failed at the 10-minute mark.
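
To make the timing easier to check, here is a minimal sketch that computes the gap between two log lines from their leading timestamps. It assumes every line starts with a timestamp in the `yyyy-MM-dd'T'HH:mm:ss,SSS` form seen in the excerpts above. `LogGap`, `timestampOf`, and `gap` are hypothetical names used only for illustration and are not part of Accumulo. The sample strings in `main` are shortened copies of the logged lines, with `...` standing in for the file paths.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for comparing timestamps in the log excerpts; not Accumulo code.
class LogGap {
    // Timestamp layout assumed from the log lines in this issue.
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss,SSS");

    // Parses the leading timestamp of a log line (the text before the first space).
    // Assumes the line contains at least one space after the timestamp.
    static LocalDateTime timestampOf(String logLine) {
        String line = logLine.trim();
        return LocalDateTime.parse(line.substring(0, line.indexOf(' ')), TS);
    }

    // Time elapsed between the timestamps of two log lines.
    static Duration gap(String earlierLine, String laterLine) {
        return Duration.between(timestampOf(earlierLine), timestampOf(laterLine));
    }

    public static void main(String[] args) {
        // Shortened copies of the GC and tserver lines; "..." replaces the paths.
        String gcLine =
            "2021-01-27T17:59:14,161 [gc.SimpleGarbageCollector] DEBUG: Deleting ...";
        String tserverLine =
            "2021-01-27T18:00:10,691 [tserver.FileManager] ERROR: Failed to open file ...";
        System.out.println(gap(gcLine, tserverLine).toMillis() + " ms");
    }
}
```

For the two lines above, this reports a gap of 56530 ms, so the first failed open came roughly 56.5 seconds after the logged deletion.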



