dtspence commented on PR #3677:
URL: https://github.com/apache/accumulo/pull/3677#issuecomment-1664516280

   @dlmarion We believe we found a code path that produces a test timeout. It appears to be related to a minor compaction (minc) thread exiting because it could not obtain a map file (i.e., the volume chooser saw an invalid context). On that code path, the integration test saw an error within the `MinorCompactor` and logged that it was retrying the minc operation (possibly leaving the minc thread active).
   
   The calling path that causes the thread to end (i.e., the context error in the volume chooser) logged that the tablets could not be unloaded.
   
   With the minc thread exit path, the logs report the following when attempting to shut down:
   ```
   2023-08-03T16:03:32,503 [tserver.UnloadTabletHandler] INFO : Tablet unload for extent 1<< requested.
   2023-08-03T16:03:32,503 [tablet.Tablet] DEBUG: 1<< closeState OPEN tabletMemory.memoryReservedForMinC() true tabletMemory.getMemTable().getNumEntries() 0 updatingFlushID false
   2023-08-03T16:03:32,503 [tablet.Tablet] WARN : Unable to initiate minc for close on 1<<. Tablet might be closed or deleting.
   2023-08-03T16:03:32,503 [tserver.UnloadTabletHandler] ERROR: Failed to close tablet 1<<... Aborting migration
   java.lang.RuntimeException: Unable to initiate minc for close on 1<<. Tablet might be closed or deleting.
           at org.apache.accumulo.tserver.tablet.Tablet.initiateClose(Tablet.java:925) ~[classes/:?]
           at org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:898) ~[classes/:?]
           at org.apache.accumulo.tserver.UnloadTabletHandler.run(UnloadTabletHandler.java:92) [classes/:?]
           at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
           at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
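The DEBUG line shows `tabletMemory.memoryReservedForMinC()` still `true` with 0 memtable entries. One plausible reading is that the minc which failed in the volume chooser never released its memory reservation, so the close-time minc cannot start. The following is a hypothetical, heavily simplified model of that interaction. `MincReservationSketch`, `runMinc`, `chooseVolume`, `initiateClose`, and `tryClose` are invented for illustration and are not Accumulo's `Tablet` API. The model contrasts a minc that keeps the reservation on failure with one that releases it:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the logged failure; NOT Accumulo's Tablet code.
public class MincReservationSketch {

  // Stands in for tabletMemory.memoryReservedForMinC() from the DEBUG line.
  private final AtomicBoolean reservedForMinC = new AtomicBoolean(false);

  // Reserves memory and runs a minc. If the minc throws and releaseOnFailure is
  // false, the "minc thread" ends with the reservation still held.
  void runMinc(boolean releaseOnFailure) {
    if (!reservedForMinC.compareAndSet(false, true)) {
      throw new IllegalStateException("minc already reserved");
    }
    try {
      chooseVolume();
      reservedForMinC.set(false); // normal completion releases the reservation
    } catch (RuntimeException e) {
      if (releaseOnFailure) {
        reservedForMinC.set(false); // release even when the minc fails
      }
      throw e;
    }
  }

  // Models the volume chooser rejecting the table's invalid classloader context.
  private void chooseVolume() {
    throw new IllegalArgumentException("invalid context");
  }

  // Models the close path: the close-time minc cannot reserve memory that is still held.
  void initiateClose() {
    if (!reservedForMinC.compareAndSet(false, true)) {
      throw new RuntimeException(
          "Unable to initiate minc for close. Tablet might be closed or deleting.");
    }
    reservedForMinC.set(false); // the close-time minc would run here
  }

  // Runs a failing flush-triggered minc, then attempts to close (unload) the tablet.
  static String tryClose(boolean releaseOnFailure) {
    MincReservationSketch tablet = new MincReservationSketch();
    try {
      tablet.runMinc(releaseOnFailure);
    } catch (IllegalArgumentException expected) {
      // the minc fails, as in the reproduction steps
    }
    try {
      tablet.initiateClose();
      return "closed";
    } catch (RuntimeException e) {
      return "unload failed";
    }
  }

  public static void main(String[] args) {
    System.out.println("reservation kept:     " + tryClose(false)); // unload failed
    System.out.println("reservation released: " + tryClose(true)); // closed
  }
}
```

In this model, only the variant that releases the reservation on failure lets the later close proceed. Whether the real fix belongs in the minc error handling or in the close path is a separate question.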
   
   To reproduce, the following base configuration is needed:
   
   ```java
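       @Override
       public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration coreSite) {
         // Use a single tablet server so it hosts (and must unload) every tablet
         cfg.setNumTservers(1);
         // Delegate volume selection to a per-scope chooser
         cfg.setProperty("general.volume.chooser", "org.apache.accumulo.core.spi.fs.DelegatingChooser");
         // Use PreferredVolumeChooser for the default scope
         cfg.setProperty("general.custom.volume.chooser.default",
                 "org.apache.accumulo.core.spi.fs.PreferredVolumeChooser");
         // Preferred volume for the default scope; this path is from the author's machine
         // and must point at the mini cluster's own directory when run elsewhere
         cfg.setProperty("general.custom.volume.preferred.default", "file:/home/dtspen2/dev/git/dlmarion/accumulo/test/target/mini-tests/org.apache.accumulo.test.functional.HalfClosedTabletIT_SharedMiniClusterBase/accumulo");
       }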
   ```
   
   Then run a `flush()` operation, which causes the minc thread to end:
   ```java
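         // Point the table at a classloader context that does not exist
         tops.setProperty(tableName, Property.TABLE_CLASSLOADER_CONTEXT.getKey(), "invalid");
   
         // Trigger a minc; the volume chooser sees the invalid context and the minc thread ends
         tops.flush(tableName);
   
         // Give the flush-triggered minc time to fail
         Thread.sleep(500);
   
         // This should fail to split, but not leave the tablets in a state where they can't
         // be unloaded
         assertThrows(AccumuloServerException.class,
             () -> tops.addSplits(tableName, Sets.newTreeSet(List.of(new Text("b")))));
   
         // Restore the table's default classloader context
         tops.removeProperty(tableName, Property.TABLE_CLASSLOADER_CONTEXT.getKey());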
   ```
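The symptom here is a test timeout rather than a clear assertion failure. One option, which is an assumption and not something proposed in this PR, is to bound each step after the failed flush so that a stuck unload surfaces as a named failure. A minimal sketch using only `java.util.concurrent` follows. `BoundedStep` and `runWithTimeout` are hypothetical helpers:

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: run one test step on its own thread and fail fast on a hang.
public class BoundedStep {

  static <T> T runWithTimeout(String name, Duration limit, Callable<T> step) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = executor.submit(step);
      try {
        return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true); // interrupt the stuck step
        throw new AssertionError(name + " did not finish within " + limit, e);
      } catch (ExecutionException e) {
        // Rethrow the step's own failure instead of the wrapper
        Throwable cause = e.getCause();
        if (cause instanceof Exception) {
          throw (Exception) cause;
        }
        throw e;
      }
    } finally {
      executor.shutdownNow(); // never leave the worker thread behind
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runWithTimeout("quick step", Duration.ofSeconds(1), () -> "done"));
    try {
      runWithTimeout("stuck step", Duration.ofMillis(200), () -> {
        Thread.sleep(10_000); // stands in for an unload that never completes
        return "unreachable";
      });
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In a JUnit 5 test, `Assertions.assertTimeoutPreemptively` provides similar behavior without a custom helper.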


-- 
This is an automated message from the Apache Git Service.