dtspence commented on PR #3677:
URL: https://github.com/apache/accumulo/pull/3677#issuecomment-1664516280
@dlmarion We believe we found a code path that produces a test timeout. It
appears to be related to a minc thread exiting because it cannot obtain a
mapfile (i.e. the volume chooser sees an invalid context). The code path in the
integration test hit an error within the `MinorCompactor` and logged that it
was retrying the minc operation (possibly leaving the minc thread active).
The path that causes the thread to end (i.e. a context error in the volume
chooser) logged that the tablets were unable to unload.
The logs report the following (for the minc thread exit path) when attempting
to shut down:
```
2023-08-03T16:03:32,503 [tserver.UnloadTabletHandler] INFO : Tablet unload for extent 1<< requested.
2023-08-03T16:03:32,503 [tablet.Tablet] DEBUG: 1<< closeState OPEN tabletMemory.memoryReservedForMinC() true tabletMemory.getMemTable().getNumEntries() 0 updatingFlushID false
2023-08-03T16:03:32,503 [tablet.Tablet] WARN : Unable to initiate minc for close on 1<<. Tablet might be closed or deleting.
2023-08-03T16:03:32,503 [tserver.UnloadTabletHandler] ERROR: Failed to close tablet 1<<... Aborting migration
java.lang.RuntimeException: Unable to initiate minc for close on 1<<. Tablet might be closed or deleting.
    at org.apache.accumulo.tserver.tablet.Tablet.initiateClose(Tablet.java:925) ~[classes/:?]
    at org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:898) ~[classes/:?]
    at org.apache.accumulo.tserver.UnloadTabletHandler.run(UnloadTabletHandler.java:92) [classes/:?]
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
```
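One possible reading of the log above is that the minc reservation is still
held (`memoryReservedForMinC()` is `true` with 0 entries in the memtable) after
the minc thread has ended, so the close cannot start its own minc. The sketch
below is a toy model of that hypothesis only. It is not Accumulo code:
`ToyTablet`, `minorCompact`, `close`, and the `releaseOnFailure` flag are all
hypothetical names invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model. None of these names are Accumulo classes.
public class MincReservationSketch {

  static class ToyTablet {
    private final AtomicBoolean memoryReservedForMinc = new AtomicBoolean(false);

    // Reserves memory, then tries to create the output file. The flag
    // releaseOnFailure decides whether the reservation is cleared when
    // file creation throws (e.g. a volume chooser rejecting the context).
    void minorCompact(Runnable createFile, boolean releaseOnFailure) {
      if (!memoryReservedForMinc.compareAndSet(false, true)) {
        throw new IllegalStateException("minc already in progress");
      }
      boolean succeeded = false;
      try {
        createFile.run();
        succeeded = true;
      } finally {
        if (succeeded || releaseOnFailure) {
          memoryReservedForMinc.set(false);
        }
      }
    }

    // Close needs to run its own minc, so it fails while memory is reserved.
    void close() {
      if (memoryReservedForMinc.get()) {
        throw new RuntimeException("Unable to initiate minc for close");
      }
    }
  }

  // Runs a failing minc on a separate thread, waits for that thread to end,
  // then reports whether the tablet can still be closed.
  static boolean closeSucceedsAfterFailedMinc(boolean releaseOnFailure) {
    ToyTablet tablet = new ToyTablet();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.execute(() -> {
      try {
        tablet.minorCompact(() -> {
          throw new IllegalStateException("invalid context");
        }, releaseOnFailure);
      } catch (RuntimeException e) {
        // The minc task ends here without retrying.
      }
    });
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    try {
      tablet.close();
      return true;
    } catch (RuntimeException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("close ok without release: " + closeSucceedsAfterFailedMinc(false));
    System.out.println("close ok with release: " + closeSucceedsAfterFailedMinc(true));
  }
}
```

In this model the close only succeeds when the failed minc releases its
reservation on the way out. Whether the real `Tablet` code behaves the same way
would need to be confirmed against the source.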
To replicate, the following is needed as base configuration:
```java
@Override
public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration coreSite) {
  cfg.setNumTservers(1);
  cfg.setProperty("general.volume.chooser",
      "org.apache.accumulo.core.spi.fs.DelegatingChooser");
  cfg.setProperty("general.custom.volume.chooser.default",
      "org.apache.accumulo.core.spi.fs.PreferredVolumeChooser");
  cfg.setProperty("general.custom.volume.preferred.default",
      "file:/home/dtspen2/dev/git/dlmarion/accumulo/test/target/mini-tests/org.apache.accumulo.test.functional.HalfClosedTabletIT_SharedMiniClusterBase/accumulo");
}
```
Then run a `flush()` operation, which causes the minc thread to end:
```java
tops.setProperty(tableName, Property.TABLE_CLASSLOADER_CONTEXT.getKey(), "invalid");
tops.flush(tableName);
Thread.sleep(500);
// This should fail to split, but not leave the tablets in a state where they can't
// be unloaded
assertThrows(AccumuloServerException.class,
    () -> tops.addSplits(tableName, Sets.newTreeSet(List.of(new Text("b")))));
tops.removeProperty(tableName, Property.TABLE_CLASSLOADER_CONTEXT.getKey());
```