milleruntime opened a new issue #1919:
URL: https://github.com/apache/accumulo/issues/1919
While running RW MultiTable jobs for 2.1.0-SNAPSHOT on Uno with 2 Tservers,
I saw a few user-initiated compactions run after a table was already being
deleted. They threw an error while trying to back out of the FATE compaction.
Here is a trace of the relevant log activity in the Manager:
<pre>
2021-02-09T15:17:34,268 [tables.TableManager] DEBUG: Transitioning state for
table 6w from ONLINE to DELETING
2021-02-09T15:17:34,518 [delete.CleanUp] DEBUG: Still waiting for table to
be deleted: 6w locationState:
6w;1<@(null,ip-10-113-12-25:10000[10001119e1e0006],ip-10-113-12-25:10000[10001119e1e0006])
2021-02-09T15:18:38,439 [accumulo.audit] INFO : operation: permitted; user:
root; client: 127.0.0.1:33232; action: compactTable; targetTable: 6w;
targetNamespace: +default;
2021-02-09T15:18:40,295 [zookeeper.DistributedReadWriteLock] INFO : Added
lock entry 22 userData 6b396a774d5b7118 lockTpye READ
2021-02-09T15:18:40,311 [tableOps.Utils] INFO : namespace +default
(6b396a774d5b7118) locked for read operation: COMPACT
2021-02-09T15:18:40,313 [zookeeper.DistributedReadWriteLock] INFO : Added
lock entry 1 userData 6b396a774d5b7118 lockTpye READ
2021-02-09T15:18:43,910 [tableOps.Utils] INFO : namespace +default
(6b396a774d5b7118) locked for read operation: COMPACT
...
2021-02-09T15:19:02,911 [delete.CleanUp] DEBUG: Still waiting for table to
be deleted: 6w locationState:
6w;2;1@(null,ip-10-113-12-25:9997[10001119e1e0005],ip-10-113-12-25:9997[10001119e1e0005])
2021-02-09T15:19:06,498 [tableOps.Utils] INFO : namespace +default
(6b396a774d5b7118) locked for read operation: COMPACT
2021-02-09T15:19:08,056 [delete.CleanUp] DEBUG: Deleted table 6w
...
2021-02-09T15:19:10,094 [tableOps.Utils] INFO : namespace +default
(6b396a774d5b7118) locked for read operation: COMPACT
2021-02-09T15:19:10,137 [fate.Fate] INFO : Updated status for Repo with
FATE[6b396a774d5b7118] to FAILED_IN_PROGRESS
2021-02-09T15:19:13,032 [tableOps.Utils] INFO : namespace +default
(6b396a774d5b7118) unlocked for read
2021-02-09T15:19:13,037 [tableOps.Utils] INFO : table 6w (6b396a774d5b7118)
unlocked for read
2021-02-09T15:19:13,038 [fate.Fate] WARN : Failed to undo Repo,
FATE[6b396a774d5b7118]
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
NoNode for /accumulo/4128397f-66ce-45f3-840f-38924fa0abd7/tables/6w/compact-id
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
~[zookeeper-3.6.2.jar:3.6.2]
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
~[zookeeper-3.6.2.jar:3.6.2]
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2358)
~[zookeeper-3.6.2.jar:3.6.2]
at
org.apache.accumulo.fate.zookeeper.ZooReaderWriter.lambda$mutateExisting$6(ZooReaderWriter.java:187)
~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.fate.zookeeper.ZooReader.retryLoopMutator(ZooReader.java:165)
~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.fate.zookeeper.ZooReaderWriter.mutateExisting(ZooReaderWriter.java:185)
~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.manager.tableOps.compact.CompactRange.removeIterators(CompactRange.java:152)
~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.manager.tableOps.compact.CompactRange.undo(CompactRange.java:175)
~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.manager.tableOps.compact.CompactRange.undo(CompactRange.java:47)
~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.manager.tableOps.TraceRepo.undo(TraceRepo.java:64)
~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.fate.Fate$TransactionRunner.undo(Fate.java:203)
~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
org.apache.accumulo.fate.Fate$TransactionRunner.processFailed(Fate.java:179)
~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:72)
~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
~[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
</pre>
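From the logs, it appears the compaction's FATE transaction was started after
the table was already marked for deletion, yet it made it through to the
`CompactRange` operation. It looks like it waited there for the table write
lock held by the delete to be released, then failed once the table delete
completed. The undo step then threw `NoNodeException` because the table's
`compact-id` node in ZooKeeper no longer existed.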
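
To illustrate the failure mode in the stack trace, here is a minimal, self-contained sketch of an undo step that treats a missing node as "nothing left to clean up" instead of failing. It is not Accumulo's code and not a proposed patch: the in-memory map stands in for ZooKeeper, and `UndoSketch`, its local `NoNodeException`, `mutateExisting`, `undoTolerant`, the `/tables/<id>/compact-id` path, and the `id,config` value format are all hypothetical names and shapes chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

public class UndoSketch {
  /** Hypothetical stand-in for ZooKeeper's NoNodeException. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  /** In-memory stand-in for the ZooKeeper node tree (path -> data). */
  static final Map<String, String> nodes = new ConcurrentHashMap<>();

  /** Applies a mutation to an existing node and fails if the node is gone. */
  static void mutateExisting(String path, UnaryOperator<String> mutator)
      throws NoNodeException {
    // computeIfPresent returns null when the key is absent (the mutator never returns null here).
    if (nodes.computeIfPresent(path, (k, v) -> mutator.apply(v)) == null) {
      throw new NoNodeException(path);
    }
  }

  /**
   * Undo that tolerates a concurrently deleted table.
   * Returns true if cleanup was applied, false if the node was already gone.
   */
  static boolean undoTolerant(String tableId) {
    String path = "/tables/" + tableId + "/compact-id"; // assumed path layout
    try {
      // Assumed value format "id,config": keep the id, drop the per-compaction config.
      mutateExisting(path, data -> data.split(",", 2)[0]);
      return true;
    } catch (NoNodeException e) {
      // The table was deleted while the compaction was pending; nothing to undo.
      return false;
    }
  }

  public static void main(String[] args) {
    nodes.put("/tables/6w/compact-id", "7,txid=cfg");
    System.out.println("table present, cleaned up: " + undoTolerant("6w"));

    nodes.remove("/tables/6w/compact-id"); // simulate the table delete finishing
    System.out.println("table deleted, cleaned up: " + undoTolerant("6w"));
  }
}
```

Whether swallowing the exception is appropriate in the real `CompactRange.undo` depends on what else the undo is expected to guarantee. An alternative would be to reject the compaction earlier, when the table is already in the DELETING state.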