milleruntime opened a new issue #1919:
URL: https://github.com/apache/accumulo/issues/1919


   While running RW MultiTable jobs for 2.1.0-SNAPSHOT on Uno with 2 Tservers, 
I saw a few user-initiated compactions run after a table was already being 
deleted, then throw an error while trying to back out of the FATE compaction. 
Here is a trace of the relevant log activity in the Manager:
   <pre>
   2021-02-09T15:17:34,268 [tables.TableManager] DEBUG: Transitioning state for table 6w from ONLINE to DELETING
   2021-02-09T15:17:34,518 [delete.CleanUp] DEBUG: Still waiting for table to be deleted: 6w locationState: 6w;1<@(null,ip-10-113-12-25:10000[10001119e1e0006],ip-10-113-12-25:10000[10001119e1e0006])
   2021-02-09T15:18:38,439 [accumulo.audit] INFO : operation: permitted; user: root; client: 127.0.0.1:33232; action: compactTable; targetTable: 6w; targetNamespace: +default;
   2021-02-09T15:18:40,295 [zookeeper.DistributedReadWriteLock] INFO : Added lock entry 22 userData 6b396a774d5b7118 lockTpye READ
   2021-02-09T15:18:40,311 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   2021-02-09T15:18:40,313 [zookeeper.DistributedReadWriteLock] INFO : Added lock entry 1 userData 6b396a774d5b7118 lockTpye READ
   2021-02-09T15:18:43,910 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   ...
   2021-02-09T15:19:02,911 [delete.CleanUp] DEBUG: Still waiting for table to be deleted: 6w locationState: 6w;2;1@(null,ip-10-113-12-25:9997[10001119e1e0005],ip-10-113-12-25:9997[10001119e1e0005])
   
   2021-02-09T15:19:06,498 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   2021-02-09T15:19:08,056 [delete.CleanUp] DEBUG: Deleted table 6w
   ...
   2021-02-09T15:19:10,094 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   2021-02-09T15:19:10,137 [fate.Fate] INFO : Updated status for Repo with FATE[6b396a774d5b7118] to FAILED_IN_PROGRESS
   2021-02-09T15:19:13,032 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) unlocked for read
   2021-02-09T15:19:13,037 [tableOps.Utils] INFO : table 6w (6b396a774d5b7118) unlocked for read
   2021-02-09T15:19:13,038 [fate.Fate] WARN : Failed to undo Repo, FATE[6b396a774d5b7118]
   org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/4128397f-66ce-45f3-840f-38924fa0abd7/tables/6w/compact-id
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.6.2.jar:3.6.2]
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.6.2.jar:3.6.2]
           at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2358) ~[zookeeper-3.6.2.jar:3.6.2]
           at org.apache.accumulo.fate.zookeeper.ZooReaderWriter.lambda$mutateExisting$6(ZooReaderWriter.java:187) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.zookeeper.ZooReader.retryLoopMutator(ZooReader.java:165) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.zookeeper.ZooReaderWriter.mutateExisting(ZooReaderWriter.java:185) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.compact.CompactRange.removeIterators(CompactRange.java:152) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.compact.CompactRange.undo(CompactRange.java:175) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.compact.CompactRange.undo(CompactRange.java:47) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.TraceRepo.undo(TraceRepo.java:64) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.Fate$TransactionRunner.undo(Fate.java:203) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.Fate$TransactionRunner.processFailed(Fate.java:179) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:72) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
           at java.lang.Thread.run(Thread.java:834) [?:?]
   </pre>
   
   From the logs, it appears the FATE transaction was started after the table 
was already marked for deletion and made it through to the `CompactRange` 
operation. It looks like it was waiting there for the table write lock to be 
released, then failed once the table delete completed: the `undo` step could 
not find the table's `compact-id` node in ZooKeeper (the `NoNodeException` in 
the trace).
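
To make the failure mode concrete, here is a minimal, self-contained sketch of an undo step that treats a missing `compact-id` node as "the table is already gone, nothing to clean up" instead of letting the exception escape. It is only an illustration of one possible mitigation, not Accumulo's code or an agreed fix. Everything in it is hypothetical: `UndoSketch` is an in-memory stand-in for ZooKeeper, its `NoNodeException` merely plays the role of `KeeperException.NoNodeException`, and the node data layout (an id followed by comma-separated entries) is a simplification assumed for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory stand-in for the ZooKeeper nodes involved in the trace. */
final class UndoSketch {

  /** Hypothetical exception standing in for KeeperException.NoNodeException. */
  static final class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  private final Map<String, String> nodes = new ConcurrentHashMap<>();

  void create(String path, String data) {
    nodes.put(path, data);
  }

  void delete(String path) {
    nodes.remove(path);
  }

  String get(String path) {
    return nodes.get(path);
  }

  /** Rewrites an existing node's data; throws if the node no longer exists. */
  void mutateExisting(String path, UnaryOperator<String> mutator) throws NoNodeException {
    String current = nodes.get(path);
    if (current == null) {
      throw new NoNodeException(path);
    }
    nodes.put(path, mutator.apply(current));
  }

  /**
   * Undo step for a compaction. Returns true if the node was cleaned up and
   * false if the node was already gone (for example, the table was deleted
   * while the compaction was waiting).
   */
  boolean undoCompaction(String tableId) {
    String path = "/tables/" + tableId + "/compact-id";
    try {
      // Assumed layout: keep the leading id, drop the per-transaction entries.
      mutateExisting(path, data -> data.split(",")[0]);
      return true;
    } catch (NoNodeException e) {
      // Table deleted concurrently: there is nothing left to undo.
      return false;
    }
  }
}
```

With this shape, a delete that wins the race turns the undo into a no-op rather than a `Failed to undo Repo` warning. Whether swallowing the exception is appropriate, or whether the compaction should instead be rejected up front once the table is in the DELETING state, is a design question the sketch does not settle.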

