keith-turner commented on PR #5570: URL: https://github.com/apache/accumulo/pull/5570#issuecomment-2901573601
Adding a more concrete description of this bug and the fix. Before this fix the following could happen in a tablet server process.

 1. THREAD_1 is working on loading a tablet that has an existing external compaction in the metadata table
 2. THREAD_1 adds the external compaction id to CompactionManager.runningExternalCompactions
 3. THREAD_2 is running CompactionManager.mainLoop or CompactionManager.commitExternalCompaction or CompactionManager.externalCompactionFailed
 4. THREAD_2 sees an external compaction id in CompactionManager.runningExternalCompactions that no online tablet in the tserver knows about
 5. THREAD_2 removes the external compaction id from CompactionManager.runningExternalCompactions
 6. THREAD_1 adds the tablet it is working on to the set of online tablets. This is the set in which THREAD_2 did not see the tablet.

When the above sequence of events happens, the tablet server will always ignore RPCs from the coordinator to commit or fail the compaction. The external compaction can never commit, and its files stay reserved, until the tablet server is restarted and the race condition does not recur on the new tserver where the tablet lands.

This fix does two things to avoid the race condition. First, CompactionManager.mainLoop was modified to consider tablets that are opening as well as tablets that are online. Tablets in the opening state will add existing external compactions to CompactionManager.runningExternalCompactions. Second, the two RPC handling methods in CompactionManager that were removing entries from CompactionManager.runningExternalCompactions were modified to do so only when the compaction id is in both sets.
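The interleaving above and the first part of the fix can be replayed in a small, single-threaded model. This is only a sketch, not the code in this PR: the class `CompactionTrackerSketch`, its fields, its methods, and the `considerOpening` flag are all hypothetical names, tablets and compaction ids are plain strings, and only the mainLoop-style cleanup is modeled (the RPC handler change is not).

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the bookkeeping described above.
// It is NOT Accumulo's CompactionManager; every name here is an assumption.
class CompactionTrackerSketch {
  // Stand-in for CompactionManager.runningExternalCompactions:
  // external compaction id -> extent of the tablet that owns it.
  final Map<String,String> runningExternalCompactions = new ConcurrentHashMap<>();
  // Extents of tablets that are still loading, and of tablets that are online.
  final Set<String> openingTablets = ConcurrentHashMap.newKeySet();
  final Set<String> onlineTablets = ConcurrentHashMap.newKeySet();

  // Steps 1-2: a loading tablet is marked as opening, then registers the
  // external compaction id it found in its metadata.
  void beginTabletLoad(String extent, String ecid) {
    openingTablets.add(extent);
    runningExternalCompactions.put(ecid, extent);
  }

  // Step 6: the tablet becomes online. It is added to the online set before
  // it leaves the opening set, so it is never absent from both.
  void finishTabletLoad(String extent) {
    onlineTablets.add(extent);
    openingTablets.remove(extent);
  }

  // Steps 3-5: drop compaction ids whose tablet is not known.
  // considerOpening=false models the old check (online tablets only);
  // considerOpening=true models the fixed check (opening and online tablets).
  void removeUnknownCompactions(boolean considerOpening) {
    runningExternalCompactions.entrySet().removeIf(entry -> {
      String extent = entry.getValue();
      // Check the opening set first: a tablet moves opening -> online, so
      // reading in that order cannot miss a tablet that is mid-transition.
      boolean known = (considerOpening && openingTablets.contains(extent))
          || onlineTablets.contains(extent);
      return !known;
    });
  }
}
```

Calling `beginTabletLoad`, then `removeUnknownCompactions(false)`, then `finishTabletLoad` reproduces the bug in this model: the tablet ends up online but its compaction id is gone from the map. With `removeUnknownCompactions(true)` in the same position, the id is retained because the tablet is already visible in the opening set.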