keith-turner commented on PR #5570:
URL: https://github.com/apache/accumulo/pull/5570#issuecomment-2901573601

   Adding a more concrete description of this bug and the fix.
   
   Before this fix the following could happen in a tablet server process.
   
    1. THREAD_1 is loading a tablet that has an existing external compaction recorded in the metadata table.
    2. THREAD_1 adds the external compaction id to CompactionManager.runningExternalCompactions.
    3. THREAD_2 is running CompactionManager.mainLoop, CompactionManager.commitExternalCompaction, or CompactionManager.externalCompactionFailed.
    4. THREAD_2 sees an external compaction id in CompactionManager.runningExternalCompactions that no online tablet in the tserver knows about.
    5. THREAD_2 removes the external compaction id from CompactionManager.runningExternalCompactions.
    6. THREAD_1 adds the tablet it is working on to the set of online tablets. This is the set in which THREAD_2 failed to find the tablet in step 4.
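
   As a rough illustration, the six steps can be replayed deterministically in a hypothetical, heavily simplified Java model. RaceReplay, cleanupBeforeFix, and the map of online tablets are made-up names, not Accumulo's CompactionManager code. The steps run in order on a single thread to force the bad interleaving, and the model assumes that a commit RPC for an id missing from runningExternalCompactions is ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the race; NOT Accumulo's CompactionManager.
class RaceReplay {

  // Ids of external compactions the tserver believes are running.
  final Set<String> runningExternalCompactions = new HashSet<>();
  // Online tablets, each mapped to the external compaction ids it knows about.
  final Map<String, Set<String>> onlineTablets = new HashMap<>();

  // Models the pre-fix cleanup: drop any id that no ONLINE tablet knows about.
  void cleanupBeforeFix() {
    Set<String> known = new HashSet<>();
    for (Set<String> ecids : onlineTablets.values()) {
      known.addAll(ecids);
    }
    runningExternalCompactions.retainAll(known);
  }

  // Models the coordinator's commit RPC: ignored unless the id is still tracked.
  boolean commitExternalCompaction(String ecid) {
    return runningExternalCompactions.remove(ecid);
  }

  // Replays steps 1-6 in order; returns whether the commit RPC was accepted.
  static boolean replay() {
    RaceReplay tserver = new RaceReplay();
    String tablet = "tablet-A";
    String ecid = "ECID-1";
    // Steps 1-2: THREAD_1 loads the tablet and tracks its existing external compaction.
    tserver.runningExternalCompactions.add(ecid);
    // Steps 3-5: THREAD_2 cleans up; the tablet is not online yet, so the id is dropped.
    tserver.cleanupBeforeFix();
    // Step 6: THREAD_1 puts the tablet online, too late.
    tserver.onlineTablets.put(tablet, Set.of(ecid));
    // The coordinator's commit RPC is now ignored.
    return tserver.commitExternalCompaction(ecid);
  }

  public static void main(String[] args) {
    System.out.println("commit accepted: " + replay());
  }
}
```

   Running main prints "commit accepted: false": the tablet is online and still references ECID-1, but the tserver no longer tracks it.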
   
   When the above sequence of events happens, the tablet server will ignore every RPC from the coordinator to commit or fail the compaction. Until the tablet server is restarted, and the race condition does not recur on the new tserver where the tablet lands, the external compaction can never commit and its files stay reserved.
   
   This fix does two things to avoid the race condition. First, CompactionManager.mainLoop was modified to consider tablets that are opening as well as tablets that are online; tablets in the opening state add their existing external compactions to CompactionManager.runningExternalCompactions. Second, the two RPC handling methods in CompactionManager that were removing entries from CompactionManager.runningExternalCompactions (commitExternalCompaction and externalCompactionFailed) were modified to do so only when the compaction id is in both sets.
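
   A similarly hypothetical sketch of the first change: the cleanup treats ids known to opening tablets as live, so the same interleaving no longer drops the id. FixedReplay, openingTablets, and cleanupAfterFix are illustrative names, not Accumulo code. The sketch assumes the tablet is registered as opening before its id is tracked, and it models only the mainLoop change, not the change to the RPC handlers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the first part of the fix; NOT Accumulo's code.
class FixedReplay {

  final Set<String> runningExternalCompactions = new HashSet<>();
  // Tablets still being loaded, mapped to their known external compaction ids.
  final Map<String, Set<String>> openingTablets = new HashMap<>();
  // Online tablets, mapped to their known external compaction ids.
  final Map<String, Set<String>> onlineTablets = new HashMap<>();

  // Cleanup now treats ids known to OPENING tablets as live, not just ONLINE ones.
  void cleanupAfterFix() {
    Set<String> known = new HashSet<>();
    openingTablets.values().forEach(known::addAll);
    onlineTablets.values().forEach(known::addAll);
    runningExternalCompactions.retainAll(known);
  }

  // Models the coordinator's commit RPC: ignored unless the id is still tracked.
  boolean commitExternalCompaction(String ecid) {
    return runningExternalCompactions.remove(ecid);
  }

  // Replays the same interleaving; returns whether the commit RPC was accepted.
  static boolean replay() {
    FixedReplay tserver = new FixedReplay();
    String tablet = "tablet-A";
    String ecid = "ECID-1";
    // Steps 1-2: the tablet is registered as opening and its id is tracked.
    tserver.openingTablets.put(tablet, Set.of(ecid));
    tserver.runningExternalCompactions.add(ecid);
    // Steps 3-5: cleanup runs mid-load but sees the opening tablet, so nothing is removed.
    tserver.cleanupAfterFix();
    // Step 6: the tablet moves from opening to online.
    tserver.onlineTablets.put(tablet, tserver.openingTablets.remove(tablet));
    // The coordinator's commit RPC is accepted.
    return tserver.commitExternalCompaction(ecid);
  }

  public static void main(String[] args) {
    System.out.println("commit accepted: " + replay());
  }
}
```

   Here main prints "commit accepted: true". Because the opening set is consulted alongside the online set, there is no window between steps 2 and 6 in which the tablet is invisible to the cleanup.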


-- 
This is an automated message from the Apache Git Service.