keith-turner opened a new issue, #3740:
URL: https://github.com/apache/accumulo/issues/3740

   **Describe the bug**
   When a major compaction finishes, [this function](https://github.com/apache/accumulo/blob/b1b2557f949e9212a1b1ca9b65f2d66c01a69edb/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L451) does the following:
   
    1. Updates the tablet's in-memory data structures
    2. Updates the tablet's files in the metadata table
   
   Once step 1 above is complete, the files in the updated in-memory set (including the newly written file) are available for a subsequent compaction, even if step 2 has not finished yet. That could lead to the following race condition between concurrent compactions on the same tablet.
   
    1. Compaction 1 starts compacting files [F1,F2] into file C3
    2. Compaction 1 removes [F1,F2] and adds C3 to the tablet's in-memory file set
    3. Compaction 2 starts compacting files [F0,C3] into file C4
    4. Compaction 2 removes [F0,C3] and adds C4 to the tablet's in-memory file set
    5. Compaction 2 removes [F0,C3] and adds C4 to the tablet's row in the metadata table
    6. Compaction 1 removes [F1,F2] and adds C3 to the tablet's row in the metadata table
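
   The interleaving above can be replayed deterministically with a small, hypothetical model of one tablet. `CompactionRaceReplay` and its two `TreeSet` fields are illustrative stand-ins, not Accumulo classes; the only assumption carried over from the issue is the commit order (in-memory set first, metadata row second).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model of one tablet (NOT Accumulo's DatafileManager). Each
 * compaction commits by updating the in-memory file set first and the
 * metadata row second, as described in the issue.
 */
class CompactionRaceReplay {
  final Set<String> inMemory = new TreeSet<>();
  final Set<String> metadata = new TreeSet<>();

  CompactionRaceReplay(String... initialFiles) {
    inMemory.addAll(Arrays.asList(initialFiles));
    metadata.addAll(Arrays.asList(initialFiles));
  }

  // Remove a compaction's input files from a file set and add its output file.
  static void replace(Set<String> files, List<String> inputs, String output) {
    files.removeAll(inputs);
    files.add(output);
  }

  static CompactionRaceReplay run() {
    CompactionRaceReplay t = new CompactionRaceReplay("F0", "F1", "F2");
    List<String> c1Inputs = List.of("F1", "F2");
    List<String> c2Inputs = List.of("F0", "C3");

    // Steps 1-2: compaction 1 writes C3 and updates only the in-memory set.
    replace(t.inMemory, c1Inputs, "C3");
    // Step 3: compaction 2 can select C3 because it is already in the in-memory set.
    if (!t.inMemory.containsAll(c2Inputs)) {
      throw new IllegalStateException("C3 should be selectable here");
    }
    // Step 4: compaction 2 updates the in-memory set.
    replace(t.inMemory, c2Inputs, "C4");
    // Step 5: compaction 2 updates metadata; C3 is not listed there yet, so removing it is a no-op.
    replace(t.metadata, c2Inputs, "C4");
    // Step 6: compaction 1 finally updates metadata and re-adds C3, which was already compacted away.
    replace(t.metadata, c1Inputs, "C3");
    return t;
  }

  public static void main(String[] args) {
    CompactionRaceReplay t = run();
    System.out.println("in-memory: " + t.inMemory); // in-memory: [C4]
    System.out.println("metadata:  " + t.metadata); // metadata:  [C3, C4]
  }
}
```

   Step 5 is the crux: because C3 was never written to the metadata row, compaction 2 has nothing to remove, and compaction 1's late write in step 6 leaves C3 behind.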
   
   The above race condition adds the file C3 back to the metadata table when it should not. This could bring back deleted data and would cause the tablet to report its in-memory and metadata file sets as inconsistent.
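
   The inconsistency can be expressed as a simple set difference between the two views of the tablet's files. The `FileSetConsistency.diff` helper below is hypothetical and only sketches what such a check compares; it is not Accumulo's actual consistency check. Applied to the end state of the race, it flags C3 as present only in the metadata table.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical consistency check (not Accumulo's implementation): compares a
 * tablet's in-memory file set with the files listed in its metadata row.
 */
final class FileSetConsistency {
  private FileSetConsistency() {}

  /** Returns an empty string when both sets match, otherwise a description of the mismatch. */
  static String diff(Set<String> inMemory, Set<String> metadata) {
    Set<String> onlyInMetadata = new TreeSet<>(metadata);
    onlyInMetadata.removeAll(inMemory);
    Set<String> onlyInMemory = new TreeSet<>(inMemory);
    onlyInMemory.removeAll(metadata);
    if (onlyInMetadata.isEmpty() && onlyInMemory.isEmpty()) {
      return "";
    }
    return "only in metadata=" + onlyInMetadata + " only in memory=" + onlyInMemory;
  }

  public static void main(String[] args) {
    // End state of the race: C3 was re-added to the metadata row after being compacted into C4.
    System.out.println(diff(Set.of("C4"), Set.of("C3", "C4")));
    // prints: only in metadata=[C3] only in memory=[]
  }
}
```

   A stale entry like C3 is what could resurface deleted data if the metadata row is later treated as the source of truth, for example when the tablet is next loaded.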
   
   This race condition probably only exists between concurrent major compactions. For [bulk import](https://github.com/apache/accumulo/blob/b1b2557f949e9212a1b1ca9b65f2d66c01a69edb/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L214) and [minor compaction](https://github.com/apache/accumulo/blob/b1b2557f949e9212a1b1ca9b65f2d66c01a69edb/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L304), this race condition is probably not possible because those functions update the metadata table first and then the in-memory set. It's updating the in-memory set first that makes the new file available for compaction and leads to the possible race condition.
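
   The reason metadata-first ordering avoids this can be sketched with another hypothetical model (`MetadataFirstCommit` is illustrative, not Accumulo code). The invariant it shows is that a new file only becomes selectable after its metadata entry exists, so a later compaction's metadata update removes that entry instead of racing with its creation. This is a sequential sketch of the invariant, not a concurrency proof, and it does not model what scans or a crash observe between the two steps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model (NOT Accumulo code) of a commit that writes the metadata
 * row first and only then publishes the new file to the in-memory set.
 */
class MetadataFirstCommit {
  final Set<String> inMemory = new TreeSet<>();
  final Set<String> metadata = new TreeSet<>();

  MetadataFirstCommit(String... initialFiles) {
    inMemory.addAll(Arrays.asList(initialFiles));
    metadata.addAll(Arrays.asList(initialFiles));
  }

  private static void replace(Set<String> files, List<String> inputs, String output) {
    files.removeAll(inputs);
    files.add(output);
  }

  /** Persist the change first, then make the output visible to compaction selection. */
  void commit(List<String> inputs, String output) {
    replace(metadata, inputs, output);
    replace(inMemory, inputs, output);
  }

  /** A compaction may only select files that are in the in-memory set. */
  boolean canSelect(List<String> inputs) {
    return inMemory.containsAll(inputs);
  }

  public static void main(String[] args) {
    MetadataFirstCommit t = new MetadataFirstCommit("F0", "F1", "F2");
    List<String> c1Inputs = List.of("F1", "F2");
    List<String> c2Inputs = List.of("F0", "C3");

    System.out.println(t.canSelect(c2Inputs)); // false: C3 is not published yet
    t.commit(c1Inputs, "C3");
    // When C3 becomes selectable, its metadata entry already exists.
    System.out.println(t.canSelect(c2Inputs) + " " + t.metadata.contains("C3")); // true true
    t.commit(c2Inputs, "C4");
    System.out.println(t.inMemory + " " + t.metadata); // [C4] [C4]
  }
}
```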
   
   This was likely not an issue in Accumulo 1.x because a tablet could only have a single compaction running at a time. With the introduction of concurrent tablet compactions in 2.x, it seems like this race condition could happen, but I am not sure whether something in the code outside of DatafileManager accidentally prevents it from happening.
   
   **Expected behavior**
   The race condition should not be possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.