keith-turner opened a new issue, #3740: URL: https://github.com/apache/accumulo/issues/3740
**Describe the bug**

When a major compaction finishes, [this function](https://github.com/apache/accumulo/blob/b1b2557f949e9212a1b1ca9b65f2d66c01a69edb/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L451) does the following:

1. Updates the tablet's in-memory data structures
2. Updates the tablet's files in the metadata table

Once step 1 is complete, the newly added files are available to a subsequent compaction. That could lead to the following race condition between concurrent compactions on the same tablet:

1. Compaction 1 starts compacting files [F1,F2] into file C3
2. Compaction 1 removes [F1,F2] and adds C3 to the tablet's in-memory file set
3. Compaction 2 starts compacting files [F0,C3] into file C4
4. Compaction 2 removes [F0,C3] and adds C4 to the tablet's in-memory file set
5. Compaction 2 removes [F0,C3] and adds C4 to the tablet's row in the metadata table
6. Compaction 1 removes [F1,F2] and adds C3 to the tablet's row in the metadata table

The above race condition adds file C3 back to the metadata table when it should not. This could bring back deleted data and would cause the tablet to report its in-memory and metadata file sets as inconsistent.

This race condition probably only exists between concurrent major compactions. For [bulk import](https://github.com/apache/accumulo/blob/b1b2557f949e9212a1b1ca9b65f2d66c01a69edb/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L214) and [minor compaction](https://github.com/apache/accumulo/blob/b1b2557f949e9212a1b1ca9b65f2d66c01a69edb/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L304) this race condition is probably not possible because those functions update the metadata table first and then the in-memory set. It is updating the in-memory set first that makes a file available for compaction and leads to the possible race condition.
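The six-step interleaving above can be replayed deterministically with a small sketch. This is not Accumulo code: the class `CompactionRaceSketch`, the helper `apply`, and the two plain sets standing in for the tablet's in-memory file set and its metadata row are all hypothetical, and the sketch assumes the tablet starts with files F0, F1, and F2. It only illustrates how the order of updates could leave C3 in the metadata.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CompactionRaceSketch {

  // Hypothetical stand-ins for the tablet's in-memory file set and its
  // metadata table row; the real structures are more involved.
  static final Set<String> inMemory = new TreeSet<>();
  static final Set<String> metadata = new TreeSet<>();

  // Models one "remove the inputs, add the output" update on a file set.
  static void apply(Set<String> files, List<String> inputs, String output) {
    files.removeAll(inputs);
    files.add(output);
  }

  // Replays the interleaving from the issue on a single thread.
  static void replay() {
    inMemory.clear();
    metadata.clear();
    // Assumed starting state: both views agree on F0, F1, F2.
    inMemory.addAll(List.of("F0", "F1", "F2"));
    metadata.addAll(List.of("F0", "F1", "F2"));

    // Steps 1-2: compaction 1 finishes and updates memory only.
    apply(inMemory, List.of("F1", "F2"), "C3");
    // Steps 3-4: compaction 2 picks up C3 from memory, finishes, updates memory.
    apply(inMemory, List.of("F0", "C3"), "C4");
    // Step 5: compaction 2 updates the metadata (C3 is not there yet).
    apply(metadata, List.of("F0", "C3"), "C4");
    // Step 6: compaction 1's delayed metadata update adds C3 back.
    apply(metadata, List.of("F1", "F2"), "C3");
  }

  public static void main(String[] args) {
    replay();
    System.out.println("in memory: " + inMemory); // [C4]
    System.out.println("metadata:  " + metadata); // [C3, C4]
  }
}
```

In this model the in-memory set ends as [C4] while the metadata ends as [C3, C4], which is the inconsistency described above.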

This was likely not an issue in Accumulo 1.x because a tablet could only have a single compaction running at a time. With the introduction of concurrent tablet compactions in 2.x it seems like this race condition could happen, but I am not sure whether something in the code outside of DatafileManager accidentally prevents it.

**Expected behavior**

The race condition is not possible.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
