ddanielr opened a new issue, #3802: URL: https://github.com/apache/accumulo/issues/3802
**Describe the bug**

When bringing a major compaction online, `replaceDatafiles` creates `GcCandidates` first, and then performs a second mutation to delete the old file entries and add scan references:

https://github.com/apache/accumulo/blob/51124ad34872c2b659b69251fcd1160e20bbdd44/server/base/src/main/java/org/apache/accumulo/server/util/ManagerMetadataUtil.java#L183-L206

This ordering ensures that the `GcCandidates` exist before an unchecked exception or process death can interrupt the tablet mutation.

When the `gc.remove.in.use.candidates` property is enabled, the GC can start a run between lines 183 and 206, while a major compaction is running and has not yet completed the second mutation that updates the tablet file and scan references. In that window, the new `GcCandidates` may be classified as `INUSE` and deleted before the tablet mutation completes. This can result in a major compaction "leaking" files in HDFS that no longer have file or scan references associated with them, and no remaining GC candidates to ever delete them.

**Versions (OS, Maven, Java, and others, as appropriate):**

- Affected version(s) of this project: branch 2.1 with the `gc.remove.in.use.candidates` property enabled.

**Current Mitigation**

Only run the GC with the `gc.remove.in.use.candidates` property enabled as a cleanup operation when no major compactions are running. This ensures that the `INUSE` candidates are not in a transient state.

**Additional context**

This is only an issue if the experimental `gc.remove.in.use.candidates` property is enabled and major compactions are running at the same time as a garbage collection cycle. Even then, the system would most likely have to be suffering metadata write performance issues and/or tserver process death.
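To make the bad interleaving concrete, here is a toy sequential model of the race, not Accumulo code: all class, method, and file names (`GcRaceSketch`, `simulateRace`, `F1`, `F2`, `Fnew`) are hypothetical, and the metadata tables are modeled as plain in-memory sets.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the described race. All names are hypothetical
// illustrations, not Accumulo APIs.
public class GcRaceSketch {

    /** Runs the bad interleaving and returns the files leaked "in HDFS". */
    public static Set<String> simulateRace() {
        Set<String> tabletFiles = new HashSet<>(Set.of("F1", "F2")); // referenced by the tablet
        Set<String> hdfs = new HashSet<>(Set.of("F1", "F2"));        // files on disk
        Set<String> gcCandidates = new HashSet<>();                  // deletion candidates

        // Mutation 1 (replaceDatafiles): record the files being compacted
        // away as GC candidates *before* touching the tablet entry.
        gcCandidates.addAll(Set.of("F1", "F2"));

        // A GC run starts in between, with gc.remove.in.use.candidates
        // enabled: F1 and F2 are still referenced by the tablet, so the
        // candidates are classified INUSE and removed from the candidate list.
        gcCandidates.removeIf(tabletFiles::contains);

        // Mutation 2 completes: swap the old file references for the new file.
        tabletFiles.clear();
        tabletFiles.add("Fnew");
        hdfs.add("Fnew");

        // Leaked: still on disk, but no tablet reference and no GC candidate
        // left to ever delete it.
        Set<String> leaked = new HashSet<>(hdfs);
        leaked.removeAll(tabletFiles);
        leaked.removeAll(gcCandidates);
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println("Leaked files: " + simulateRace());
    }
}
```

Running the sketch reports `F1` and `F2` as leaked: they survive on disk with no references and no candidates, which is exactly the state the mitigation above avoids by only running the in-use cleanup while no major compactions are active.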
