Hello everyone,

We have been making progress on the alternative way of tracking store files originally proposed by Duo in HBASE-26067.
To briefly summarize for those not following it: this feature introduces an abstraction layer that tracks the store files still used/needed by store engines, which allows plugging in different approaches for identifying the store files required by a given store. The design doc describing it in more detail is available here <https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.calrs3kn4d8s> .

Our main goal with this feature is to avoid the need for temp files and renames when creating new hfiles (whenever flushing, compacting, splitting/merging or snapshotting). This is made possible by the pluggable tracker implementation labeled "FILE". The current behavior, which uses temp dirs and renames, would remain the default approach (labeled "DEFAULT").

This "renameless" approach is appealing for deployments that use the Amazon S3 object store as the file system. There, the lack of an atomic rename operation makes an additional layer of locking (HBOSS) necessary, and that layer combined with the s3a rename operation can add performance overhead.

Some test runs on my employer's infrastructure have shown promising results. A pure insertion ycsb run showed a ~6% performance gain on client writes. Cloning a snapshot of a table with hundreds of regions completes in half the time. There are also improvements in compaction, split and merge times.

After talking with Duo Zhang and Josh Elser in the HBASE-26067 jira, we feel optimistic that the current implementation is in a good state to be merged into the master branch, but it would be nice to hear other opinions before we actually commit it.

Looking forward to hearing any thoughts/concerns you might have.

Kind regards,
Wellington.