Hello everyone,

I am a developer from the IoTDB and Ratis communities, and I am familiar with distributed systems and storage engines. Recently, I have been studying the MOBV2 feature in HBase.

I found that when hbase.mob.compaction.type is set to optimized, a single compaction can generate multiple files, each no larger than a configured threshold. However, I also noticed that every memstore flush can generate a new MOB HFile, and since the default flush threshold for each memstore is 128MB, many small MOB files are created. Given that the default compaction period for MOB files is one week, does this mean that these newly generated small MOB files have to wait a week before they can be merged into a larger file? I am not sure whether I have read the code correctly, so is this reasoning accurate?

If it is, I am curious why a large file in the MOB region isn't reused across flushes, switching to a new file only after it reaches a certain size. This approach doesn't seem to have any downsides, and it could reduce write amplification. Single-node storage engines like Badger/Titan operate this way; otherwise, merging these small MOB HFiles still causes write amplification. Was there a specific consideration during the design that led to the current approach?

Additionally, I would like to understand the current state of the MOB feature and whether it has reached a production-ready level.

Thank you!

Best
------------------
Xinyu Tan
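
P.S. To make the concern about small files concrete, here is a rough back-of-the-envelope sketch of how many MOB files could accumulate between two weekly compactions. It assumes that every flush writes one MOB file of about the flush threshold in size. The ingest rate is a hypothetical number chosen only for illustration, and the function name is mine, not anything from the HBase code base.

```python
# Rough estimate only; the ingest rate below is a hypothetical value.
MIB = 1024 * 1024


def mob_files_per_period(ingest_bytes_per_sec, flush_size_bytes, period_sec):
    """Estimate how many MOB files flushes produce in one compaction period,
    assuming each flush writes one MOB file of about flush_size_bytes."""
    total_bytes = ingest_bytes_per_sec * period_sec
    return total_bytes // flush_size_bytes


flush_size = 128 * MIB      # default memstore flush threshold (128MB)
period = 7 * 24 * 3600      # default MOB compaction period: one week, in seconds
ingest = 1 * MIB            # hypothetical: 1 MiB/s of MOB data into one region

print(mob_files_per_period(ingest, flush_size, period))  # 4725
```

If flushes are triggered earlier than the threshold, the files would be smaller and the count higher, so this is closer to a lower bound on the number of files under these assumptions.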