Hello everyone

I am a developer from the IoTDB and Ratis communities, and I am familiar with 
distributed systems and storage engines. Recently, I have been studying the 
MOBV2 feature in HBase.

I found that when hbase.mob.compaction.type is set to optimized, a single 
compaction can produce multiple MOB files, each no larger than a configured size 
threshold. However, I also noticed that every memstore flush can generate a new 
MOB HFile. Since the default memstore flush threshold is 128MB, this creates many 
small MOB files. Given that the default MOB compaction period is one week, does 
this mean that these newly generated small MOB files have to wait up to a week 
before they are merged into larger files? I am not sure whether I have read the 
code correctly, so is this reasoning accurate?
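To make the concern concrete, here is a rough back-of-the-envelope estimate. It is a sketch under hypothetical assumptions, not a description of HBase internals: every flush writes exactly one MOB file of about 128MB, the MOB compaction runs once a week, and one region receives a sustained 1 MB/s of MOB data. The function name is made up for illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # one-week MOB compaction period


def mob_files_per_period(write_rate_mb_per_s: float,
                         flush_size_mb: float = 128.0,
                         period_s: int = SECONDS_PER_WEEK) -> int:
    """Hypothetical helper: rough count of MOB files a single region creates
    between two MOB compaction runs, assuming each flush writes exactly one
    MOB file of about flush_size_mb (ignores non-MOB cells, compression,
    and flushes triggered before the threshold is reached)."""
    total_mb = write_rate_mb_per_s * period_s
    return int(total_mb // flush_size_mb)


if __name__ == "__main__":
    # Hypothetical sustained MOB ingest of 1 MB/s into one region.
    print(mob_files_per_period(1.0))  # 4725
```

Under those assumptions, a region would accumulate several thousand small MOB files before the next weekly compaction.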

If this is the case, I am curious why a MOB file in the MOB region is not reused 
across different flushes, that is, why flushes do not keep appending to the same 
open MOB file and roll over to a new one only after it reaches a certain size. I 
cannot see an obvious downside to this approach, and it could reduce write 
amplification. Single-node storage engines such as Badger and Titan work this 
way; otherwise, merging these small MOB HFiles still causes write amplification. 
Was there a specific consideration in the design that led to the current 
approach?
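A toy model can illustrate the write-amplification argument. The two classes below are hypothetical and are not HBase APIs: the first writes a new MOB file per flush and later rewrites everything in one compaction pass; the second appends each flush to a shared open file and rolls over once that file has reached a size limit. The model only counts MOB bytes written; it ignores the WAL, deletes, reads, and any constraint that might make per-flush files necessary in practice.

```python
from dataclasses import dataclass, field
from typing import List

MB = 1024 * 1024


@dataclass
class PerFlushMobWriter:
    """Hypothetical model: every flush creates a new MOB file, and a later
    compaction rewrites all of them into files of at most max_file_size."""
    files: List[int] = field(default_factory=list)
    bytes_written: int = 0

    def flush(self, mob_bytes: int) -> None:
        # One new (small) MOB file per flush.
        self.files.append(mob_bytes)
        self.bytes_written += mob_bytes

    def compact(self, max_file_size: int) -> None:
        # Every existing MOB byte is written once more into larger files.
        total = sum(self.files)
        self.bytes_written += total
        full, rest = divmod(total, max_file_size)
        self.files = [max_file_size] * full + ([rest] if rest else [])


@dataclass
class RollingMobWriter:
    """Hypothetical model of the alternative: flushes append to the current
    open MOB file, which is rolled once it has reached roll_size."""
    roll_size: int
    files: List[int] = field(default_factory=lambda: [0])
    bytes_written: int = 0

    def flush(self, mob_bytes: int) -> None:
        # Start a new file only when the current one has reached roll_size.
        if self.files[-1] >= self.roll_size:
            self.files.append(0)
        self.files[-1] += mob_bytes
        self.bytes_written += mob_bytes


if __name__ == "__main__":
    per_flush = PerFlushMobWriter()
    rolling = RollingMobWriter(roll_size=1024 * MB)
    flushes, flush_size = 40, 128 * MB
    for _ in range(flushes):
        per_flush.flush(flush_size)
        rolling.flush(flush_size)
    per_flush.compact(max_file_size=1024 * MB)

    ingested = flushes * flush_size
    print(per_flush.bytes_written / ingested)  # 2.0
    print(rolling.bytes_written / ingested)    # 1.0
    print(len(per_flush.files), len(rolling.files))  # 5 5
```

Both schemes end with the same five 1GB files, but in this model the per-flush scheme writes each MOB byte twice and the rolling scheme writes it once.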

Additionally, I would like to understand the current state of the MOB feature 
and whether it is production-ready.

Thank you!

Best
------------------
Xinyu Tan
