[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912908#comment-16912908 ]
Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------

It is a big list, [~busbey]. Below are some answers:

{quote}
region sizing - splitting, normalizers, etc
Need to expressly state whether or not this change to per-region accounting plans to alter the current assumption that use of the feature means the MOB data isn't counted when determining region size for decisions to normalize or split.
{quote}

This part has not been touched, meaning that MOB 2.0 does exactly what MOB 1.0 does: if MOB data is not counted in the normalize/split decision now, it won't be in 2.0 either. Should it be? Probably yes, but that is not part of scalable compactions.

{quote}
write amplification
{quote}

Good question. The default (non-partial) major compaction has write amplification (WA) the same as, or similar to, regular HBase tiered compaction. I would not call this unbounded, but it is probably worse than in MOB 1.0. Partial MOB compaction will definitely have a bounded WA, comparable to what we have in MOB 1.0 (where compaction is done by partitions and partitions are date-based).

The idea of partial major MOB compaction is either to keep the total number of MOB files in the system under control (say, around 1M), or to not compact MOB files that have reached some size threshold (say, 1 GB). The latter case is easier to explain: if you exclude all MOB files above 1 GB from compaction, your WA is bounded by log2(T/S), where T is the maximum MOB file size (the threshold) and S is the average size of a memstore flush. This is an approximation, of course.

How does this compare to MOB 1.0 partitioned compaction? By varying T we can get any WA we want. Say, if we set the limit on the number of MOB files to 10M, we can decrease T to 100 MB, which still gives a total MOB data capacity of about 1 PB. With a 100 MB threshold, WA can be very low (in the low ones). A rough sketch of this arithmetic appears after the issue summary below.

I will update the document and add more info on partial major MOB compactions, including the file selection policy.

> Distributed MOB compactions
> ----------------------------
>
>                 Key: HBASE-22749
>                 URL: https://issues.apache.org/jira/browse/HBASE-22749
>             Project: HBase
>          Issue Type: New Feature
>          Components: mob
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>         Attachments: HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, HBase-MOB-2.0-v2.pdf
>
>
> There are several drawbacks in the original MOB 1.0 (Moderate Object Storage) implementation, which can limit the adoption of the MOB feature:
> # MOB compactions are executed in the Master as a chore, which limits scalability because all I/O goes through a single HBase Master server.
> # The Yarn/MapReduce framework is required to run MOB compactions in a scalable way, but this won't work in a stand-alone HBase cluster.
> # Two separate compactors, one for MOB and one for regular store files, and their interactions can result in data loss (see HBASE-22075).
> The design goal for MOB 2.0 was to provide a 100% MOB 1.0-compatible implementation that is free of the above drawbacks and can be used as a drop-in replacement in existing MOB deployments. These are the design goals of MOB 2.0:
> # Make MOB compactions scalable without relying on the Yarn/MapReduce framework.
> # Provide a unified compactor for both MOB and regular store files.
> # Make it more robust, especially w.r.t. data loss.
> # Simplify and reduce the overall MOB code.
> # Provide a 100% compatible implementation with MOB 1.0.
> # No migration of data should be required between MOB 1.0 and MOB 2.0 - just a software upgrade.
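For reference, here is a rough back-of-the-envelope sketch of the WA bound and capacity figures discussed in the comment above. The values for T (compaction exclusion threshold), S (average memstore flush size), and the MOB file-count limit are illustrative numbers taken from the discussion, not actual HBase configuration keys, and the formula is the stated approximation, not an exact model:

{code:java}
// Illustrative sketch only: estimates WA ~ log2(T/S) and total MOB capacity
// under the assumptions described in the comment above.
public class MobCompactionEstimate {

    // Approximate write amplification if files above the threshold T are
    // excluded from compaction: each cell is rewritten roughly once per
    // doubling between flush size S and threshold T, i.e. log2(T/S).
    static double writeAmplification(long thresholdBytes, long flushBytes) {
        return Math.log((double) thresholdBytes / flushBytes) / Math.log(2);
    }

    // Total MOB capacity if the number of MOB files is capped and files
    // stop being compacted once they reach the threshold size.
    static long totalCapacityBytes(long maxMobFiles, long thresholdBytes) {
        return maxMobFiles * thresholdBytes;
    }

    public static void main(String[] args) {
        final long MB = 1L << 20;
        final long GB = 1L << 30;

        // Example: T = 1 GB threshold, S = 128 MB flush -> WA ~ 3.
        System.out.printf("T=1GB, S=128MB -> WA ~ %.1f%n",
                writeAmplification(1 * GB, 128 * MB));

        // Example: 10M MOB files at a 100 MB threshold -> roughly 1 PB.
        double petabytes =
                (double) totalCapacityBytes(10_000_000L, 100 * MB) / (1L << 50);
        System.out.printf("10M files, T=100MB -> capacity ~ %.2f PB%n",
                petabytes);
    }
}
{code}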
-- This message was sent by Atlassian Jira (v8.3.2#803003)