[
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912908#comment-16912908
]
Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
That is a big list, [~busbey]. Here are some answers:
{quote}
region sizing - splitting, normalizers, etc
Need to expressly state wether or not this change to per-region accounting
plans to alter the current assumptions that use of the feature means that the
MOB data isn’t counted when determining region size for decisions to normalize
or split.
{quote}
This part has not been touched, meaning that MOB 2.0 behaves exactly as
MOB 1.0 does. If MOB data is not counted for normalize/split decisions in
MOB 1.0 now, it won't be counted in 2.0 either. Should it be? Probably, yes,
but that is not part of the scalable compactions work.
{quote}
write amplification
{quote}
Good question. The default (non-partial) major compaction has write
amplification (WA) that is the same as or similar to regular HBase tiered
compaction. I would not call this unbounded, but it is probably worse than in
MOB 1.0. Partial MOB compaction will definitely have a bounded WA, comparable
to what we have in MOB 1.0 (where compaction is done by partitions and
partitions are date-based).
The idea of partial major MOB compaction is either to keep the total number of
MOB files in the system under control (say, around 1M), or to not compact MOB
files which have reached some size threshold (say, 1GB). The latter case is
easier to explain. If you exclude all MOB files above 1GB from compaction, your
WA will be bounded by log2(T/S), where log2 is the logarithm base 2, T is the
maximum MOB file size (the threshold) and S is the average size of a memstore
flush. This is an approximation, of course. How does it compare to MOB 1.0
partitioned compaction? By varying T we can get any WA we want. Say, if we set
the limit on the number of MOB files to 10M, we can decrease T to 100MB, which
gives us a total capacity of 1PB for MOB data. With a 100MB threshold, WA can
be very low (in the low ones); a rough sketch of this arithmetic is included
below. I will update the document and add more info on partial major MOB
compactions, including the file selection policy.
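To make the numbers above concrete, here is a minimal, hypothetical sketch (it
is not code from the attached design document) of the size-threshold file
selection and the WA <= log2(T/S) estimate. The 128MB average flush size is an
assumption (the default hbase.hregion.memstore.flush.size), and the estimate is
clamped at 1 because every cell is written at least once at flush time:
{code:java}
// Hypothetical sketch, not code from the design document: illustrates the
// size-threshold file selection and the WA <= log2(T/S) estimate above.
public class MobCompactionWaSketch {

  /** A MOB file stays a compaction candidate only while it is below the threshold T. */
  static boolean isCandidate(long mobFileSize, long thresholdT) {
    return mobFileSize < thresholdT;
  }

  /**
   * Approximate write amplification: each cell is rewritten roughly once per
   * doubling of its file up to T, so WA ~ log2(T / S); never below the initial flush.
   */
  static double waBound(long thresholdT, long avgFlushS) {
    return Math.max(1.0, Math.log((double) thresholdT / avgFlushS) / Math.log(2));
  }

  public static void main(String[] args) {
    long flushS = 128L << 20;                                                 // assumed 128MB average flush
    System.out.printf("T = 1GB   -> WA ~ %.1f%n", waBound(1L << 30, flushS));   // ~3
    System.out.printf("T = 100MB -> WA ~ %.1f%n", waBound(100L << 20, flushS)); // ~1
    // Capacity check from the comment: 10M files * 100MB threshold ~= 1PB of MOB data.
    System.out.printf("capacity ~ %.1f PB%n", 10_000_000L * (100L << 20) / 1e15);
  }
}
{code}
With T = 100MB the estimate collapses to roughly the single flush write, which
matches the "low ones" figure above; with T = 1GB and a 128MB flush it is about 3.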
> Distributed MOB compactions
> ----------------------------
>
> Key: HBASE-22749
> URL: https://issues.apache.org/jira/browse/HBASE-22749
> Project: HBase
> Issue Type: New Feature
> Components: mob
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Attachments: HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf,
> HBase-MOB-2.0-v2.pdf
>
>
> There are several drawbacks in the original MOB 1.0 (Moderate Object
> Storage) implementation, which can limit the adoption of the MOB feature:
> # MOB compactions are executed in a Master as a chore, which limits
> scalability because all I/O goes through a single HBase Master server.
> # The Yarn/MapReduce framework is required to run MOB compactions in a
> scalable way, but this won’t work in a stand-alone HBase cluster.
> # Two separate compactors, one for MOB and one for regular store files, whose
> interactions can result in data loss (see HBASE-22075).
> The design goal for MOB 2.0 was to provide a 100% MOB 1.0-compatible
> implementation that is free of the above drawbacks and can be used as a
> drop-in replacement in existing MOB deployments. So, these are the design
> goals of MOB 2.0:
> # Make MOB compactions scalable without relying on Yarn/Mapreduce framework
> # Provide unified compactor for both MOB and regular store files
> # Make it more robust, especially w.r.t. data loss.
> # Simplify and reduce the overall MOB code.
> # Provide a 100% MOB 1.0-compatible implementation.
> # No data migration should be required between MOB 1.0 and MOB 2.0 - just a
> software upgrade.