[
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Busbey updated HBASE-22749:
--------------------------------
Fix Version/s: 3.0.0
Release Note:
<!-- markdown -->
MOB compaction is now handled in-line with per-region compaction on region
servers.
- Regions with MOB data store per-hfile metadata about which MOB hfiles are
referenced.
- Admin-requested major compaction also rewrites MOB files; periodic
RS-initiated major compaction does not.
- A chore in the master periodically initiates a major compaction that
rewrites MOB values, to ensure this happens. Controlled by
'hbase.mob.compaction.chore.period'; the default is weekly.
- Control how many RS the chore requests major compaction on in parallel
with 'hbase.mob.major.compaction.region.batch.size'. The default is as
parallel as possible.
- A periodic chore in the master scans backing hfiles from regions to get the
set of referenced MOB hfiles and archives those that are no longer
referenced. Control the period with 'hbase.master.mob.cleaner.period'.
- Optionally, RS that are compacting MOB files can limit write
amplification by not rewriting values from MOB hfiles over a certain size
limit. Opt in by setting 'hbase.mob.compaction.type' to 'optimized'.
Control the threshold with 'hbase.mob.compactions.max.file.size'; the
default is 1GiB.
- Should integrate smoothly with existing MOB users via rolling upgrade.
Cleanup of old MOB files is delayed until per-region compaction has
compacted each region at least once, so that metadata about used MOB
hfiles can be gathered.
This improvement obviates the data loss described in HBASE-22075.
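
As a rough illustration of how the properties named in this note fit together, the Python sketch below renders them as an hbase-site.xml-style fragment. The property names come from the note; the values are illustrative assumptions, as are the units (seconds for the two periods, bytes for the size threshold), so check them against the documentation for your HBase version. The helper `render_properties`, the batch size of 2, and the daily cleaner period are hypothetical.

```python
from xml.sax.saxutils import escape

# Property names are taken from the release note. Values and units
# (seconds, bytes) are assumptions to verify against your HBase version.
MOB_SETTINGS = {
    # Weekly master chore that triggers MOB-rewriting major compaction.
    "hbase.mob.compaction.chore.period": str(7 * 24 * 60 * 60),
    # Hypothetical cap on how many RS are asked to compact in parallel.
    "hbase.mob.major.compaction.region.batch.size": "2",
    # Hypothetical daily period for the unreferenced MOB file cleaner.
    "hbase.master.mob.cleaner.period": str(24 * 60 * 60),
    # Opt in to skipping rewrites of values in large MOB hfiles.
    "hbase.mob.compaction.type": "optimized",
    # Size threshold for the optimized mode: 1 GiB, assuming bytes.
    "hbase.mob.compactions.max.file.size": str(1024 ** 3),
}


def render_properties(settings):
    """Render a dict of settings as a <configuration> XML fragment."""
    lines = ["<configuration>"]
    for name, value in settings.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_properties(MOB_SETTINGS))
```

The size threshold only has an effect when the compaction type is set to 'optimized', so the two settings are shown together.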
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Distributed MOB compactions
> ----------------------------
>
> Key: HBASE-22749
> URL: https://issues.apache.org/jira/browse/HBASE-22749
> Project: HBase
> Issue Type: New Feature
> Components: mob
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-22749-branch-2.2-v4.patch,
> HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch,
> HBASE-22749-master-v3.patch, HBASE-22749-master-v4.patch,
> HBASE-22749_nightly_Unit_Test_Results.csv,
> HBASE-22749_nightly_unit_test_analyzer.pdf, HBase-MOB-2.0-v3.0.pdf
>
>
> There are several drawbacks in the original MOB 1.0 (Moderate Object
> Storage) implementation, which can limit the adoption of the MOB feature:
> # MOB compactions are executed in a Master as a chore, which limits
> scalability because all I/O goes through a single HBase Master server.
> # The YARN/MapReduce framework is required to run MOB compactions in a
> scalable way, but this won’t work in a stand-alone HBase cluster.
> # Two separate compactors for MOB and for regular store files and their
> interactions can result in data loss (see HBASE-22075).
> The aim of MOB 2.0 was to provide a 100% MOB 1.0-compatible
> implementation that is free of the above drawbacks and can be used as a
> drop-in replacement in existing MOB deployments. The design goals of
> MOB 2.0 are therefore:
> # Make MOB compactions scalable without relying on the YARN/MapReduce
> framework.
> # Provide a unified compactor for both MOB and regular store files.
> # Make it more robust, especially with respect to data loss.
> # Simplify and reduce the overall MOB code.
> # Provide an implementation 100% compatible with MOB 1.0.
> # No migration of data should be required between MOB 1.0 and MOB 2.0, just
> a software upgrade.