[jira] [Updated] (HBASE-22749) Distributed MOB compactions

Sean Busbey (Jira) Wed, 19 Feb 2020 15:26:16 -0800


     [ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sean Busbey updated HBASE-22749:
--------------------------------
    Fix Version/s: 3.0.0
     Release Note: 
<!-- markdown -->
MOB compaction is now handled in-line with per-region compaction on region
  servers

- regions with mob data store per-hfile metadata about which mob hfiles are
  referenced
- admin requested major compaction will also rewrite MOB files; periodic RS
  initiated major compaction will not
- periodically a chore in the master will initiate a major compaction that
  will rewrite MOB values to ensure it happens. controlled by
  'hbase.mob.compaction.chore.period'. default is weekly
- control how many RS the chore requests major compaction on in parallel
  with 'hbase.mob.major.compaction.region.batch.size'. default is as
  parallel as possible.
- periodic chore in master will scan backing hfiles from regions to get the
  set of referenced mob hfiles and archive those that are no longer
  referenced. control period with 'hbase.master.mob.cleaner.period'
- Optionally, RS that are compacting mob files can limit write
  amplification by not rewriting values from mob hfiles over a certain size
  limit. opt-in by setting 'hbase.mob.compaction.type' to 'optimized'.
  control threshold by 'hbase.mob.compactions.max.file.size'.
  default is 1GiB
- Should smoothly integrate with existing MOB users via rolling upgrade.
  will delay old MOB file cleanup until per-region compaction has managed
  to compact each region at least once so that used mob hfile metadata can
  be gathered.

This improvement obviates the dataloss in HBASE-22075.
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Distributed MOB compactions 
> ----------------------------
>
>                 Key: HBASE-22749
>                 URL: https://issues.apache.org/jira/browse/HBASE-22749
>             Project: HBase
>          Issue Type: New Feature
>          Components: mob
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: HBASE-22749-branch-2.2-v4.patch, 
> HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, 
> HBASE-22749-master-v3.patch, HBASE-22749-master-v4.patch, 
> HBASE-22749_nightly_Unit_Test_Results.csv, 
> HBASE-22749_nightly_unit_test_analyzer.pdf, HBase-MOB-2.0-v3.0.pdf
>
>
> There are several  drawbacks in the original MOB 1.0  (Moderate Object 
> Storage) implementation, which can limit the adoption of the MOB feature:  
> # MOB compactions are executed in a Master as a chore, which limits 
> scalability because all I/O goes through a single HBase Master server. 
> # Yarn/Mapreduce framework is required to run MOB compactions in a scalable 
> way, but this won’t work in a stand-alone HBase cluster.
> # Two separate compactors for MOB and for regular store files and their 
> interactions can result in a data loss (see HBASE-22075)
> The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible 
> implementation, which is free of the above drawbacks and can be used as a 
> drop in replacement in existing MOB deployments. So, these are design goals 
> of a MOB 2.0:
> # Make MOB compactions scalable without relying on Yarn/Mapreduce framework
> # Provide unified compactor for both MOB and regular store files
> # Make it more robust especially w.r.t. to data losses. 
> # Simplify and reduce the overall MOB code.
> # Provide 100% compatible implementation with MOB 1.0.
> # No migration of data should be required between MOB 1.0 and MOB 2.0 - just 
> software upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HBASE-22749) Distributed MOB compactions

Reply via email to