busbey commented on a change in pull request #1232: HBASE-23198 Update ref 
guide for distributed MOB compaction.
URL: https://github.com/apache/hbase/pull/1232#discussion_r387809615
 
 

 ##########
 File path: src/main/asciidoc/_chapters/hbase_mob.adoc
 ##########
 @@ -468,3 +585,54 @@ $ hdfs dfs -count /hbase/mobdir/data/default/some_table
 +
 This data is spurious and may be reclaimed. You should sideline it, verify your application’s view
 of the table, and then delete it.
+
+=== MOB Upgrade Considerations
+
+Generally, data stored using the MOB feature should transparently continue to work correctly across
+HBase upgrades.
+
+==== Upgrading to a version with the "distributed MOB compaction" feature
+
+Prior to the work in HBASE-22749, "Distributed MOB compactions", HBase had the Master coordinate all
+compaction maintenance of the MOB hfiles. Centralizing management of the MOB data allowed for space
+optimizations but safely coordinating that management with Region Servers resulted in edge cases that
+caused data loss (ref link:https://issues.apache.org/jira/browse/HBASE-22075[HBASE-22075]).
+
+Users of the MOB feature upgrading to a version of HBase that includes HBASE-22749 should be aware
+of the following changes:
+
+* The MOB system no longer allows setting "MOB Compaction Policies"
+* The MOB system no longer attempts to group MOB values by the date of the original cell's timestamp
+  according to said compaction policies, daily or otherwise
+* The MOB system no longer needs to track individual cell deletes through the use of special
+  files in the MOB storage area with the suffix `_del`. After upgrading you should sideline these
+  files.
+* Under default configuration the MOB system should take much less time to perform a compaction of
+  MOB stored values. This is a direct consequence of the fact that HBase will place a much larger
+  load on the underlying filesystem when doing compactions of MOB stored values; the additional load
+  should be a multiple on the order of the number of region servers. For example, for a cluster
+  with three region servers and two masters the default configuration should have HBase put three
+  times the load on HDFS during major compactions that rewrite MOB data when compared to Master
+  handled MOB compaction; it should also be approximately three times as fast.
+* When the MOB system detects that a table has hfiles with references to MOB data but the reference
+  hfiles do not yet have the needed file level metadata (i.e. from use of the MOB feature prior to
+  HBASE-22749) then it will refuse to archive _any_ MOB hfiles from that table. The normal course of
+  periodic compactions done by Region Servers will update existing hfiles with MOB references, but
+  until a given table has been through the needed compactions operators should expect to see an
+  increased amount of storage used by the MOB feature.
+* Performing a compaction with type "MOB" no longer has special handling to compact specifically the
+  MOB hfiles. Instead it will issue a warning and do a major compaction of the table. Similarly,
+  manually performing a major compaction on a table or region will also handle compacting the MOB
+  stored values for that table or region respectively.
 
 Review comment:
   Yes, that's correct. There's no notion of compacting MOB hfiles independently of compacting the hfiles that contain the references to those MOB values. That's why we no longer have the data loss issue from HBASE-22075.
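
   To make the archiving rule from the diff concrete, here is a rough toy model in Python. It is not HBase code: `StoreFile`, `mob_refs`, and `archivable_mob_files` are hypothetical names invented for illustration, and the assumption that unreferenced MOB hfiles become archivable once every reference hfile carries the metadata is mine, not something stated in the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class StoreFile:
    """Toy stand-in for an hfile that holds references to MOB values."""
    name: str
    # Names of the MOB hfiles this file references, or None when the file
    # lacks the file-level metadata (e.g. written before HBASE-22749).
    mob_refs: Optional[FrozenSet[str]]


def archivable_mob_files(store_files: Iterable[StoreFile],
                         mob_files: Iterable[str]) -> Set[str]:
    """Return the MOB hfiles that this toy model would allow archiving."""
    store_files = list(store_files)
    # If any reference hfile lacks metadata we cannot know what it points
    # at, so nothing from the table may be archived.
    if any(sf.mob_refs is None for sf in store_files):
        return set()
    # Assumption: otherwise, any MOB hfile no store file references is
    # eligible for archiving.
    referenced: Set[str] = set()
    for sf in store_files:
        referenced |= sf.mob_refs
    return set(mob_files) - referenced
```

   In this model a single pre-upgrade hfile blocks archiving for the whole table, which matches the expectation of increased MOB storage use until the table has been through the needed compactions.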
