busbey commented on a change in pull request #1232: HBASE-23198 Update ref guide for distributed MOB compaction. URL: https://github.com/apache/hbase/pull/1232#discussion_r387809615
##########
File path: src/main/asciidoc/_chapters/hbase_mob.adoc
##########
@@ -468,3 +585,54 @@ $ hdfs dfs -count /hbase/mobdir/data/default/some_table
 +
 This data is spurious and may be reclaimed. You should sideline it, verify your application’s view of the table, and then delete it.
+
+=== MOB Upgrade Considerations
+
+Generally, data stored using the MOB feature should transparently continue to work correctly across
+HBase upgrades.
+
+==== Upgrading to a version with the "distributed MOB compaction" feature
+
+Prior to the work in HBASE-22749, "Distributed MOB compactions", HBase had the Master coordinate all
+compaction maintenance of the MOB hfiles. Centralizing management of the MOB data allowed for space
+optimizations but safely coordinating that management with Region Servers resulted in edge cases that
+caused data loss (ref link:https://issues.apache.org/jira/browse/HBASE-22075[HBASE-22075]).
+
+Users of the MOB feature upgrading to a version of HBase that includes HBASE-22749 should be aware
+of the following changes:
+
+* The MOB system no longer allows setting "MOB Compaction Policies"
+* The MOB system no longer attempts to group MOB values by the date of the original cell's timestamp
+  according to said compaction policies, daily or otherwise
+* The MOB system no longer needs to track individual cell deletes through the use of special
+  files in the MOB storage area with the suffix `_del`. After upgrading you should sideline these
+  files.
+* Under default configuration the MOB system should take much less time to perform a compaction of
+  MOB stored values. This is a direct consequence of the fact that HBase will place a much larger
+  load on the underlying filesystem when doing compactions of MOB stored values; the additional load
+  should be a multiple on the order of magnitude of number of region servers. I.e. for a cluster
+  with three region servers and two masters the default configuration should have HBase put three
+  times the load on HDFS during major compactions that rewrite MOB data when compared to Master
+  handled MOB compaction; it should also be approximately three times as fast.
+* When the MOB system detects that a table has hfiles with references to MOB data but the reference
+  hfiles do not yet have the needed file level metadata (i.e. from use of the MOB feature prior to
+  HBASE-22749) then it will refuse to archive _any_ MOB hfiles from that table. The normal course of
+  periodic compactions done by Region Servers will update existing hfiles with MOB references, but
+  until a given table has been through the needed compactions operators should expect to see an
+  increased amount of storage used by the MOB feature.
+* Performing a compaction with type "MOB" no longer has special handling to compact specifically the
+  MOB hfiles. Instead it will issue a warning and do a major compaction of the table. Similarly,
+  manually performing a major compaction on a table or region will also handle compacting the MOB
+  stored values for that table or region respectively.

Review comment:
Yes, that's correct. There's no notion of compacting MOB hfiles independent of compacting the hfiles that contain the references to those MOB values. That's why we no longer have the data loss issue from HBASE-22075.
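
To make the archival rule in the quoted bullet concrete, here is a toy model in Python. It is only a sketch, not HBase code: `RefHFile`, `mob_refs`, and `archivable_mob_files` are hypothetical names, and treating "MOB files not referenced by any hfile" as the archivable set is an assumption about what happens once every referencing hfile carries the file-level metadata.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class RefHFile:
    """Hypothetical stand-in for an hfile that may hold references to MOB files."""
    name: str
    # None models an hfile written before HBASE-22749, i.e. one that lacks
    # the file-level metadata listing which MOB files it references.
    mob_refs: Optional[FrozenSet[str]]


def archivable_mob_files(ref_hfiles: Iterable[RefHFile],
                         mob_files: Iterable[str]) -> Set[str]:
    """Return the MOB files this toy model would consider safe to archive."""
    hfiles = list(ref_hfiles)
    # If any referencing hfile lacks metadata, we cannot know what it points
    # at, so nothing in the table is archived.
    if any(f.mob_refs is None for f in hfiles):
        return set()
    # Assumption: otherwise, anything no hfile references is archivable.
    referenced: Set[str] = set()
    for f in hfiles:
        referenced |= f.mob_refs
    return set(mob_files) - referenced
```

In this model a single legacy hfile blocks archival for the whole table, which matches the storage growth operators are told to expect until compactions have rewritten every such file.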
