busbey commented on a change in pull request #1232: HBASE-23198 Update ref guide for distributed MOB compaction.
URL: https://github.com/apache/hbase/pull/1232#discussion_r387271665
##########
File path: src/main/asciidoc/_chapters/hbase_mob.adoc
##########
@@ -181,84 +320,51 @@ suit your environment, and restart or rolling restart the RegionServer.
 ----
 ====
-=== MOB Optimization Tasks
-
 ==== Manually Compacting MOB Files
 To manually compact MOB files, rather than waiting for the
-<<mob.cache.configure,configuration>> to trigger compaction, use the
-`compact` or `major_compact` HBase shell commands. These commands
+periodic chore to trigger compaction, use the
+`major_compact` HBase shell commands. These commands
 require the first argument to be the table name, and take a column
-family as the second argument. and take a compaction type as the third argument.
+family as the second argument. If used with a column family that includes MOB data, then
+these operator requests will result in the MOB data being compacted.
 ----
-hbase> compact 't1', 'c1’, ‘MOB’
-hbase> major_compact 't1', 'c1’, ‘MOB’
+hbase> major_compact 't1'
+hbase> major_compact 't2', 'c1’
 ----
-These commands are also available via `Admin.compact` and
-`Admin.majorCompact` methods.
-
-=== MOB architecture
-
-This section is derived from information found in
-link:https://issues.apache.org/jira/browse/HBASE-11339[HBASE-11339]. For more information see
-the attachment on that issue
-"link:https://issues.apache.org/jira/secure/attachment/12724468/HBase%20MOB%20Design-v5.pdf[Base MOB Design-v5.pdf]".
-
-==== Overview
-The MOB feature reduces the overall IO load for configured column families by storing values that
-are larger than the configured threshold outside of the normal regions to avoid splits, merges, and
-most importantly normal compactions.
-
-When a cell is first written to a region it is stored in the WAL and memstore regardless of value
-size. When memstores from a column family configured to use MOB are eventually flushed two hfiles
-are written simultaneously. Cells with a value smaller than the threshold size are written to a
-normal region hfile. Cells with a value larger than the threshold are written into a special MOB
-hfile and also have a MOB reference cell written into the normal region HFile.
-
-MOB reference cells have the same key as the cell they are based on. The value of the reference cell
-is made up of two pieces of metadata: the size of the actual value and the MOB hfile that contains
-the original cell. In addition to any tags originally written to HBase, the reference cell prepends
-two additional tags. The first is a marker tag that says the cell is a MOB reference. This can be
-used later to scan specifically just for reference cells. The second stores the namespace and table
-at the time the MOB hfile is written out. This tag is used to optimize how the MOB system finds
-the underlying value in MOB hfiles after a series of HBase snapshot operations (ref HBASE-12332).
-Note that tags are only available within HBase servers and by default are not sent over RPCs.
+This same request can be made via the `Admin.majorCompact` Java API.
-All MOB hfiles for a given table are managed within a logical region that does not directly serve
-requests. When these MOB hfiles are created from a flush or MOB compaction they are placed in a
-dedicated mob data area under the hbase root directory specific to the namespace, table, mob
-logical region, and column family. In general that means a path structured like:
+=== MOB Troubleshooting
-----
-%HBase Root Dir%/mobdir/data/%namespace%/%table%/%logical region%/%column family%/
-----
+==== Adjusting the MOB cleaner's tolerance for new hfiles
-With default configs, an example table named 'some_table' in the
-default namespace with a MOB enabled column family named 'foo' this HDFS directory would be
+The MOB cleaner chore ignores all MOB hfiles that were created more recently than an hour prior to
+the start of the shore to ensure we don't miss the reference metadata from teh corresponding regular
+hfile. Without this safety check it would be possible for the cleaner chore to see a MOB hfile for
+an in progress flush or compaction and prematurely archive the MOB data. This default buffer should
+be sufficient for normal use.
-----
-/hbase/mobdir/data/default/some_table/372c1b27e3dc0b56c3a031926e5efbe9/foo/
-----
-
-These MOB hfiles are maintained by special chores in the HBase Master rather than by any individual
-Region Server. Specifically those chores take care of enforcing TTLs and compacting them. Note that
-this compaction is primarily a matter of controlling the total number of files in HDFS because our
-operational assumptions for MOB data is that it will seldom update or delete.
+You will need to adjust the tolerance if it takes longer than an hour for the two HDFS move

Review comment:
ah. reading this paragraph again, it's not clear that's what I'm talking about. let me try an update.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
