[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102218#comment-14102218
]
Jonathan Hsieh commented on HBASE-11339:
----------------------------------------
[~jiajia], thanks for the update to the user guide. I think it has the key
details (the whats) needed for a user who already understands what a MOB is
and what it is for. We should add some context (the whys and the bigger
picture) for users who aren't familiar with it, by adding some background to
this user doc. We'll eventually fold it into the ref guide here [1].
Let me provide a quick draft that we could build off of.
Before Bullet we should have some info (this is a paraphrased version of the
design doc's intro).
{quote}
Data comes in many sizes, and it is convenient to store binary data such as
images and documents in HBase. While HBase can handle binary objects with
cells that are 1 byte to 10MB long, HBase's normal read and write paths are
optimized for values smaller than 100KB in size. When HBase deals with large
numbers of values larger than 100KB and up to ~10MB, performance degrades
because of write amplification caused by splits and compactions.
HBase 2.0+ has added support for better managing large numbers of *Medium
Objects* (MOBs) while maintaining the same high performance and strongly
consistent characteristics with low operational overhead.
To enable the feature, one must enable and configure the mob components in
each region server and enable the mob feature on particular column families
during table creation or table alter. Also, in the preview version of this
feature, the admin must set up periodic processes that re-optimize the layout
of mob data.
Section: Enabling and Configuring the mob feature on region servers.
Need to enable feature in flushes and compactions. Tuning settings on caches.
user doc bullet 1. edit hbase-site...
user doc bullet 7. mob cache
It would be nice to have examples of doing this from the shell -- an example
of creating a table with mob on a cf, and an example of a table alter that
changes a cf to use the mob path.
Section: Mob management
The mob feature introduces a new read and write path to HBase and, in its
current incarnation, requires external tools for housekeeping and
reoptimization. There are two tools introduced -- the expiredMobFileCleaner
for handling TTLs and time-based expiry of data, and the sweep tool for
coalescing small mob files or mob files with many deletions or updates.
user doc bullet 8.
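To make the sweep tool's job concrete, here is a rough Python sketch of the
selection rule described above: a mob file is a candidate if it is small, or
if a large share of its cells have been deleted or updated. This is not HBase
code. The MobFileStats type, the needs_sweep function, and both thresholds are
hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; these are not HBase settings.
SMALL_FILE_BYTES = 64 * 1024 * 1024   # files below this size count as "small"
MAX_INVALID_RATIO = 0.3               # share of deleted/updated cells tolerated


@dataclass
class MobFileStats:
    """Hypothetical per-file summary that a housekeeping job might collect."""
    name: str
    size_bytes: int
    total_cells: int
    invalid_cells: int  # cells that were deleted or superseded by an update


def needs_sweep(f: MobFileStats) -> bool:
    """Return True if the file is small or has many deleted/updated cells."""
    if f.size_bytes < SMALL_FILE_BYTES:
        return True
    if f.total_cells == 0:
        return False  # nothing to measure; avoids dividing by zero
    return f.invalid_cells / f.total_cells > MAX_INVALID_RATIO


files = [
    MobFileStats("small", 4 * 1024 * 1024, 100, 0),
    MobFileStats("healthy", 512 * 1024 * 1024, 10_000, 500),
    MobFileStats("churned", 512 * 1024 * 1024, 10_000, 6_000),
]
candidates = [f.name for f in files if needs_sweep(f)]
print(candidates)
```

The real tool may weigh these conditions differently; the point is only that
both file size and the amount of dead data feed into the decision.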
Section: Enabling the mob feature on user tables
This can be done when creating a table or when altering a table.
user doc bullet 2 (set cf with mob)
user doc bullet 6 (threshold size)
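The role of the threshold can be pictured with a tiny Python model (again,
not HBase code). It assumes a per-column-family threshold of 100KB, taken
from the figure in the intro above, and assumes that only values strictly
larger than the threshold take the mob path. The names uses_mob_path and
DEFAULT_MOB_THRESHOLD are made up for this sketch.

```python
# Illustrative model only; not HBase code.
# Assumed default, based on the ~100KB figure quoted in the intro above.
DEFAULT_MOB_THRESHOLD = 100 * 1024  # bytes


def uses_mob_path(value: bytes, mob_enabled: bool,
                  threshold: int = DEFAULT_MOB_THRESHOLD) -> bool:
    """Return True if, in this model, the value would go down the mob path.

    Assumption: the column family must have mob enabled, and only values
    strictly larger than the threshold are treated as mob cells.
    """
    return mob_enabled and len(value) > threshold


small = b"x" * 1024            # 1KB value
large = b"x" * (512 * 1024)    # 512KB value

print(uses_mob_path(small, mob_enabled=True))
print(uses_mob_path(large, mob_enabled=True))
print(uses_mob_path(large, mob_enabled=False))
```

Either way the client writes and reads the cell in the same manner; the
threshold only decides where the value is stored.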
To a client, mob cells act just like normal cells.
user doc bullet 3 put
user doc bullet 4 scan
There is a special scanner mode users can use to read the raw values
user doc bullet 5.
{quote}
[1] http://hbase.apache.org/book.html
> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: Umbrella
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide .docx,
> hbase-11339-in-dev.patch
>
>
> It's quite useful to save medium-sized binary data like images and documents
> into Apache HBase. Unfortunately, directly saving a binary MOB (medium
> object) to HBase leads to worse performance because of frequent splits and
> compactions.
> In this design, the MOB data are stored in a more efficient way, which
> keeps a high write/read performance and guarantees data consistency in
> Apache HBase.