[ 
https://issues.apache.org/jira/browse/HBASE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816863#comment-13816863
 ] 

Sergey Shelukhin commented on HBASE-9854:
-----------------------------------------

Here's a preview... wdyt?

h3. Introduction
Stripe compaction is an experimental feature added in HBase 0.98 that aims to 
improve compactions for large regions or regions with non-uniformly distributed 
row keys. To achieve smaller and/or more granular compactions, the store files 
within a region are maintained separately for several row-key sub-ranges, or 
"stripes", of the region. The division is not visible to the higher levels of 
the system, so externally each region functions as before.
This feature is fully compatible with default compactions - it can be enabled 
for existing tables, and the table will continue to operate normally if it's 
disabled later.
h3. When to use
You might want to consider using this feature if you have:
* large regions (in that case, you can get the positive effect of much smaller 
regions without additional memstore and region management overhead); or
* non-uniform row keys, e.g. time dimension in a key (in that case, only the 
stripes receiving the new keys will keep compacting - old data will not compact 
as much, or at all).

According to the perf testing performed, in these cases read performance can 
improve somewhat, and the read and write performance variability due to 
compactions is greatly reduced. There's an overall perf improvement on large 
regions with non-uniform row keys (e.g. a hash-prefixed timestamp key) over the 
long term. All of these performance gains are best realized when the table is 
already large. In the future, the perf improvement might also extend to region 
splits.
h3. How to enable
To use stripe compactions for a table or a column family, set its 
{{hbase.hstore.engine.class}} to 
{{org.apache.hadoop.hbase.regionserver.StripeStoreEngine}}. Due to the nature 
of compactions, you also need to set the blocking file count to a high number 
(100 is a good starting point; the normal default is 10). If changing an 
existing table, do it while the table is disabled. Examples:
{code}
alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}

alter 'orders_table', {NAME => 'blobs_cf', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}}

create 'orders_table', 'blobs_cf', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}
{code}
Then, you can configure the other options if needed (see below) and enable the 
table.
To switch back to default compactions, set {{hbase.hstore.engine.class}} to nil 
to unset it; or set it explicitly to 
"{{org.apache.hadoop.hbase.regionserver.DefaultStoreEngine}}" (this also needs 
to be done on a disabled table).
When you enable a large table after changing the store engine either way, a 
major compaction will likely be performed on most regions. This is not a 
problem with new tables.
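For example, switching a table back to the default engine via the explicit-setting option above might look like this (a sketch, reusing the hypothetical 'orders_table' from the earlier examples; note the disable/enable around the change):
{code}
disable 'orders_table'
alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.DefaultStoreEngine'}
enable 'orders_table'
{code}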
h3. How to configure
All of the settings described below are best set on table/cf level (with the 
table disabled first, for the settings to apply), similar to the above, e.g.
{code}
alter 'orders_table', CONFIGURATION => {'key' => 'value', ..., 'key' => 'value'}
{code}
h4. Region and stripe sizing
Based on your region sizing, you might also want to change your stripe sizing. 
By default, new regions start with one stripe. When a stripe becomes too large 
(16 times the memstore flush size), it is split in two on the next compaction. 
Stripe splitting continues in a similar manner as the region grows, until the 
region itself is large enough to split (region splits work the same as with 
default compactions).
You can tune this pattern for your data. You should generally aim for a stripe 
size of at least 1 GB, and about 8-12 stripes for uniform row keys - so, for 
example, if your regions are 30 GB, 12 stripes of ~2.5 GB each might be a good 
idea.
The settings are as follows:
||Setting||Notes||
|{{hbase.store.stripe.initialStripeCount}}|The initial stripe count to create. You can use it as 
follows:
* for relatively uniform row keys, if you know the approximate target number of 
stripes from the above, you can avoid some splitting overhead by starting with 
several stripes (2, 5, 10...). Note that if the early data is not 
representative of the overall row key distribution, this will not be as efficient.
* for existing tables with lots of data, you can use this to pre-split stripes.
* for keys such as hash-prefixed sequential keys, with more than one hash 
prefix per region, you know in advance that some pre-splitting makes sense.|
|{{hbase.store.stripe.sizeToSplit}}|The maximum stripe size before it's split. You can use this in 
conjunction with the next setting to control the target stripe size 
(sizeToSplit = splitPartCount * target stripe size), according to the sizing 
considerations above.|
|{{hbase.store.stripe.splitPartCount}}|The number of new stripes to create when splitting one. The 
default is 2, which is good for most cases. For non-uniform row keys, you might 
experiment with increasing the number somewhat (3-4), to isolate the arriving 
updates into a narrower slice of the region with a single split instead of 
several.|
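Putting the sizing guidance together, a hypothetical configuration for the 30 GB-region example above (12 target stripes of ~2.5 GB, splitting in two) might look like the following sketch; the table name is illustrative, and the assumption that sizeToSplit is expressed in bytes should be verified against your HBase version:
{code}
disable 'orders_table'
alter 'orders_table', CONFIGURATION => {
  'hbase.store.stripe.initialStripeCount' => '12',
  'hbase.store.stripe.sizeToSplit' => '5368709120',  # 5 GB = splitPartCount (2) * 2.5 GB target
  'hbase.store.stripe.splitPartCount' => '2'}
enable 'orders_table'
{code}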
h4. Memstore sizing
By default, a flush creates several files from one memstore, according to the 
existing stripe boundaries and the row keys being flushed. This approach 
minimizes write amplification, but can be undesirable if the memstore is small 
and there are many stripes (the files will be too small).
In such cases, you can set {{hbase.store.stripe.compaction.flushToL0}} to true. 
This will cause a flush to create a single file instead; when at least 
{{hbase.store.stripe.compaction.minFilesL0}} such files (4 by default) 
accumulate, they will be compacted into striped files.
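For example, enabling single-file flushes with the default L0 threshold made explicit could look like this sketch (hypothetical table name; apply while the table is disabled, as above):
{code}
alter 'orders_table', CONFIGURATION => {
  'hbase.store.stripe.compaction.flushToL0' => 'true',
  'hbase.store.stripe.compaction.minFilesL0' => '4'}
{code}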
h4. Normal compaction configuration
All the settings that apply to normal compactions (file size limits, etc.) 
apply to stripe compactions. The exceptions are the min and max number of files 
to compact, which are set to higher values by default because the files within 
stripes are smaller. To control these for stripe compactions, use 
{{hbase.store.stripe.compaction.minFiles}} and 
{{hbase.store.stripe.compaction.maxFiles}}. 
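For instance, a sketch of setting the per-stripe compaction file counts (the values shown are illustrative assumptions, not recommendations, and the table name is hypothetical):
{code}
alter 'orders_table', CONFIGURATION => {
  'hbase.store.stripe.compaction.minFiles' => '4',
  'hbase.store.stripe.compaction.maxFiles' => '10'}
{code}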

> initial documentation for stripe compactions
> --------------------------------------------
>
>                 Key: HBASE-9854
>                 URL: https://issues.apache.org/jira/browse/HBASE-9854
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Initial documentation for stripe compactions (distill from attached docs, 
> make up to date, put somewhere like book)


