[
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175677#comment-15175677
]
Clara Xiong commented on HBASE-15181:
-------------------------------------
Can this be added as a release note?
Date tiered compaction policy is a date-aware store file layout that is
beneficial for time-range scans for time-series data.
When it performs well:
- reads for limited time ranges, especially scans of recent data
When it doesn't perform as well:
- random gets without a time range
- frequent deletes and updates
- out of order data writes, especially writes with timestamps in the future
- bulk loads of historical data
Recommended configuration:
To turn on Date Tiered Compaction:
hbase.hstore.compaction.compaction.policy:
org.apache.hadoop.hbase.regionserver.compactions.DateTieredCompactionPolicy
Parameters for Date Tiered Compaction:
hbase.hstore.compaction.date.tiered.max.storefile.age.millis: Files whose
max-timestamp is older than this age will no longer be compacted. Default at
Long.MAX_VALUE.
hbase.hstore.compaction.date.tiered.base.window.millis: base window size in
milliseconds. Default at 6 hours.
hbase.hstore.compaction.date.tiered.windows.per.tier: number of windows per
tier. Default at 4.
hbase.hstore.compaction.date.tiered.incoming.window.min: minimum number of
files to compact in the incoming window. Set it to the expected number of files
in the window to avoid wasteful compaction. Default at 6.
hbase.hstore.compaction.date.tiered.window.policy.class: the policy to select
store files within the same time window. It doesn’t apply to the incoming
window. Default at exploring compaction. This is to avoid wasteful compaction.
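As a rough illustration of how the base window and windows-per-tier settings
interact, the sketch below assumes that each tier's window is windows-per-tier
times longer than the window of the tier below it. The function name and the
tier count of 4 are hypothetical and chosen only for the example; the design
spec is the authority on how windows are actually promoted.

```python
from datetime import timedelta


def tier_window_sizes(base_window_millis, windows_per_tier, tier_count):
    """Return the window length in milliseconds for each tier, newest tier first.

    Assumes geometric growth: a window in tier k spans
    base_window_millis * windows_per_tier ** k milliseconds.
    """
    return [base_window_millis * windows_per_tier ** tier
            for tier in range(tier_count)]


BASE_WINDOW_MILLIS = 6 * 60 * 60 * 1000  # default base window: 6 hours
WINDOWS_PER_TIER = 4                     # default windows per tier
TIER_COUNT = 4                           # hypothetical, for illustration only

sizes = tier_window_sizes(BASE_WINDOW_MILLIS, WINDOWS_PER_TIER, TIER_COUNT)
for tier, millis in enumerate(sizes):
    print(f"tier {tier}: {timedelta(milliseconds=millis)}")
```

Under these assumptions, a smaller base window or fewer windows per tier gives
finer-grained files for recent data at the cost of more frequent promotion.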
With tiered compaction all servers in the cluster will promote windows to a
higher tier at the same time, so using a compaction throttle is recommended:
hbase.regionserver.throughput.controller:
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController
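The two settings above can be written as properties in hbase-site.xml. The
sketch below renders them in the usual configuration/property/name/value
layout. The property names and class names are copied verbatim from this
comment; the helper name render_site_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

# Keys and values exactly as given in the comment above.
SETTINGS = {
    "hbase.hstore.compaction.compaction.policy":
        "org.apache.hadoop.hbase.regionserver.compactions."
        "DateTieredCompactionPolicy",
    "hbase.regionserver.throughput.controller":
        "org.apache.hadoop.hbase.regionserver.compactions."
        "PressureAwareCompactionThroughputController",
}


def render_site_xml(settings):
    """Build a <configuration> document with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


site_xml = render_site_xml(SETTINGS)
print(site_xml)
```

The printed properties would be merged into an existing hbase-site.xml rather
than replacing it.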
Because there will most likely be more store files around, we need to adjust
the configuration so that flush won't be blocked and compaction will be
properly throttled:
hbase.hstore.blockingStoreFiles: change to 50 if using all default parameters
when turning on date tiered compaction. Use 1.5~2 x projected file count if
changing the parameters.
Projected file count = windows per tier x tier count + incoming window min +
files older than max age
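The sizing rule above can be worked through as follows. This is only a sketch:
the function names are hypothetical, and the tier count of 4 and the zero files
older than max age are assumed values, since the comment does not state the
tier count behind its suggestion of 50.

```python
import math


def projected_file_count(windows_per_tier, tier_count, incoming_window_min,
                         files_older_than_max_age=0):
    """Projected store file count, following the formula in the comment."""
    return (windows_per_tier * tier_count
            + incoming_window_min
            + files_older_than_max_age)


def blocking_store_files_range(projected):
    """Return (low, high) for hbase.hstore.blockingStoreFiles: 1.5x to 2x."""
    return math.ceil(1.5 * projected), 2 * projected


# Defaults from the comment: 4 windows per tier, incoming window min of 6.
# tier_count=4 is an assumption made for this example.
projected = projected_file_count(windows_per_tier=4, tier_count=4,
                                 incoming_window_min=6)
low, high = blocking_store_files_range(projected)
print(f"projected files: {projected}, blockingStoreFiles: {low} to {high}")
```

Rounding the lower bound up keeps the setting at or above 1.5 x the projected
count when that product is not a whole number.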
For more details, please refer to the design spec at
https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit#
> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch,
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch,
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch,
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch,
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based data writes and scans and a focus on the most recent
> data.
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among
> store files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at per-table or
> per-column-family level by hbase shell.
> Design spec is at
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing