[ https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814875#comment-13814875 ]
Liu Shaohui commented on HBASE-9873:
------------------------------------
[~stack] [~jeffreyz] [~liochon]
{quote}
3) Rely on the smallest of the largest HFile seqIds of previously served regions
to skip some entries. Facebook has implemented this in HBASE-6508, and we have
backported it to HBase 0.94 in HBASE-9568.
{quote}
What about this? I think HBASE-6508 is useful.
Could anyone help review HBASE-9568 (the backport of HBASE-6508 to 0.94)?
We may backport HBASE-6508 to trunk later.
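A minimal sketch of the filtering rule in item 3, assuming that for each previously served region we know the largest seqId among the HFiles of each of its stores. Taking the minimum across stores gives a threshold at or below which every edit is already persisted in all stores, so those entries can be skipped. `SeqIdFilter` and `Entry` are hypothetical names, not HBase classes, and how HBASE-6508 actually collects these seqIds may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of item 3; not HBase's API.
class SeqIdFilter {
    /** Hypothetical hlog entry: owning region plus the edit's sequence id. */
    static final class Entry {
        final String region;
        final long seqId;
        Entry(String region, long seqId) { this.region = region; this.seqId = seqId; }
    }

    /**
     * Threshold for one region: the smallest, across its stores, of each store's
     * largest HFile seqId. A store with no HFiles yields -1, so nothing is skipped.
     */
    static long threshold(Map<String, List<Long>> hfileMaxSeqIdsByStore) {
        long min = Long.MAX_VALUE;
        for (List<Long> ids : hfileMaxSeqIdsByStore.values()) {
            long storeMax = ids.stream().mapToLong(Long::longValue).max().orElse(-1L);
            min = Math.min(min, storeMax);
        }
        return min == Long.MAX_VALUE ? -1L : min;
    }

    /** Keeps only entries newer than their region's threshold; unknown regions keep everything. */
    static List<Entry> filter(List<Entry> entries, Map<String, Long> thresholdByRegion) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            long t = thresholdByRegion.getOrDefault(e.region, -1L);
            if (e.seqId > t) kept.add(e);
        }
        return kept;
    }
}
```

Using the minimum rather than the maximum is the conservative choice: a store that flushed less recently still needs every edit above its own largest HFile seqId.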
> Some improvements in hlog and hlog split
> ----------------------------------------
>
> Key: HBASE-9873
> URL: https://issues.apache.org/jira/browse/HBASE-9873
> Project: HBase
> Issue Type: Improvement
> Components: MTTR, wal
> Reporter: Liu Shaohui
> Priority: Critical
> Labels: failover, hlog
>
> Some improvements in hlog and hlog split
> 1) Try to clean old hlogs after each memstore flush to avoid splitting unnecessary
> hlogs on failover. Currently, hlog cleaning only runs when the hlog writer is rolled.
> 2) Add a background hlog compaction thread that compacts hlogs by removing entries
> whose data has already been flushed to HFiles. The motivating scenario: in a shared
> cluster, writes to a table may be sparse and periodic, so many hlogs cannot be
> cleaned because they still contain unflushed entries for that table.
> 3) Rely on the smallest of the largest HFile seqIds of previously served
> regions to skip some entries. Facebook has implemented this in HBASE-6508,
> and we have backported it to HBase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the
> master (the latter can speed up splitting on tiny clusters).
> 5) Enable multiple splitters to work on a 'big' hlog file by logically dividing
> it into slices (of configurable size, e.g. the 64 MB HDFS block size), and
> support concurrent split tasks on the slices of a single hlog file.
> 6) Do not cancel a timed-out split task until one task reports success
> (this avoids the scenario where splitting an hlog file fails because no task can
> finish within the timeout period), and reschedule a duplicate of the same split
> task to reduce split time (to mitigate stragglers in hlog splitting).
> 7) Consider hlog data locality when scheduling hlog split tasks:
> assign each hlog to a splitter that is close to the hlog's data.
> 8) Support multiple hlog writers, switching to another hlog writer when writes
> to the current hlog have long latency, e.g. due to a temporary network spike?
> This is a draft listing the hlog improvements we plan to implement
> in the near future. Comments and discussion are welcome.
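Items 1 and 2 in the description both depend on knowing which regions still have unflushed edits in a given hlog. A minimal sketch of that bookkeeping, assuming we track, per hlog, the highest seqId each region wrote to it, and, per region, the seqId up to which its memstore has been flushed. `HLogCleaner` and its methods are hypothetical names. Running `isArchivable` after every flush (item 1) rather than only on log roll lets logs be cleaned sooner; when a log is pinned by only a few regions, such as those of a rarely written table, a compaction thread (item 2) could rewrite it keeping only those regions' unflushed entries.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of items 1 and 2; not HBase's API.
class HLogCleaner {
    /**
     * Regions that still have unflushed edits in one hlog: their flushed seqId is
     * below the highest seqId they wrote to this log (regions never flushed count as -1).
     */
    static Set<String> pinningRegions(Map<String, Long> maxSeqIdInLogByRegion,
                                      Map<String, Long> flushedSeqIdByRegion) {
        Set<String> pinning = new TreeSet<>();
        for (Map.Entry<String, Long> e : maxSeqIdInLogByRegion.entrySet()) {
            long flushed = flushedSeqIdByRegion.getOrDefault(e.getKey(), -1L);
            if (flushed < e.getValue()) pinning.add(e.getKey());
        }
        return pinning;
    }

    /** An hlog can be archived once no region pins it. */
    static boolean isArchivable(Map<String, Long> maxSeqIdInLogByRegion,
                                Map<String, Long> flushedSeqIdByRegion) {
        return pinningRegions(maxSeqIdInLogByRegion, flushedSeqIdByRegion).isEmpty();
    }
}
```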
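Items 4 and 5 can be pictured together: cut one hlog into fixed-size byte ranges and hand each range to a pool of splitter threads on the same RS or on the master. This is a sketch under stated assumptions: `SliceSplitter`, `Slice`, and the `splitSlice` callback are hypothetical, and it ignores the hard part that slice boundaries do not line up with entry boundaries, so a real slice reader would have to skip forward to the first complete entry at or after its offset (for example via sync markers, if the log format provides them).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of items 4 and 5; not HBase's API.
class SliceSplitter {
    /** Byte range [offset, offset + length) of one hlog file. */
    static final class Slice {
        final long offset, length;
        Slice(long offset, long length) { this.offset = offset; this.length = length; }
    }

    /** Divides a file into slices of at most sliceSize bytes (e.g. the HDFS block size). */
    static List<Slice> slices(long fileLength, long sliceSize) {
        if (sliceSize <= 0) throw new IllegalArgumentException("sliceSize must be > 0");
        List<Slice> out = new ArrayList<>();
        for (long off = 0; off < fileLength; off += sliceSize) {
            out.add(new Slice(off, Math.min(sliceSize, fileLength - off)));
        }
        return out;
    }

    /** Runs one split task per slice on a pool of `splitters` threads; results keep slice order. */
    static <R> List<R> splitAll(List<Slice> slices, int splitters, Function<Slice, R> splitSlice)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(splitters);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Slice s : slices) futures.add(pool.submit(() -> splitSlice.apply(s)));
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) results.add(f.get());
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For a 150 MB hlog and 64 MB slices, `slices` yields three ranges, the last one 22 MB long.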
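Item 6 amounts to speculative execution: a timeout launches a backup copy of the split task instead of cancelling the slow one, the first attempt to succeed wins, and only then are the others cancelled. A minimal sketch with `ExecutorCompletionService`; `SpeculativeSplit.runWithBackups` is a hypothetical name, and it assumes duplicate attempts are safe to run side by side (for instance, each writing to its own temporary output, with only the winner's output committed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of item 6; not HBase's API.
class SpeculativeSplit {
    /**
     * Runs `task`, adding a backup copy whenever `timeoutMs` passes with no result
     * (or when every attempt so far has failed), up to `maxAttempts` copies.
     * Slow attempts are never cancelled on timeout; all are cancelled once one succeeds.
     */
    static <T> T runWithBackups(ExecutorService pool, Callable<T> task,
                                long timeoutMs, int maxAttempts)
            throws InterruptedException, ExecutionException {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        List<Future<T>> attempts = new ArrayList<>();
        attempts.add(cs.submit(task));
        int failed = 0;
        try {
            while (true) {
                Future<T> done = attempts.size() < maxAttempts
                        ? cs.poll(timeoutMs, TimeUnit.MILLISECONDS)
                        : cs.take();
                if (done == null) {
                    attempts.add(cs.submit(task)); // straggler: keep it, add a backup
                    continue;
                }
                try {
                    return done.get(); // first successful attempt wins
                } catch (ExecutionException e) {
                    failed++;
                    if (failed == attempts.size()) { // nothing is still running
                        if (attempts.size() >= maxAttempts) throw e;
                        attempts.add(cs.submit(task));
                    }
                }
            }
        } finally {
            for (Future<T> f : attempts) f.cancel(true); // stop remaining stragglers
        }
    }
}
```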
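For item 7, the scheduler needs the hosts that store replicas of the hlog (or slice); Hadoop's `FileSystem.getFileBlockLocations` returns `BlockLocation` objects whose `getHosts()` lists them. A minimal sketch of the choice itself, assuming one splitter per host and a known count of queued tasks per splitter; `LocalityScheduler.pickSplitter` is a hypothetical name.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of item 7; not HBase's API.
class LocalityScheduler {
    /**
     * Picks the least-loaded splitter running on a host that stores a replica of the
     * hlog's data; if none does, falls back to the least-loaded splitter overall.
     */
    static String pickSplitter(Set<String> replicaHosts, Map<String, Integer> loadByHost) {
        Comparator<Map.Entry<String, Integer>> byLoad = Map.Entry.comparingByValue();
        return loadByHost.entrySet().stream()
                .filter(e -> replicaHosts.contains(e.getKey()))
                .min(byLoad)
                .map(Map.Entry::getKey)
                .orElseGet(() -> loadByHost.entrySet().stream()
                        .min(byLoad)
                        .map(Map.Entry::getKey)
                        .orElseThrow(() -> new IllegalArgumentException("no splitters available")));
    }
}
```

Preferring a local but busier splitter over a remote idle one is a policy choice; a real scheduler might cap the load difference it is willing to accept for locality.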
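Item 8 is posed as an open question; one way to picture the switching policy is a wrapper that times each write and rotates to the next writer after a write exceeds a latency threshold. This is only a sketch: `MultiWriterLog` is hypothetical, it switches on a single slow write, and it does not address ordering of edits across writers or which logs must be split on failover, which a real design would have to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of item 8; not HBase's API.
class MultiWriterLog<W> {
    private final List<W> writers;
    private final long slowThresholdNanos;
    private int current = 0;

    MultiWriterLog(List<W> writers, long slowThresholdNanos) {
        if (writers.isEmpty()) throw new IllegalArgumentException("need at least one writer");
        this.writers = new ArrayList<>(writers);
        this.slowThresholdNanos = slowThresholdNanos;
    }

    /** Performs one write on the current writer and records how long it took. */
    synchronized void write(Consumer<W> op) {
        long start = System.nanoTime();
        op.accept(writers.get(current));
        recordLatency(System.nanoTime() - start);
    }

    /** After a write slower than the threshold, later writes go to the next writer. */
    synchronized void recordLatency(long nanos) {
        if (nanos > slowThresholdNanos) current = (current + 1) % writers.size();
    }

    synchronized int currentIndex() { return current; }
}
```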