[jira] [Commented] (HBASE-9873) Some improvements in hlog and hlog split

Liu Shaohui (JIRA) Wed, 06 Nov 2013 05:33:06 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814875#comment-13814875
 ]


Liu Shaohui commented on HBASE-9873:
------------------------------------

[~stack] [~jeffreyz] [~liochon]
{quote}
3) Rely on the smallest, across the previously served regions, of each 
region's largest hfile seqId to skip some entries. Facebook has implemented 
this in HBASE-6508, and we have backported it to HBase 0.94 in HBASE-9568.
{quote}
What about this one? I think HBASE-6508 is useful.
Could anyone help review HBASE-9568 (the backport of HBASE-6508 to 0.94)?
We may port HBASE-6508 to trunk later.
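
To make the idea in 3) concrete, here is a minimal sketch of the threshold 
computation. It is not the HBASE-6508 patch: the class and method names are 
hypothetical, and it assumes seqIds are comparable across all regions of the 
failed region server and that each region's hfile seqIds are already known.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not an HBase class. */
final class SeqIdSkipSketch {

    /**
     * Returns the smallest, over all regions, of each region's largest
     * hfile seqId. Returns -1 when nothing can be skipped safely.
     */
    static long skipThreshold(Map<String, List<Long>> hfileSeqIdsByRegion) {
        if (hfileSeqIdsByRegion.isEmpty()) {
            return -1L; // no region information: skip nothing
        }
        long threshold = Long.MAX_VALUE;
        for (List<Long> seqIds : hfileSeqIdsByRegion.values()) {
            long regionMax = -1L; // a region without hfiles has flushed nothing
            for (long id : seqIds) {
                regionMax = Math.max(regionMax, id);
            }
            threshold = Math.min(threshold, regionMax);
        }
        return threshold;
    }

    /** Keeps only the hlog entry seqIds that still need to be replayed. */
    static List<Long> entriesToReplay(List<Long> hlogSeqIds, long threshold) {
        List<Long> remaining = new ArrayList<>();
        for (long id : hlogSeqIds) {
            if (id > threshold) { // entries at or below the threshold are already in hfiles
                remaining.add(id);
            }
        }
        return remaining;
    }
}
```

For example, if region A has hfile seqIds 10 and 40 and region B has 25 and 70, 
the threshold is 40: hlog entries 30 and 40 are skipped, while 41 and 80 are 
still replayed.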

> Some improvements in hlog and hlog split
> ----------------------------------------
>
>                 Key: HBASE-9873
>                 URL: https://issues.apache.org/jira/browse/HBASE-9873
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR, wal
>            Reporter: Liu Shaohui
>            Priority: Critical
>              Labels: failover, hlog
>
> Some improvements in hlog and hlog split:
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog 
> splits during failover. Currently hlog cleaning only runs when the hlog 
> writer is rolled.
> 2) Add a background hlog compaction thread to compact the hlogs: remove the 
> hlog entries whose data has been flushed to hfiles. The scenario is that in a 
> shared cluster, write requests to a table may be few and periodic, so a 
> lot of hlogs cannot be cleaned because they still hold entries of this table.
> 3) Rely on the smallest, across the previously served regions, of each 
> region's largest hfile seqId to skip some entries. Facebook has implemented 
> this in HBASE-6508, and we have backported it to HBase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the 
> master (the latter can boost split efficiency for tiny clusters).
> 5) Enable multiple splitters on a 'big' hlog file by (logically) splitting 
> the hlog into slices (configurable size, e.g. the HDFS block size of 64M), and 
> support multiple concurrent split tasks on a single hlog file slice.
> 6) Do not cancel a timed-out split task until one task reports success 
> (this avoids the scenario where the split of an hlog file fails because no 
> task can succeed within the timeout period), and reschedule the same split 
> task to reduce split time (to avoid stragglers in hlog split).
> 7) Consider hlog data locality when scheduling hlog split tasks: 
> schedule the hlog to a splitter that is near the hlog data.
> 8) Support multiple hlog writers, switching to another hlog writer when 
> write latency to the current hlog is high due to a possible temporary 
> network spike?
> This is a draft that lists the hlog improvements we will try to implement 
> in the near future. Comments and discussion are welcome.
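
The logical slicing in 5) could be sketched roughly as below. This is only an 
illustration under assumptions: the class and method names are hypothetical, 
the slices are plain byte ranges, and a real splitter would still have to 
advance from each slice offset to the next hlog entry boundary before reading.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not an HBase class. */
final class HLogSliceSketch {

    /**
     * Cuts a file of fileLength bytes into consecutive {offset, length}
     * ranges of at most sliceSize bytes each.
     */
    static List<long[]> slices(long fileLength, long sliceSize) {
        if (sliceSize <= 0) {
            throw new IllegalArgumentException("sliceSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += sliceSize) {
            long length = Math.min(sliceSize, fileLength - offset); // last slice may be shorter
            result.add(new long[] {offset, length});
        }
        return result;
    }
}
```

With a 150M hlog and a 64M slice size this yields three slices of 64M, 64M 
and 22M, and each slice could then be handed out as its own split task.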



--
This message was sent by Atlassian JIRA
(v6.1#6144)
