[jira] [Created] (HBASE-9873) Some improvements in hlog and hlog split

Liu Shaohui (JIRA) Fri, 01 Nov 2013 00:12:55 -0700

Liu Shaohui created HBASE-9873:
----------------------------------

             Summary: Some improvements in hlog and hlog split
                 Key: HBASE-9873
                 URL: https://issues.apache.org/jira/browse/HBASE-9873
             Project: HBase
          Issue Type: Improvement
            Reporter: Liu Shaohui



Some improvements in hlog and hlog split

1) Try to clean old hlogs after each memstore flush, to avoid splitting 
unnecessary hlogs during failover.  Currently, hlog cleaning runs only when 
the hlog writer is rolled. 

2) Add a background hlog compaction thread that compacts hlogs by removing the 
entries whose data has already been flushed to hfiles. The scenario is a 
shared cluster in which writes to one table are small and periodic: many hlogs 
cannot be cleaned because they still contain entries of that table.

3) Take the largest hfile seqId of each previously served region, and use the 
smallest of these values to skip hlog entries during the split.  Facebook 
implemented this in HBASE-6508, and HBASE-9568 backports it to HBase 0.94.

4) Support running multiple hlog splitters on a single RS and on the master 
(the latter can boost split efficiency for tiny clusters).

5) Enable multiple splitters on a 'big' hlog file by logically splitting the 
hlog into slices (of configurable size, e.g. the HDFS block size, 64M). 
Support multiple concurrent split tasks on a single hlog file slice. 

6) Do not cancel a timed-out split task until one task reports success (this 
avoids the scenario where the split of an hlog file fails because no task can 
succeed within the timeout period), and reschedule an identical split task to 
reduce split time (to avoid stragglers in hlog split).

7) Consider hlog data locality when scheduling hlog split tasks: schedule each 
hlog to a splitter that is near the hlog data.

8) Support multiple hlog writers, and switch to another hlog writer when write 
latency to the current hlog is high, possibly due to a temporary network spike? 
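
As a rough illustration of item 3, here is a minimal sketch of how the skip 
threshold could be computed. This is not the HBASE-6508 code; the class and 
method names are hypothetical, and it assumes seqIds are shared across all 
regions of one RS, so that any entry at or below the threshold has already 
been flushed for every region.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an HBase API.
final class SeqIdSkipSketch {

    // Input: region name -> seqIds of that region's hfiles.
    // Returns the smallest of the per-region largest hfile seqIds,
    // or -1 if nothing can be skipped safely.
    static long skipThreshold(Map<String, List<Long>> hfileSeqIdsByRegion) {
        if (hfileSeqIdsByRegion.isEmpty()) {
            return -1L;
        }
        long threshold = Long.MAX_VALUE;
        for (List<Long> seqIds : hfileSeqIdsByRegion.values()) {
            // A region without hfiles has flushed nothing yet.
            long regionMax = -1L;
            for (long seqId : seqIds) {
                regionMax = Math.max(regionMax, seqId);
            }
            threshold = Math.min(threshold, regionMax);
        }
        return threshold;
    }

    // An hlog entry can be ignored if every region has flushed past it.
    static boolean canSkip(long entrySeqId, long threshold) {
        return entrySeqId <= threshold;
    }
}
```

With hfile seqIds {10, 40} for one region and {25} for another, the threshold 
is 25, so entries with seqId up to 25 can be ignored during the split.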

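The logical slicing in item 5 could be sketched as below. The names are 
hypothetical, and the sketch only computes byte ranges; it assumes that a 
real splitter would still have to align each range to hlog entry boundaries 
before reading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not an HBase API.
final class HlogSliceSketch {

    // Cuts a file of fileLength bytes into [start, end) byte ranges of at
    // most sliceSize bytes each; the last slice may be shorter.
    static List<long[]> slices(long fileLength, long sliceSize) {
        if (sliceSize <= 0) {
            throw new IllegalArgumentException("sliceSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < fileLength; start += sliceSize) {
            long end = Math.min(start + sliceSize, fileLength);
            result.add(new long[] {start, end});
        }
        return result;
    }
}
```

For a 150M hlog and a 64M slice size this gives three slices, the last one 
covering the remaining 22M; each slice could then be handed to a separate 
split task.
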
This is a draft list of the hlog improvements we plan to implement in the near 
future. Comments and discussion are welcome.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
