[
https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811449#comment-13811449
]
stack commented on HBASE-9873:
------------------------------
Thanks for looking into this important issue, [~liushaohui].
bq. 1) Try to clean old hlogs after each memstore flush to avoid unnecessary
hlog splits in failover. Currently hlog cleaning only runs when the hlog writer rolls.
Are we just scheduling more checks? Is that the idea? Flush time is a good
juncture for WAL clean-up. Do you observe cleanup lagging when we only do it
on log roll?
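To make the flush-time check concrete, here is a minimal Java sketch of deciding which closed WALs could be let go once a flush completes. It assumes each closed WAL records the highest sequence id it holds per region, and that the region server tracks the last flushed sequence id per region. `WalCleanupSketch`, `WalFile`, and `archivable` are hypothetical names for illustration, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these are not HBase's actual classes. */
class WalCleanupSketch {

    /** One closed WAL file with the highest sequence id it holds per region. */
    static final class WalFile {
        final String name;
        final Map<String, Long> maxSeqIdByRegion;

        WalFile(String name, Map<String, Long> maxSeqIdByRegion) {
            this.name = name;
            this.maxSeqIdByRegion = maxSeqIdByRegion;
        }
    }

    /**
     * Returns the names of WALs whose edits have all been flushed: for every
     * region with edits in the WAL, the highest sequence id in that WAL must
     * be at or below the region's last flushed sequence id.
     */
    static List<String> archivable(List<WalFile> closedWals,
                                   Map<String, Long> flushedSeqIdByRegion) {
        List<String> result = new ArrayList<>();
        for (WalFile wal : closedWals) {
            boolean allFlushed = true;
            for (Map.Entry<String, Long> e : wal.maxSeqIdByRegion.entrySet()) {
                // A region that has never flushed is treated as flushed up to -1.
                long flushed = flushedSeqIdByRegion.getOrDefault(e.getKey(), -1L);
                if (e.getValue() > flushed) {
                    allFlushed = false;
                    break;
                }
            }
            if (allFlushed) {
                result.add(wal.name);
            }
        }
        return result;
    }
}
```

Running this check after every flush, rather than only on log roll, is the change item 1 proposes; the check itself is the same in both cases.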
bq. 2) Add a background hlog compaction thread to compact the hlog: remove
the hlog entries whose data has been flushed to hfiles. The scenario is a
shared cluster where write requests to a table are few and periodic, so many
hlogs cannot be cleaned because they hold entries of this table.
Do you think this will help? You will have to do a bunch of reading and
rewriting, right? You will only rewrite WALs that have at least some
percentage of flushed edits? Would it be better to work on getting better at
flushing the memstores whose edits are keeping us from letting go of old
WALs? Just asking.
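The flush-the-laggards alternative might look something like the sketch below: given the oldest WAL, list the regions whose unflushed edits still pin it, so only those memstores need flushing. It assumes per-region sequence ids are tracked for both the WAL and the flushes; `FlushTargetSketch` and `regionsPinning` are hypothetical names, not HBase APIs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; these are not HBase's actual classes. */
class FlushTargetSketch {

    /**
     * Returns the regions that still have unflushed edits in the oldest WAL.
     * Flushing exactly these regions would make that WAL eligible for cleanup.
     */
    static Set<String> regionsPinning(Map<String, Long> maxSeqIdInOldestWal,
                                      Map<String, Long> flushedSeqIdByRegion) {
        Set<String> pinning = new TreeSet<>();
        for (Map.Entry<String, Long> e : maxSeqIdInOldestWal.entrySet()) {
            // A region that has never flushed is treated as flushed up to -1.
            long flushed = flushedSeqIdByRegion.getOrDefault(e.getKey(), -1L);
            if (e.getValue() > flushed) {
                pinning.add(e.getKey());
            }
        }
        return pinning;
    }
}
```

In the shared-cluster scenario from item 2, the low-traffic table's regions are the ones this would return, and flushing their small memstores avoids reading and rewriting whole WALs.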
bq. 4) Support running multiple hlog splitters on a single RS and on the
master (the latter can boost split efficiency for tiny clusters)
I agree we need more slots on smaller clusters; most of the time log splitting
is just waiting on a slot to open. [~jeffreyz] has an opinion on this; his
thought is that there would need to be lots of spare i/o on a cluster for more
slots to make a difference (I like the idea of the master hosting a
splitlogworker -- on a small cluster of 5 nodes or so, it'd make a significant
difference).
I have also wondered if we can't speed read/write of splits. I was studying
log splitting a while back and it seemed to run slow on a small cluster.
Chatting w/ [~jeffreyz], he suggested that we concentrate on speeding up the
writing of the splits, since that writes out three replicas whereas we read
from one only. Is there anything we could do to speed writing/reading of WALs
around split time? (I was seeing 30-60 seconds to do a 128M WAL.)
bq. 5) Enable multiple splitters on a 'big' hlog file by (logically) splitting
the hlog into slices (configurable size, eg hdfs trunk size 64M)
support concurrent multiple split tasks on a single hlog file slice
Do you think this one is necessary if there are enough slots? We'll start to
have issues where edits come out of order -- something that we need to address
for the multiwal case but likely not something that will be fixed in 0.94 (I
see FB does this, right?)
bq. 7) Consider hlog data locality when scheduling the hlog split task.
Schedule the hlog to a splitter that is near the hlog data.
This would be great.
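As a rough illustration of what locality-aware scheduling could mean here, the Java sketch below prefers the idle splitter that holds the most bytes of the WAL locally. It assumes the caller already knows how many bytes of the file each host stores and which workers are idle; `LocalitySketch` and `pickWorker` are hypothetical names, and the sketch does not model how split tasks are actually claimed by workers.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these are not HBase's actual classes. */
class LocalitySketch {

    /**
     * Returns the idle worker host storing the most bytes of the WAL locally.
     * Falls back to the first idle worker when none holds any of the data,
     * and returns null when there are no idle workers.
     */
    static String pickWorker(Map<String, Long> localBytesByHost,
                             List<String> idleWorkers) {
        String best = null;
        long bestBytes = -1L;
        for (String worker : idleWorkers) {
            long bytes = localBytesByHost.getOrDefault(worker, 0L);
            // Strict comparison keeps the earliest worker on ties.
            if (bytes > bestBytes) {
                best = worker;
                bestBytes = bytes;
            }
        }
        return best;
    }
}
```

The fallback matters on a busy cluster: waiting for a local worker could cost more than reading the WAL remotely, so a non-local idle worker is still chosen.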
bq. 8) Support multiple hlog writers and switch to another hlog writer when
write latency to the current hlog is long, due to a possible temporary network spike?
This effort is starting up ([[email protected]] -- you have an issue for
the multiwal work?). Let me add it as a link to this one.
Does lease recovery work reliably for you fellas? Do you have HDFS-3703, etc.,
patched into your hadoop, with the 'stale' dn detection enabled?
> Some improvements in hlog and hlog split
> ----------------------------------------
>
> Key: HBASE-9873
> URL: https://issues.apache.org/jira/browse/HBASE-9873
> Project: HBase
> Issue Type: Improvement
> Reporter: Liu Shaohui
> Labels: failover, hlog
>
> Some improvements in hlog and hlog split
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog
> splits in failover. Currently hlog cleaning only runs when the hlog writer rolls.
> 2) Add a background hlog compaction thread to compact the hlog: remove the
> hlog entries whose data has been flushed to hfiles. The scenario is a shared
> cluster where write requests to a table are few and periodic, so many hlogs
> cannot be cleaned because they hold entries of this table.
> 3) Rely on the smallest of the largest hfile seqIds across the previously
> served regions to ignore some entries. Facebook has implemented this in
> HBASE-6508 and we backport it to hbase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the
> master (the latter can boost split efficiency for tiny clusters)
> 5) Enable multiple splitters on a 'big' hlog file by (logically) splitting the
> hlog into slices (configurable size, eg hdfs trunk size 64M)
> support concurrent multiple split tasks on a single hlog file slice
> 6) Do not cancel a timed-out split task until one task reports success
> (avoids the scenario where the split for an hlog file fails because no task
> can succeed within the timeout period), and reschedule the same split task to
> reduce split time (to avoid stragglers in hlog split)
> 7) Consider hlog data locality when scheduling the hlog split task.
> Schedule the hlog to a splitter that is near the hlog data.
> 8) Support multiple hlog writers and switch to another hlog writer when write
> latency to the current hlog is long, due to a possible temporary network spike?
> This is a draft listing the hlog improvements we will try to implement in
> the near future. Comments and discussion are welcome.