[ https://issues.apache.org/jira/browse/HBASE-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706434#comment-13706434 ]
Jesse Yates commented on HBASE-8925:
------------------------------------
I _think_ part of the problem is also that the server gets hit with a bunch of
compactions, which limits its available IO; that isn't confirmed yet, but the
two seem correlated.
It might be easier to have multiple threads manage a single server's queue,
which could give a good speedup, unless it's the bandwidth to the machine that
is the limiting factor. I'll run some more tests (or rather, get [~sameerv] to
run them; he's the one who has been doing all the actual testing, I'm just the
debugging monkey) to see if we can confirm either way.
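A minimal sketch of the "multiple threads per queue" idea, assuming a server's replication backlog can be modeled as a list of WAL file names that workers claim one at a time. ParallelQueueDrainer, drain and ship are hypothetical names, not HBase's replication classes, and ship() is only a placeholder for reading and sending a file's edits. Because files are shipped in parallel, their edits can reach the peer out of order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: several workers drain one server's backlog of WAL files.
public class ParallelQueueDrainer {

    // Placeholder for shipping one WAL file's edits to the peer cluster.
    static String ship(String walFile) {
        return walFile;
    }

    // Drains every file using numThreads workers; returns the files shipped.
    static List<String> drain(List<String> walFiles, int numThreads) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>(walFiles);
        List<String> shipped = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int i = 0; i < numThreads; i++) {
            pool.submit(() -> {
                String wal;
                // poll() hands each file to exactly one worker; order across workers is not kept.
                while ((wal = queue.poll()) != null) {
                    shipped.add(ship(wal));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        }
        return shipped;
    }

    public static void main(String[] args) {
        List<String> backlog = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            backlog.add("wal." + i);
        }
        System.out.println(drain(backlog, 4).size() + " files shipped");
    }
}
```

Running main prints "45 files shipped". As the comment notes, this only pays off if per-file processing, rather than bandwidth to the machine, is the bottleneck.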
> [replication] Allow lazy RS to help overwhelmed RS
> --------------------------------------------------
>
> Key: HBASE-8925
> URL: https://issues.apache.org/jira/browse/HBASE-8925
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.98.0, 0.95.2, 0.94.10
> Reporter: Jesse Yates
>
> Sometimes, in the usual course of things, one of the regionservers gets waaaaay
> behind replicating its queue, easily building up 40-50 files over just a day
> (running YCSB at the same time). However, this is just a single RS; the
> others don't have anything to replicate. We can manually get around this by
> moving the region load away from the overloaded server (and get smarter about
> this by writing our own load balancer). However, moving regions around just
> to catch up on replication seems a bit heavyweight.
> From this thread on the dev list:
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201211.mbox/%3CCAFLnt_qj1stL=vre5abwqawpkwkg7ldebwcyhddkbqvx4up...@mail.gmail.com%3E
> it seems like we can already get out-of-order updates for a table on the
> target cluster. Given this is already the behavior (though not common), we
> could allow a 'lazy' RS to have a secondary log to replicate when it has
> time.
> This adds a bit more complexity around who owns which log for replication,
> but could dramatically increase throughput, as you aren't bottlenecked by a
> single slow host.
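A rough sketch of how a "lazy" RS might claim a secondary log, assuming ownership of each WAL is tracked in shared state that supports compare-and-set (modeled here by ConcurrentHashMap.replace). LazyHelper, enqueue and helpIfIdle are hypothetical names; the per-server queues are plain collections, so apart from the atomic claim this sketch is single-threaded. The helper takes the newest file from the busiest queue while the owner keeps working from the oldest, so edits can reach the target out of order, which the dev-list thread suggests can already happen.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an idle server claims a log from an overwhelmed one.
public class LazyHelper {
    // WAL file -> owning server. replace(key, expected, new) is an atomic
    // compare-and-set, so at most one server can win the claim on a file.
    final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();
    // Per-server replication queues (not thread-safe; single-threaded here).
    final Map<String, Deque<String>> queues = new HashMap<>();

    void enqueue(String server, String wal) {
        queues.computeIfAbsent(server, s -> new ArrayDeque<>()).addLast(wal);
        owners.put(wal, server);
    }

    // If helper has nothing queued, claim the newest log of the most backlogged server.
    String helpIfIdle(String helper) {
        Deque<String> own = queues.get(helper);
        if (own != null && !own.isEmpty()) {
            return null; // not lazy: still has its own work
        }
        String busiest = null;
        int max = 1; // only help a server with more than one log waiting
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            if (!e.getKey().equals(helper) && e.getValue().size() > max) {
                busiest = e.getKey();
                max = e.getValue().size();
            }
        }
        if (busiest == null) {
            return null;
        }
        // Take from the tail; the owner keeps replicating from the head.
        String wal = queues.get(busiest).peekLast();
        if (owners.replace(wal, busiest, helper)) {
            queues.get(busiest).pollLast();
            queues.computeIfAbsent(helper, s -> new ArrayDeque<>()).addLast(wal);
            return wal;
        }
        return null; // another helper claimed it first (only with concurrent helpers)
    }

    public static void main(String[] args) {
        LazyHelper h = new LazyHelper();
        for (int i = 0; i < 40; i++) {
            h.enqueue("rs1", "rs1.wal." + i);
        }
        System.out.println(h.helpIfIdle("rs2"));        // rs1.wal.39
        System.out.println(h.queues.get("rs1").size()); // 39
        System.out.println(h.owners.get("rs1.wal.39")); // rs2
        System.out.println(h.helpIfIdle("rs2"));        // null: rs2 now has work
    }
}
```

The ownership map is where the extra "who owns which log" complexity lands: every hand-off has to be a single atomic claim so two servers never replicate the same file.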
--
This message is automatically generated by JIRA.