[jira] [Commented] (HBASE-8925) [replication] Allow lazy RS to help overwhelmed RS

Jesse Yates (JIRA) Thu, 11 Jul 2013 16:28:04 -0700

    [ https://issues.apache.org/jira/browse/HBASE-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706434#comment-13706434 ]


Jesse Yates commented on HBASE-8925:
------------------------------------

I _think_ part of the problem is also that the server gets hit with a bunch of 
compactions, which limits its available IO. That's not entirely confirmed, but 
the two seem somewhat correlated.

It would be easier to have multiple threads manage a single server's queue 
(rather than handing its logs to another RS), and we could see a good speedup, 
unless it's the bandwidth to the target machine that is the limiting factor. 
I'll run some more tests (or rather, get [~sameerv] to run them; he's the one 
who has been doing all the actual testing, I'm just the debugging monkey) to 
see if we can confirm either way.
                
> [replication] Allow lazy RS to help overwhelmed RS
> --------------------------------------------------
>
>                 Key: HBASE-8925
>                 URL: https://issues.apache.org/jira/browse/HBASE-8925
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.98.0, 0.95.2, 0.94.10
>            Reporter: Jesse Yates
>
> Sometimes, in the usual course of things, one of the regionservers gets 
> waaaaay behind replicating its queue; it can easily build up 40-50 files over 
> just a day (while running YCSB at the same time). However, this happens on 
> just a single RS; the others don't have anything to replicate. We can 
> manually get around this by moving the region load away from the overloaded 
> server (and get smarter about this by writing our own load balancer). 
> However, moving regions around just to let replication catch up seems a bit 
> heavyweight.
> From this thread on the dev list: 
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201211.mbox/%3CCAFLnt_qj1stL=vre5abwqawpkwkg7ldebwcyhddkbqvx4up...@mail.gmail.com%3E
> it seems like we can already get out-of-order updates for a table on the 
> target cluster. Given this is already the behavior (though not common), we 
> could allow a 'lazy' RS to have a secondary log to replicate when it has 
> time. 
> This adds a bit more complexity around who owns which log for replication, 
> but could dramatically increase throughput, since you aren't bottlenecked by 
> the single slow host.
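One way to picture the "lazy RS" proposal and the ownership question it raises is a registry in which every WAL name sits in exactly one server's queue, and an idle server may atomically claim a log from the longest backlog. This is a hypothetical in-memory sketch: `ReplicationQueues`, `steal`, and `min_backlog` are illustrative names, and the lock stands in for whatever atomic claim a real cluster would need in shared state (HBase of this era kept replication queue state in ZooKeeper). Taking the newest log from the tail leaves the oldest logs, which the owner processes next, with their original owner.

```python
import threading
from collections import deque


class ReplicationQueues:
    """Hypothetical model of per-regionserver replication queues.

    Each WAL name lives in exactly one server's deque, so ownership of a
    log is unambiguous; the lock stands in for an atomic claim in a shared
    store rather than any real HBase API.
    """

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def enqueue(self, server, log):
        with self._lock:
            self._queues.setdefault(server, deque()).append(log)

    def next_log(self, server):
        """Owner takes its oldest log (head of its queue), or None."""
        with self._lock:
            q = self._queues.get(server)
            return q.popleft() if q else None

    def steal(self, lazy_server, min_backlog=2):
        """Let an idle server claim the newest log from the longest queue.

        Returns (victim, log); the caller now owns that log. Returns None if
        the caller still has work or no queue has at least min_backlog logs.
        """
        with self._lock:
            if self._queues.get(lazy_server):
                return None  # only servers with nothing queued help out
            victim = max(self._queues,
                         key=lambda s: len(self._queues[s]), default=None)
            if victim is None or len(self._queues[victim]) < min_backlog:
                return None
            return victim, self._queues[victim].pop()


if __name__ == "__main__":
    rq = ReplicationQueues()
    for i in range(5):
        rq.enqueue("rs1", f"wal.{i}")  # the overwhelmed server
    print(rq.steal("rs2"))             # an idle server claims ('rs1', 'wal.4')
    print(rq.next_log("rs1"))          # the owner continues with 'wal.0'
```

The `min_backlog` threshold keeps a helper from grabbing a server's only log; a real design would likely also need to handle a helper that dies while holding a claimed log.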
