[ 
https://issues.apache.org/jira/browse/ACCUMULO-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277030#comment-14277030
 ] 

Josh Elser commented on ACCUMULO-3471:
--------------------------------------

Thinking about this some more...

To get batches of assignments, it might be more straightforward to have the 
master bin together the tablets that don't require log recovery. The 
tabletserver would still process incoming assignments sequentially, so that 
recovery doesn't cause resource-usage problems. If the tabletserver receives 
a collection of extents to load (instead of just a single extent), it would 
be easy to bring all of those tablets online cleanly. 
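
As a rough illustration of the master-side binning, here is a minimal sketch. 
Everything in it is an assumption made for the example: Assignment, Binned 
and bin() are hypothetical names, not existing Accumulo classes, and an 
extent is reduced to a plain string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AssignmentBinner {

  /** Hypothetical stand-in for a pending assignment; not an Accumulo class. */
  static final class Assignment {
    public final String extent;         // tablet extent, simplified to a string
    public final String tserver;        // destination tablet server
    public final boolean needsRecovery; // true if write-ahead logs must be replayed

    public Assignment(String extent, String tserver, boolean needsRecovery) {
      this.extent = extent;
      this.tserver = tserver;
      this.needsRecovery = needsRecovery;
    }
  }

  /** Result of binning: one batch per server, plus assignments to send singly. */
  static final class Binned {
    public final Map<String, List<String>> batches = new LinkedHashMap<>();
    public final List<Assignment> individual = new ArrayList<>();
  }

  /**
   * Groups extents that need no log recovery by destination server so each
   * group can be sent as one batch. Extents that need recovery are kept
   * apart and would still be sent one at a time.
   */
  static Binned bin(List<Assignment> pending) {
    Binned out = new Binned();
    for (Assignment a : pending) {
      if (a.needsRecovery) {
        out.individual.add(a);
      } else {
        out.batches.computeIfAbsent(a.tserver, k -> new ArrayList<>()).add(a.extent);
      }
    }
    return out;
  }
}
```

With this split, the tabletserver side would only need to accept a list of 
extents per request; the sequential handling of recoveries stays as it is.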

Another option would be to rework the recovery code so that recoveries could be 
handled in parallel (in the eyes of the caller -- they might actually be 
executed serially behind the scenes). The tablet server could build up a 
collection of extents to load and process them itself. This sounds more 
difficult to me than the first suggestion.
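
To make "parallel in the eyes of the caller, serial behind the scenes" a bit 
more concrete, here is a minimal sketch using a single-threaded executor. 
SerialRecoveryQueue and recoverAsync() are hypothetical names for this 
example only, and the actual log recovery is replaced by a placeholder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SerialRecoveryQueue implements AutoCloseable {
  // A single worker thread: queued recoveries run one at a time, in
  // submission order, so resource usage stays bounded.
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final List<String> completionOrder =
      Collections.synchronizedList(new ArrayList<>());

  /**
   * Returns immediately with a future, so callers can submit many extents
   * without blocking on each recovery.
   */
  CompletableFuture<String> recoverAsync(String extent) {
    return CompletableFuture.supplyAsync(() -> {
      // Placeholder: real code would replay the write-ahead logs here.
      completionOrder.add(extent);
      return extent;
    }, worker);
  }

  /** Snapshot of the order in which recoveries actually ran. */
  List<String> completionOrder() {
    return new ArrayList<>(completionOrder);
  }

  @Override
  public void close() {
    worker.shutdown();
  }
}
```

A caller could submit every extent up front and then wait on the returned 
futures (for example with CompletableFuture.allOf) before bringing the 
tablets online together.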

> Adding a new tserver puts some tables offline for few minutes
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-3471
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3471
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 12.04
>            Reporter: Denis Petrov
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: ACCUMULO-3471-balance-test.patch
>
>
> I run an Accumulo cluster with 15 tservers with about 6000 tablets on each 
> (disks are quite slow - each node has 2*4TB SATA).
> When a new tserver is added to the cluster, the rebalancing procedure starts.
> During this procedure some tablets are offline and unreachable for 5-10 
> minutes.
> This is visible in http://monitor:50095/tables and as timeouts on the client side.
> The rebalancing caused by killing a tserver converges much faster than 
> rebalancing caused by adding a tserver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
