Re: Review Request 24855: ACCUMULO-1454 design doc

keith Tue, 19 Aug 2014 13:49:10 -0700


> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would 
> > actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This 
> > goes back through the balancer, so you have a bit of churn in however many 
> > "rounds" the Balancer takes to choose where those tablets should go, and 
> > then for the master to process the necessary assignments for each tserver. 
> > As I read the description, the only piece of the puzzle we're improving 
> > is replacing the migration components with direct user control. How much 
> > of the performance gap between the current approach and user-provided 
> > migrations could a "smart" Balancer implementation close? Also, how does 
> > removing the Balancer from the equation change the wall time to get a 
> > tablet assigned (is the difference significant)?
> > 
> > We also have to understand that, while we can decompose the problem into 
> > some simple primitives, I believe this approach is still a rather difficult 
> > distributed state problem, and I'm worried it is being over-architected. My 
> > $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support for this and found 
> http://hbase.apache.org/book/node.management.html. Their general approach is 
> to provide a graceful shutdown for regionservers. This still runs into 
> problems when large numbers of servers are stopped at once. To alleviate 
> some of this pain, they use ZK to track which servers are currently in a 
> "draining state" so that no new assignments go to those nodes: "[...] 
> decommissioning multiple nodes may be non-optimal because regions that are 
> being drained from one region server may be moved to other regionservers that 
> are also draining. Marking RegionServers to be in the draining state prevents 
> this from happening."
> 
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue: 
> temporarily replace the balancer. I think providing these primitives for 
> manipulating tablets will let an administrator quickly script a one-off 
> solution to a problem, in addition to solving the rolling restart problem. 
> Writing a new balancer does not give you this quick flexibility.
>     
>     Killing tablet servers is a solution. I think it would be nice to have a 
> solution that avoids log recovery, minimizes the downtime of individual 
> tablets, preserves locality, and is easy to use. It does not have to be this 
> solution. Without additional scripts, the primary use case in ACCUMULO-1454 
> would not be easy to use. A balancer alone would not be enough to migrate 
> tablets between old and new tservers on the same node. However, a balancer 
> plus tserver states like the HBase ones you mentioned may be enough. We 
> should probably explore the balancer option a bit more.
> 
> kturner wrote:
>     One other thing I was thinking about is that we cannot make 
> assumptions about the environment. Users may not use the Accumulo scripts to 
> start and stop tservers.
> 
> Josh Elser wrote:
>     I think there would be merit in enumerating what a custom Balancer 
> would need. Is it really something that would have to be written on a 
> per-instance basis, or could we provide something better suited to 
> "heavy" tserver churn?
>     
>     I would definitely not advocate killing tservers. A graceful shutdown 
> would be much more desirable. Client-side scan retries help a little here, 
> since we don't have to quiesce all reads to a tablet, but a move could still 
> add latency to a query (e.g., lots of filtering over a large row).
>     
>     As I mentioned in my concerns about the final two-tservers-per-node 
> approach, I'm not entirely convinced that "sibling" tservers are worth the 
> complexity. We really don't have that much locality in how we use HDFS now. 
> Is trying to keep all of the tablets assigned on the same node going to make 
> things much more efficient than assigning them to nodes elsewhere? I don't 
> even have a good grasp of what these perf numbers would be at a high level.
> 
> kturner wrote:
>     Eric looked into locality once when running continuous ingest and found 
> that ~50% of tablets had local data. This matches expectations, since the 
> default balancer will try to migrate one child after a split.
>     
>     The sibling tserver concept may be too complex to implement. Sigh, but 
> it's so cool :)
> 
> Josh Elser wrote:
>     Clarification on what I meant by locality: AFAIK, we don't consider HDFS 
> block locations when we choose where Tablets get assigned. Yes, we'll have 
> locality when we're slamming Accumulo with ingest, but once we start 
> agitating at any reasonable rate, that's going to be lost.
>     
>     Requiring sibling tservers also implies that you have ample extra 
> resources on a node which is absolutely not going to be the case for most 
> systems. It would be nice, but it sounds to me like a one-off from what would 
> be the norm. :)


On the mailing list, Adam thought locality reached 90% in a long-running 
continuous ingest (CI) test. I need to ask Eric what he saw. It seems plausible 
that in a stable system locality would increase as the time since the last 
split grows.
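To make the "balancer plus draining tserver states" option a bit more concrete, here is a minimal Python sketch of just the placement rule, under a simplified, hypothetical model (tablets and tservers are plain strings, and `plan_moves` is an illustrative name, not Accumulo's balancer API): any tablet that is unassigned, or hosted on a draining or no-longer-live tserver, goes to the least-loaded non-draining tserver, so a tablet is never handed to another server that is also being decommissioned.

```python
def plan_moves(tablets, tservers, draining):
    """Hypothetical sketch, not Accumulo's API.

    tablets:  dict of tablet id -> hosting tserver name (None if unassigned)
    tservers: list of live tserver names
    draining: set of tserver names being decommissioned
    Returns a dict of tablet id -> target tserver for tablets that must move.
    """
    live = [s for s in tservers if s not in draining]
    if not live:
        raise ValueError("no non-draining tservers available")

    # Count tablets already hosted on each non-draining tserver.
    load = {s: 0 for s in live}
    to_place = []
    for tablet, server in tablets.items():
        if server in load:
            load[server] += 1
        else:
            # Unassigned, on a draining tserver, or on a tserver that is gone.
            to_place.append(tablet)

    # Greedily send each tablet to the least-loaded non-draining tserver,
    # breaking ties by name so the plan is deterministic.
    moves = {}
    for tablet in sorted(to_place):
        target = min(live, key=lambda s: (load[s], s))
        moves[tablet] = target
        load[target] += 1
    return moves


if __name__ == "__main__":
    hosted = {"t1": "a", "t2": "a", "t3": "b", "t4": "c", "t5": None}
    print(plan_moves(hosted, ["a", "b", "c"], {"c"}))
```

Marking the old tserver of a sibling pair as draining would push its tablets off that server, but keeping them on the same node (the locality goal for ACCUMULO-1454) would need an extra placement preference that this sketch does not model.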


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51006
-----------------------------------------------------------


On Aug. 19, 2014, 5:50 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24855/
> -----------------------------------------------------------
> 
> (Updated Aug. 19, 2014, 5:50 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1454
>     https://issues.apache.org/jira/browse/ACCUMULO-1454
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Posting the ACCUMULO-1454 design doc for review
> 
> 
> Diffs
> -----
> 
>   docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24855/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
>