> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would
> > actually have over what is currently possible with Accumulo as it stands.
> >
> > Right now, you can force tablets to migrate by stopping a tserver. This
> > goes back through the balancer, so you have a bit of churn in however many
> > "rounds" the Balancer takes to choose where those tablets should go, and
> > then for the master to process the necessary assignments for each tserver.
> > How I'm seeing it described is that the only piece of the puzzle that we're
> > making better is removing the migration components in favor of letting the
> > user control this directly. How much does a "smart" Balancer implementation
> > close the gap between the user providing migrations in regards to
> > performance? Also, how does removing the Balancer from the equation change
> > the wall time to get a tablet assigned (is it significant)?
> >
> > We have to also understand that while we can decompose the problem into
> > some simple primitives, I believe this approach is still a rather difficult
> > distributed state problem that I'm worried is being over-architected. My
> > $0.02.
>
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found
>     http://hbase.apache.org/book/node.management.html. Their general approach is
>     to provide a graceful shutdown for regionservers. This is still subject to
>     problems in mass amounts of servers being stopped at one time. To alleviate
>     some of this pain, they use ZK to store what servers are currently in a
>     "draining state" to avoid new assignments to those nodes -- "[...]
>     decommissioning multiple nodes may be non-optimal because regions that are
>     being drained from one region server may be moved to other regionservers that
>     are also draining.
>     Marking RegionServers to be in the draining state prevents
>     this from happening".
>
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue:
>     temporarily replace the balancer. I am thinking that providing these
>     primitives for manipulating tablets will allow an administrator to quickly
>     script a one-off solution to a problem, in addition to solving the rolling
>     restart problem. You do not get this quick flexibility with writing a new
>     balancer.
>
>     Killing tablet servers is a solution. I think it would be nice to have a
>     solution that avoids log recovery, minimizes down time of individual tablets,
>     preserves locality, and is easy to use. It does not have to be this
>     solution. W/o additional scripts, the primary use case in 1454 would not be
>     easy to use. A balancer alone would not be enough to achieve the goal of
>     migrating tablets between old and new tservers on the same node. However, a
>     balancer + tserver states like you mentioned from HBase may provide enough.
>     Should probably try to explore the balancer option a bit more.
>
> kturner wrote:
>     One other thing I was thinking about was that you cannot make
>     assumptions about the environment. Users may not use the Accumulo scripts to
>     start and stop tservers.
>
> Josh Elser wrote:
>     I think there would be merit in enumerating what would be needed by a
>     custom Balancer. Is it really something that would need to be written on a
>     per-instance basis, or is there something we could provide that would be more
>     conducive to "heavy" tserver churn?
>
>     I would definitely not advocate killing tservers. A graceful shutdown
>     would be much more desirable. We get a little bit of help here by the
>     client-side scan retries for not having to quiesce all reads to a tablet, but
>     that could still introduce more latency for a query (e.g. lots of filtering
>     over a large row).
>
>     As mentioned about concerns with the final two-tservers-per-node
>     approach, I'm not entirely convinced that "sibling" tservers is worth the
>     complexity. We really don't have that much locality in how we use HDFS now.
>     Is trying to keep all of the tablets assigned on the same node going to make
>     things much more efficient over assigning them to nodes elsewhere? I don't
>     even have a good grasp for what these perf numbers would be at a high level.
>
> kturner wrote:
>     Eric looked into locality once when running continuous ingest and found
>     that ~50% of tablets had local data. This matches expectations as the
>     default balancer will try to migrate one child after a split.
>
>     The sibling tserver concept may be too complex to implement. Sigh, but
>     it's so cool :)
>
> Josh Elser wrote:
>     Clarification on what I meant by locality: we don't consider HDFS block
>     locations when we choose where Tablets get assigned, AFAIK. Yes, we'll have
>     locality when we're slamming Accumulo with ingest, but once we start
>     agitating at any reasonable rate, that's going to be lost.
>
>     Requiring sibling tservers also implies that you have ample extra
>     resources on a node which is absolutely not going to be the case for most
>     systems. It would be nice, but it sounds to me like a one-off from what would
>     be the norm. :)

On the mailing list, Adam thought locality reached 90% in a long-running CI
test. Need to ask Eric what he saw. It seems plausible that locality would
increase in a stable system as the time since a split increases.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51006
-----------------------------------------------------------


On Aug. 19, 2014, 5:50 p.m., kturner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24855/
> -----------------------------------------------------------
>
> (Updated Aug. 19, 2014, 5:50 p.m.)
>
>
> Review request for accumulo.
>
>
> Bugs: ACCUMULO-1454
>     https://issues.apache.org/jira/browse/ACCUMULO-1454
>
>
> Repository: accumulo
>
>
> Description
> -------
>
> Posting ACCUMULO-1454 design doc for review
>
>
> Diffs
> -----
>
>   docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION
>
> Diff: https://reviews.apache.org/r/24855/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> kturner