On Wed, Jun 25, 2014 at 4:50 PM, Keith Turner <[email protected]> wrote:
> I wrote a little utility to time splitting and subsequent balancing. I
> will post some numbers from running this on EC2
>
> https://gist.github.com/keith-turner/5c561e438cb04c501b6e
>

posted some performance numbers on
https://issues.apache.org/jira/browse/ACCUMULO-2368

>
> On Fri, Jun 20, 2014 at 2:58 PM, Sean Busbey <[email protected]> wrote:
>
>> One thing that jumped out from the most recent D4M paper was this quote:
>>
>> One issue that was encountered is that after creating the pre-splits,
>> they all started out on one server. Accumulo load balanced the splits
>> across its servers at rate of ~50 splits/second, which is more than
>> adequate for normal operation, but can take ~20 minutes for 50,000
>> pre-splits.[1]
>>
>> Do we already have an open ticket that would help this? I think maybe
>> there's one about being able to presplit a table that is offline?
>>
>> I believe our recommended sweet spot is like 100-200 tablets per server
>> (though I can't find the reference for *why* I believe this ATM), which
>> means for clusters in the ~100s of nodes this would be in the ballpark for
>> an expected number of pre-splits.
>>
>>
>> [1]: arXiv:1406.4923v1 [cs.DB]
>>
>> --
>> Sean
>>
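
For a rough sanity check of the figures quoted above, the arithmetic can be sketched as follows. This is only a back-of-the-envelope illustration, not anything from the utility in the gist: the function names are hypothetical, a constant balancing rate is assumed, and the 250-server cluster size is an arbitrary example of "~100s of nodes".

```python
# Back-of-the-envelope sketch (hypothetical helpers, not Accumulo code).
# Assumes the balancer migrates tablets at a constant rate.

def balance_seconds(num_splits, splits_per_second):
    """Seconds needed to spread num_splits tablets at a fixed rate."""
    if splits_per_second <= 0:
        raise ValueError("splits_per_second must be positive")
    return num_splits / splits_per_second


def expected_tablets(num_servers, per_server_low=100, per_server_high=200):
    """Range of total tablets for a cluster, given a per-server sweet spot."""
    return num_servers * per_server_low, num_servers * per_server_high


if __name__ == "__main__":
    secs = balance_seconds(50_000, 50)      # figures quoted from the paper
    print(f"{secs:.0f} s, about {secs / 60:.1f} minutes")
    low, high = expected_tablets(250)       # 250 servers is an assumed example
    print(f"{low} to {high} tablets")
```

At 50 splits/second, 50,000 pre-splits work out to 1,000 seconds, or roughly 17 minutes, which is the same order as the ~20 minutes quoted. With 100-200 tablets per server, a cluster of a few hundred nodes lands in the tens of thousands of tablets, so 50,000 pre-splits is a realistic count.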
