One thing that jumped out at me from the most recent D4M paper was this passage:

"One issue that was encountered is that after creating the pre-splits, they all started out on one server. Accumulo load balanced the splits across its servers at rate of ~50 splits/second, which is more than adequate for normal operation, but can take ~20 minutes for 50,000 pre-splits."[1]
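As a quick sanity check on those numbers, here's a back-of-the-envelope sketch. It assumes the balancer moves tablets at a constant ~50 splits/second for the whole run, which is a simplification. The class and method names are made up for illustration and are not Accumulo API. The 300-server, 150-tablets-per-server case is likewise only a hypothetical sizing.

```java
/**
 * Back-of-the-envelope estimate of how long the balancer needs to spread
 * freshly created pre-splits, assuming a constant migration rate.
 * All names here are hypothetical illustration, not Accumulo API.
 */
public class PreSplitBalanceEstimate {

    // Minutes needed to migrate `splits` tablets at a constant `splitsPerSecond`.
    // Assumes the rate stays flat for the whole run (the paper reports ~50/s).
    static double minutesToBalance(long splits, double splitsPerSecond) {
        if (splitsPerSecond <= 0) {
            throw new IllegalArgumentException("splitsPerSecond must be positive");
        }
        return splits / splitsPerSecond / 60.0;
    }

    // Split points needed so a table has servers * tabletsPerServer tablets:
    // N tablets are bounded by N - 1 split points.
    static long splitsForTarget(int servers, int tabletsPerServer) {
        return (long) servers * tabletsPerServer - 1;
    }

    public static void main(String[] args) {
        // The paper's case: 50,000 pre-splits at ~50 splits/second.
        System.out.printf("50,000 splits: %.1f minutes%n", minutesToBalance(50_000, 50));

        // Hypothetical cluster: 300 servers at 150 tablets each.
        long splits = splitsForTarget(300, 150);
        System.out.printf("%d splits: %.1f minutes%n", splits, minutesToBalance(splits, 50));
    }
}
```

The first line prints 16.7 minutes. Pure migration time at that rate is already in the same range as the paper's ~20 minute figure, before any other overhead is counted.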
Do we already have an open ticket that would help with this? I think there may be one about being able to pre-split a table while it is offline?

I believe our recommended sweet spot is something like 100-200 tablets per server (though I can't find the reference for *why* I believe this ATM). That means for clusters in the ~100s of nodes, tens of thousands of pre-splits, as in the paper, is in the ballpark of what we'd expect.

[1]: arXiv:1406.4923v1 [cs.DB]

-- Sean
