On Fri, Jun 20, 2014 at 11:52 PM, ivan.bella <[email protected]> wrote:
> Right...pre splitting more gradually might be worthwhile...
>

Yeah, if balancing is a problem, adding 128 splits that are evenly
distributed and letting those spread would probably help a lot. After the
128 spread, then add the rest.

I did the following in 1.4.0 and was able to add 100,000 splits in ~4 mins
using 16 threads. I think I merged this code into 1.4.0 with a default of
16 threads. I wonder what has changed. This is an example of another
targeted performance test we need to check for regressions.

https://github.com/keith-turner/Accumulo-Parallel-Splitter

In addition to balancing, for 1.5 and 1.6 hsync and ACCUMULO-2766 may be
contributing to some of the slowness. Each split does 2 synchronous writes
to the metadata table, each of which results in an hsync. If hsync takes
50 ms and there are 16 threads adding splits, then 2 * 50 ms * 100,000 / 16
= 625 seconds. However, with group commit not working properly, these
numbers may be worse, as all of the parallel writes to the metadata table
from tservers splitting would have to wait on each other.

> -------- Original message --------
> From: dlmarion <[email protected]>
> Date: 06/20/2014 7:26 PM (GMT-05:00)
> To: [email protected]
> Subject: Re: better presplitting
>
> We have always had issues with splitting taking a long time. It's a
> serial process that has to compete with the balancer for a lock on the
> metadata table. At least in 1.4, anyway; my information may be outdated.
> Trying to add threads to create splits in parallel was never faster. It
> would be nice if you could manually acquire a lock on the metadata table
> in the shell, add all your split points, then release the lock and let
> the tservers figure it out. In this case you could parallelize the
> splitting by avoiding splitting the last tablet, and instead splitting at
> the midpoint between the last tablet and the last split.
>
> -------- Original message --------
> From: Josh Elser <[email protected]>
> Date: 06/20/2014 6:33 PM (GMT-05:00)
> To: [email protected]
> Subject: Re: better presplitting
>
> On Jun 20, 2014 12:41 PM, "Sean Busbey" <[email protected]> wrote:
> >
> > When you add splits, they definitely start out on the server that is
> > hosting the tablet that has to split apart. They have to, since the
> > tablet that hosted the previous key extent is the only one that can
> > properly handle requests for the new key extents.
> >
> > We've run into this consistently when doing any testing that requires
> > pre-splitting for perf reasons.
>
> I'd have to pull up the split code, but it seems like a simple fix could
> be to let all but one result of the split of a tablet remain local. That
> way the current server doesn't get bogged down, and the master would just
> use the regular assignment path instead of waiting for the balancer to
> kick in.
>
> Maybe there's a reason this doesn't work though :)
>
> > In the case of YCSB tests, Mike scripted some nice manual pre-splitting
> > in waves:
> >
> > * split table into X parts
> > * wait for balancing
> > * split each X part into Y parts
> > * wait for balancing
> >
> > Presuming the goal is to end up with X*Y presplits, this was way faster
> > than just asking for the total right off the bat.
> >
> > We could generally look at improving the migration code to handle these
> > reassignments faster, but how often does this situation come up for
> > people who aren't making a new table?
If the "do this offline" feature speeds up > > the new table use case enough, I'm not sure optimizing the migration path > > is worth the time investment right now. > > > > > > On Fri, Jun 20, 2014 at 3:09 PM, Josh Elser <[email protected]> > wrote: > > > > > bq. They all started out on one server > > > > > > This seems.. weird. Would be good to start addressing this by > identifying > > > what the actual balancer code does so we can immediately start to test > the > > > assertions. We can then use the results to identify the deficiencies > that > > > exist. > > > > > > I think the 200splits per server was an Eric quote from some time ago > > > (1.4-ish, maybe 1.5). I think this is relative to a bunch of things, > > > workload and memory available most notably, and would be good to > quantify > > > too. > > > > > > > > > On 6/20/14, 11:58 AM, Sean Busbey wrote: > > > > > >> One thing that jumped out from the most recent D4M paper was this > quote: > > >> > > >> One issue that was encountered is that after creating the > pre-splits, > > >> they all started out on one server. Accumulo load balanced the splits > > >> across its servers at rate of ~50 splits/second, which is more than > > >> adequate for normal operation, but can take ~20 minutes for 50,000 > pre- > > >> splits.[1] > > >> > > >> Do we already have an open ticket that would help this? I think maybe > > >> there's one about being able to presplit a table that is offline? > > >> > > >> I believe our recommended sweet spot is like 100-200 tablets per > server > > >> (though I can't find the reference for *why* I believe this ATM), > which > > >> means for clusters in the ~100s of nodes this would be in the ballpark > for > > >> an expected number of pre-splits. > > >> > > >> > > >> [1]: arXiv:1406.4923v1 [cs.DB] > > >> > > >> > > > > > > -- > > Sean >

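Along the same lines, here is a rough sketch of the "pre-split in waves"
approach Sean described, assuming the final split points are already known and
sorted. Again the names are made up, and waitForBalance() is just a sleep
placeholder; a real script would poll the master/monitor until migrations
settle before starting the next wave.

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class WaveSplitSketch {

  // Add splits in waves: each wave adds every Nth split point, then waits for
  // the new tablets to balance before the next, finer-grained wave.
  public static void splitInWaves(Connector conn, String table,
      SortedSet<Text> finalSplits, int waves) throws Exception {

    Text[] all = finalSplits.toArray(new Text[0]);
    TreeSet<Text> alreadyAdded = new TreeSet<Text>();

    for (int wave = 0; wave < waves; wave++) {
      // e.g. for 3 waves the strides are 4, 2, 1
      int stride = 1 << (waves - wave - 1);

      TreeSet<Text> thisWave = new TreeSet<Text>();
      for (int i = stride - 1; i < all.length; i += stride)
        thisWave.add(all[i]);
      thisWave.removeAll(alreadyAdded);

      conn.tableOperations().addSplits(table, thisWave);
      alreadyAdded.addAll(thisWave);

      waitForBalance(); // placeholder, see note above
    }
  }

  // Crude stand-in for "wait for balancing"; a real script would poll the
  // master/monitor until migrations settle.
  private static void waitForBalance() throws InterruptedException {
    Thread.sleep(60 * 1000);
  }
}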