On Fri, Jun 20, 2014 at 11:52 PM, ivan.bella <[email protected]> wrote:
> Right...pre splitting more gradually might be worthwhile...
>

Yeah, if balancing is a problem, adding 128 splits that are evenly
distributed and letting those spread would probably help a lot. After the
128 spread, then add the rest.

I did the following in 1.4.0 and was able to add 100,000 splits in ~4 mins
using 16 threads. I think I merged this code into 1.4.0 with a default of
16 threads. I wonder what has changed. This is an example of another
targeted performance test we need to check for regressions.

https://github.com/keith-turner/Accumulo-Parallel-Splitter

In addition to balancing, for 1.5 and 1.6 hsync and ACCUMULO-2766 may be
contributing to some of the slowness. Each split does 2 synchronous writes
to the metadata table, each of which results in an hsync. If hsync takes
50 ms and there are 16 threads adding splits, then 2 * 50 ms * 100,000 / 16
= 625 seconds. However, with group commit not working properly, these
numbers may be worse, as all of the parallel writes to the metadata table
from tservers splitting would have to wait on each other.

> -------- Original message --------
> From: dlmarion <[email protected]>
> Date: 06/20/2014 7:26 PM (GMT-05:00)
> To: [email protected]
> Subject: Re: better presplitting
>
> We have always had issues with splitting taking a long time. It's a
> serial process that has to compete with the balancer for a lock on the
> metadata table. At least in 1.4, anyway; my information may be outdated.
> Trying to add threads to create splits in parallel was never faster. It
> would be nice if you could manually acquire a lock on the metadata table
> in the shell, add all your split points, then release the lock and let
> the tservers figure it out. In this case you could parallelize the
> splitting by avoiding splitting the last tablet, and instead splitting at
> the midpoint between the last tablet and the last split.
>
> -------- Original message --------
> From: Josh Elser <[email protected]>
> Date: 06/20/2014 6:33 PM (GMT-05:00)
> To: [email protected]
> Subject: Re: better presplitting
>
> On Jun 20, 2014 12:41 PM, "Sean Busbey" <[email protected]> wrote:
> >
> > When you add splits, they definitely start out on the server that is
> > hosting the tablet that has to split apart. They have to, since the
> > tablet that hosted the previous key extent is the only one that can
> > properly handle requests for the new key extents.
> >
> > We've run into this consistently when doing any testing that requires
> > pre-splitting for perf reasons.
>
> I'd have to pull up the split code, but it seems like a simple fix could
> be to let all but one result of the split of a tablet remain local. That
> way the current server doesn't get bogged down, and the master would just
> use the regular assignment path instead of waiting for the balancer to
> kick in.
>
> Maybe there's a reason this doesn't work though :)
>
> > In the case of YCSB tests, Mike scripted some nice manual pre-splitting
> > in waves:
> >
> > * split table into X parts
> > * wait for balancing
> > * split each X part into Y parts
> > * wait for balancing
> >
> > Presuming the goal is to end up with X*Y presplits, this was way faster
> > than just asking for the total right off the bat.
> >
> > We could generally look at improving the migration code to handle these
> > reassignments faster, but how often does this situation come up for
> > people who aren't making a new table?
If the "do this offline" feature speeds up > > the new table use case enough, I'm not sure optimizing the migration path > > is worth the time investment right now. > > > > > > On Fri, Jun 20, 2014 at 3:09 PM, Josh Elser <[email protected]> > wrote: > > > > > bq. They all started out on one server > > > > > > This seems.. weird. Would be good to start addressing this by > identifying > > > what the actual balancer code does so we can immediately start to test > the > > > assertions. We can then use the results to identify the deficiencies > that > > > exist. > > > > > > I think the 200splits per server was an Eric quote from some time ago > > > (1.4-ish, maybe 1.5). I think this is relative to a bunch of things, > > > workload and memory available most notably, and would be good to > quantify > > > too. > > > > > > > > > On 6/20/14, 11:58 AM, Sean Busbey wrote: > > > > > >> One thing that jumped out from the most recent D4M paper was this > quote: > > >> > > >> One issue that was encountered is that after creating the > pre-splits, > > >> they all started out on one server. Accumulo load balanced the splits > > >> across its servers at rate of ~50 splits/second, which is more than > > >> adequate for normal operation, but can take ~20 minutes for 50,000 > pre- > > >> splits.[1] > > >> > > >> Do we already have an open ticket that would help this? I think maybe > > >> there's one about being able to presplit a table that is offline? > > >> > > >> I believe our recommended sweet spot is like 100-200 tablets per > server > > >> (though I can't find the reference for *why* I believe this ATM), > which > > >> means for clusters in the ~100s of nodes this would be in the ballpark > for > > >> an expected number of pre-splits. > > >> > > >> > > >> [1]: arXiv:1406.4923v1 [cs.DB] > > >> > > >> > > > > > > -- > > Sean >

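Along the same lines, here is a rough sketch of the "pre-split in waves"
approach Sean described, assuming the final split points are already known and
sorted. Again the names are made up, and waitForBalance() is just a sleep
placeholder; a real script would poll the master/monitor until migrations
settle before starting the next wave.

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class WaveSplitSketch {

  // Add splits in waves: each wave adds every Nth split point, then waits for
  // the new tablets to balance before the next, finer-grained wave.
  public static void splitInWaves(Connector conn, String table,
      SortedSet<Text> finalSplits, int waves) throws Exception {

    Text[] all = finalSplits.toArray(new Text[0]);
    TreeSet<Text> alreadyAdded = new TreeSet<Text>();

    for (int wave = 0; wave < waves; wave++) {
      // e.g. for 3 waves the strides are 4, 2, 1
      int stride = 1 << (waves - wave - 1);

      TreeSet<Text> thisWave = new TreeSet<Text>();
      for (int i = stride - 1; i < all.length; i += stride)
        thisWave.add(all[i]);
      thisWave.removeAll(alreadyAdded);

      conn.tableOperations().addSplits(table, thisWave);
      alreadyAdded.addAll(thisWave);

      waitForBalance(); // placeholder, see note above
    }
  }

  // Crude stand-in for "wait for balancing"; a real script would poll the
  // master/monitor until migrations settle.
  private static void waitForBalance() throws InterruptedException {
    Thread.sleep(60 * 1000);
  }
}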