On Tue, Mar 3, 2015 at 5:37 PM, Vladimir Rodionov <[email protected]> wrote:
> Matteo,
>
> For large cluster/table this one:
>
> - the master will aggregate the result and verify the integrity
>
> looks like a real bottleneck.
>
the integrity verification is just file names verification not actual data
verification.

> Any other hidden serialized parts of the implementation?
>
nothing else I can think about

> On Tue, Mar 3, 2015 at 9:25 AM, Matteo Bertozzi <[email protected]>
> wrote:
>
> > the high-level overview of snapshot is:
> > - client ask the master to take a snapshot
> > - the master lookup the RS that are hosting the regions for the specified
> >   table
> > - the master creates a znode to notify the RSs to take a snapshot
> > - each RS involved will get notified and take the snapshot. which is flush
> >   + writing a manifest
> > - each RS involved will respond to the master
> > - the master will aggregate the result and verify the integrity
> > - snapshot complete
> >
> > so, the time required to take a snapshot is bounded by the slowest region
> > to flush/respond.
> > You can try with SKIP_FLUSH = true
> > also, if you grep Snapshot from the master log you can see what is taking
> > long.
> >
> > Matteo
> >
> > On Tue, Mar 3, 2015 at 5:18 PM, Vladimir Rodionov <[email protected]>
> > wrote:
> >
> > > Some discussions:
> > > http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/43616
> > >
> > > Any ideas why? It should not take 10s of seconds (unless we flush several
> > > GBs per server)
> > > I got info from my coworker that it is indeed slow (20+ sec on an almost
> > > empty table).
> > >
> > > I have not started testing myself yet but before I start digging into it I
> > > would like to collect opinions from HBase folks.
> > >
> > > -Vlad
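The timing model Matteo describes above (region servers flush in parallel, the master then does a serial file-name-level verification) can be sketched as a small simulation. This is purely illustrative: the `SnapshotTiming` class and the millisecond figures are made up for the example and are not HBase internals.

```java
import java.util.Arrays;
import java.util.List;

public class SnapshotTiming {
    /**
     * Models the snapshot wall-clock time: the RS flush/manifest steps run
     * in parallel, so the client waits for the slowest one, plus the
     * master's serial aggregation/verification pass at the end.
     */
    static long snapshotDuration(List<Long> rsFlushTimesMs, long masterVerifyMs) {
        long slowestRs = rsFlushTimesMs.stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
        return slowestRs + masterVerifyMs;
    }

    public static void main(String[] args) {
        // One slow region server (18 s) dominates, even though the
        // others flush in well under a second.
        List<Long> flushTimes = Arrays.asList(120L, 300L, 18000L);
        System.out.println(snapshotDuration(flushTimes, 50L)); // prints 18050
    }
}
```

This is why `SKIP_FLUSH` helps: it removes the flush from each RS's step, leaving only manifest writing plus the master's verification (in the HBase shell: `snapshot 'mytable', 'mysnap', {SKIP_FLUSH => true}`), at the cost of the snapshot not including un-flushed memstore data.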
