Matteo, thanks for reminding me about the verification stage - this is it.
The master verifying 60K files and regions in a live cluster ... this can
explain the 30 minutes.

If even one region splits during a snapshot, the snapshot will fail. It's
amazing that these guys are able to finish a snapshot at all.

-Vlad


On Fri, Jul 10, 2015 at 6:12 PM, Matteo Bertozzi <theo.berto...@gmail.com>
wrote:

> Yeah, something along those lines, but I doubt the problem is on the RS side
> or in the communication between the master and the RSs.
>
> In theory the problem may be the verification step, where the master
> is checking the snapshot. I was just trying to figure out where he is
> spending the time, and that "30 minutes to snapshot" does not sound right
> to me, because the snapshot phase where each RS takes a manifest should not
> take that long.
>
> Matteo
>
>
> On Fri, Jul 10, 2015 at 6:04 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> wrote:
>
> > Matteo, there should be some explanation for the 30 min SKIP_FLUSH
> > snapshot. I think it should be somewhere in NN/HDFS. This is a huge cluster
> > and the NN load is extreme; it probably does not scale well with the # of
> > DNs and the # of files per directory. I presume that NN performance on file
> > operations degrades when the # of DNs and/or directory sizes increase.
> >
> > -Vlad
> >
> > On Fri, Jul 10, 2015 at 5:29 PM, Matteo Bertozzi <theo.berto...@gmail.com>
> > wrote:
> >
> > > Manifest per region, not per family.
> > > We couldn't send them back to the master/table, to keep compatibility.
> > > 60k regions on 1200 RSs are ~50 manifests per RS; that alone should not
> > > take 30sec
> > >
> > >
> > > On Fri, Jul 10, 2015 at 5:21 PM, Vladimir Rodionov
> > > <vladrodio...@gmail.com> wrote:
> > >
> > > > OK, even with 1 manifest file per region (per column family?) - 60K x 4 =
> > > > 240,000 new files.
> > > > Over 30 minutes that is 8,000 per minute, about 133 per second. That is
> > > > probably the NN limit.
> > > >
> > > > Anyway, the root cause is the same as with reference files during a
> > > > region split:
> > > >
> > > > HDFS does not do well on file create/open/close/delete.
> > > >
> > > > -Vlad
> > > >
> > > > On Fri, Jul 10, 2015 at 5:09 PM, Matteo Bertozzi
> > > > <theo.berto...@gmail.com> wrote:
> > > >
> > > > > @Vladimir there is no HFileLink creation on snapshot. We create 1
> > > > > manifest per region.
> > > > >
> > > > > Matteo
> > > > >
> > > > >
> > > > > On Fri, Jul 10, 2015 at 5:06 PM, Vladimir Rodionov
> > > > > <vladrodio...@gmail.com> wrote:
> > > > >
> > > > > > Not being very familiar with the snapshot code, I can only
> > > > > > speculate on where most of the time is spent ...
> > > > > >
> > > > > > In creating 60K x 4 x K (K is the average # of store files per
> > > > > > region) small HFileLinks? This can be a very large # of files.
> > > > > >
> > > > > > -Vlad
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 10, 2015 at 4:57 PM, Matteo Bertozzi
> > > > > > <theo.berto...@gmail.com> wrote:
> > > > > >
> > > > > > > The total time taken by a snapshot should be bounded by the
> > > > > > > slowest machine.
> > > > > > > We send a notification to each RS and each RS executes the
> > > > > > > snapshot operation for each region.
> > > > > > > Can you track down what is slow in your case?
> > > > > > >
> > > > > > > Clone has to create a reference for each file, and that is a
> > > > > > > master operation, and these calls may all go away if we change
> > > > > > > the layout in a proper way instead of doing what is proposed in
> > > > > > > HBASE-13991.
> > > > > > > Most of the time should be spent on the enableTable phase of the
> > > > > > > clone.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 10, 2015 at 4:36 PM, Jean-Marc Spaggiari <
> > > > > > > jean-m...@spaggiari.org> wrote:
> > > > > > >
> > > > > > > > Hi Rahul,
> > > > > > > >
> > > > > > > > Have you identified what takes those 30 minutes? Is the table
> > > > > > > > balanced correctly across the servers? From the logs, are you
> > > > > > > > able to identify what takes that much time?
> > > > > > > >
> > > > > > > > JM
> > > > > > > >
> > > > > > > > 2015-07-10 18:46 GMT-04:00 rahul gidwani <rahul.gidw...@gmail.com>:
> > > > > > > >
> > > > > > > > > Hi Matteo,
> > > > > > > > >
> > > > > > > > > We do SKIP_FLUSH.  We have 1200+ regionservers with a single
> > > > > > > > > table with 60k regions and 4 column families.  It takes around
> > > > > > > > > 30 minutes to snapshot this table using manifests compared to
> > > > > > > > > just seconds doing this with hdfs.
> > > > > > > > > Cloning this table takes considerably longer.
> > > > > > > > >
> > > > > > > > > For cases where someone would want to run Map/Reduce over
> > > > > > > > > snapshots this could be much faster as we could take an hdfs
> > > > > > > > > snapshot and bypass the clone.
> > > > > > > > >
> > > > > > > > > rahul
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jul 9, 2015 at 12:20 PM, Matteo Bertozzi
> > > > > > > > > <theo.berto...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani
> > > > > > > > > > <rahul.gidw...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Even with manifests (Snapshot V2) for our larger tables it
> > > > > > > > > > > can take hours to Snapshot and Clone a table.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > At snapshot time the only thing that can take hours is
> > > > > > > > > > "flush". If you don't need that (which is what you get with
> > > > > > > > > > hdfs snapshots) you can specify SKIP_FLUSH => true
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Matteo
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani
> > > > > > > > > > <rahul.gidw...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > HBase snapshots are a very useful feature, but they were
> > > > > > > > > > > implemented back before there was the ability to snapshot
> > > > > > > > > > > via HDFS.
> > > > > > > > > > >
> > > > > > > > > > > Newer versions of Hadoop support HDFS snapshots.  I was
> > > > > > > > > > > wondering if the community would be interested in something
> > > > > > > > > > > like a Snapshot V3 where we use HDFS to take these
> > > > > > > > > > > snapshots.
> > > > > > > > > > >
> > > > > > > > > > > Even with manifests (Snapshot V2) for our larger tables it
> > > > > > > > > > > can take hours to Snapshot and Clone a table.
> > > > > > > > > > >
> > > > > > > > > > > Would this feature be of use to anyone?
> > > > > > > > > > >
> > > > > > > > > > > thanks
> > > > > > > > > > > rahul
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
