Matteo, thanks for reminding me about the verification stage - this is it. The master verifying 60K files and regions in a live cluster ... this can explain the 30 minutes.
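As a rough sanity check, the figures quoted in this thread (60K regions, 4 column families, 1200 region servers, about 30 minutes) can be turned into file-creation rates. The Python sketch here is only back-of-envelope arithmetic; the function name and the "one file per column family" case are assumptions for illustration, not HBase code or measured data.

```python
def snapshot_file_rates(regions, families, region_servers, minutes):
    """Back-of-envelope file counts and creation rates for a snapshot.

    All inputs are the round numbers quoted in the thread; nothing here
    is measured or taken from HBase itself.
    """
    seconds = minutes * 60
    files_per_region = regions             # one manifest per region
    files_per_family = regions * families  # hypothetical: one file per family
    return {
        "manifests_per_rs": files_per_region / region_servers,
        "files_if_per_family": files_per_family,
        "per_family_files_per_min": files_per_family / minutes,
        "per_family_files_per_sec": files_per_family / seconds,
        "per_region_files_per_sec": files_per_region / seconds,
    }


rates = snapshot_file_rates(regions=60_000, families=4,
                            region_servers=1200, minutes=30)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

Under these numbers, one manifest per region works out to about 50 files per region server and roughly 33 file creations per second cluster-wide, while the one-file-per-family case would be 240,000 files at 8000 per minute (about 133 per second).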
If even one region splits during a snapshot, the snapshot will fail. It's amazing that these guys are able to finish a snapshot at all.

-Vlad

On Fri, Jul 10, 2015 at 6:12 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

> yeah, something along that line. but I doubt the problem is RS side,
> or the communication between the master and RSs.
>
> in theory the problem may be the verification step where the master
> is checking the snapshot. I was just trying to figure out where he is
> spending the time,
> and that "30 minutes to snapshot" does not sound right to me,
> because the snapshot phase where each RS takes a manifest should not take
> that long.
>
> Matteo
>
> On Fri, Jul 10, 2015 at 6:04 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> wrote:
>
> > Matteo, there should be some explanation for the 30 min SKIP_FLUSH snapshot.
> > I think it should be somewhere in NN/HDFS. This is a huge cluster and the NN
> > load is extreme; it probably does not scale well with # of DNs and # of files
> > per directory. I presume that NN performance on file operations degrades
> > when the # of DNs and/or directory sizes increase.
> >
> > -Vlad
> >
> > On Fri, Jul 10, 2015 at 5:29 PM, Matteo Bertozzi <theo.berto...@gmail.com>
> > wrote:
> >
> > > Manifest per Region, not family.
> > > we couldn't send them back to the master/table to keep compatibility.
> > > 60k regions on 1200 RSs are ~50 manifests per RS; that alone should not take
> > > 30sec
> > >
> > > On Fri, Jul 10, 2015 at 5:21 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> > > wrote:
> > >
> > > > OK, even with 1 manifest file per region (per column family?) - 60K X 4 =
> > > > 240,000 new files
> > > > 8000 per minute, ~133 per second. That is probably the NN limit.
> > > >
> > > > Anyway, the root cause is the same as with reference files during region
> > > > split:
> > > >
> > > > HDFS does not do well on file create/open/close/delete.
> > > >
> > > > -Vlad
> > > >
> > > > On Fri, Jul 10, 2015 at 5:09 PM, Matteo Bertozzi <theo.berto...@gmail.com>
> > > > wrote:
> > > >
> > > > > @Vladimir there is no hfile link creation on snapshot. we create 1 manifest
> > > > > per region
> > > > >
> > > > > Matteo
> > > > >
> > > > > On Fri, Jul 10, 2015 at 5:06 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Being not very familiar with the snapshot code, I can only speculate on
> > > > > > where most of the time is spent ...
> > > > > >
> > > > > > In creating 60K x 4 x K (K is the average # of store files per region) small
> > > > > > HFileLinks? This can be a very large # of files.
> > > > > >
> > > > > > -Vlad
> > > > > >
> > > > > > On Fri, Jul 10, 2015 at 4:57 PM, Matteo Bertozzi <theo.berto...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > the total time taken by a snapshot should be bounded by the slowest
> > > > > > > machine.
> > > > > > > we send a notification to each RS and each RS executes the snapshot
> > > > > > > operation for each region.
> > > > > > > can you track down what is slow in your case?
> > > > > > >
> > > > > > > clone has to create a reference for each file, and that is a master
> > > > > > > operation, and these calls may all go away if we change the layout in a
> > > > > > > proper way instead of doing what is proposed in HBASE-13991.
> > > > > > > Most of the time should be spent on the enableTable phase of the clone.
> > > > > > >
> > > > > > > On Fri, Jul 10, 2015 at 4:36 PM, Jean-Marc Spaggiari
> > > > > > > <jean-m...@spaggiari.org> wrote:
> > > > > > >
> > > > > > > > Hi Rahul,
> > > > > > > >
> > > > > > > > Have you identified what takes those 30 minutes? Is the table balanced
> > > > > > > > correctly across the servers?
> > > > > > > > From the logs, are you able to identify what
> > > > > > > > takes that much time?
> > > > > > > >
> > > > > > > > JM
> > > > > > > >
> > > > > > > > 2015-07-10 18:46 GMT-04:00 rahul gidwani <rahul.gidw...@gmail.com>:
> > > > > > > >
> > > > > > > > > Hi Matteo,
> > > > > > > > >
> > > > > > > > > We do SKIP_FLUSH. We have 1200+ regionservers with a single table with 60k
> > > > > > > > > regions and 4 column families. It takes around 30 minutes to snapshot this
> > > > > > > > > table using manifests compared to just seconds doing this with hdfs.
> > > > > > > > > Cloning this table takes considerably longer.
> > > > > > > > >
> > > > > > > > > For cases where someone would want to run Map/Reduce over snapshots this
> > > > > > > > > could be much faster as we could take an hdfs snapshot and bypass the
> > > > > > > > > clone.
> > > > > > > > >
> > > > > > > > > rahul
> > > > > > > > >
> > > > > > > > > On Thu, Jul 9, 2015 at 12:20 PM, Matteo Bertozzi <theo.berto...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <rahul.gidw...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Even with manifests (Snapshot V2) for our larger tables it can take hours
> > > > > > > > > > > to Snapshot and Clone a table.
> > > > > > > > > >
> > > > > > > > > > on snapshot time the only thing that can take hours is "flush".
> > > > > > > > > > if you don't need that (which is what you get with hdfs snapshots) you can
> > > > > > > > > > specify SKIP_FLUSH => true
> > > > > > > > > >
> > > > > > > > > > Matteo
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <rahul.gidw...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > HBase snapshots are a very useful feature, but it was implemented back
> > > > > > > > > > > before there was the ability to snapshot via HDFS.
> > > > > > > > > > >
> > > > > > > > > > > Newer versions of Hadoop support HDFS snapshots. I was wondering if the
> > > > > > > > > > > community would be interested in something like a Snapshot V3 where we use
> > > > > > > > > > > HDFS to take these snapshots.
> > > > > > > > > > >
> > > > > > > > > > > Even with manifests (Snapshot V2) for our larger tables it can take hours
> > > > > > > > > > > to Snapshot and Clone a table.
> > > > > > > > > > >
> > > > > > > > > > > Would this feature be of use to anyone?
> > > > > > > > > > >
> > > > > > > > > > > thanks
> > > > > > > > > > > rahul