On Mon, Sep 11, 2017 at 11:07 AM, Vladimir Rodionov <[email protected]> wrote:
> Stack, Andrew
>
> We have the doc blocker and (partially) HBASE-15227: two sub-tasks remain.
> One is a unit test (you can't call it a blocker) and the other is FT
> support during incremental backup with bulk loading. The latter has
> probably been addressed already in other HBASE-15527 subtasks. I have to
> reassess this.
>
> That is mostly it. Yes, we have not done real testing with real data on a
> real cluster yet, except QA testing on a small OpenStack cluster (10
> nodes). That is probably our biggest minus right now. I would like to
> inform the community that this week we are going to start full-scale
> testing with reasonably sized data sets.
>
> The recently committed improvements, such as the ability to run
> backup/restore in a particular YARN pool (queue), allow precise control of
> cluster utilization during operation (so as not to interfere much with
> regular cluster operations). Another one, converting WALs on the fly to
> HFiles, significantly improves storage usage on a backup site.
>
> My plan is to finish HBASE-17825 (further performance optimizations). This
> will cut down the number of MR jobs during incremental backup from 2*N to 2
> (N being the number of tables). That will probably take 2-3 more days.
>
> Then:
>
> 1. Address the remaining two sub-tasks in HBASE-15227
> 2. Update release notes for all relevant B&R JIRAs
> 3. Work on doc
>
> After that we can call it feature complete. Taking into account the vast
> amount of effort spent on this feature (including QA testing), I would say
> that we are probably quite close to GA right now, but only after real
> testing is done (I do not anticipate significant issues, except probably
> correct failure handling).
>
> On the feature itself: we provide tools to fully automate backup and
> restore tasks: create backup (full and incremental), restore from image,
> delete backups, merge backups, history, history per table, backup set
> management.
>
> Hopefully, my write-up addresses at least some of your concerns.
>

Thanks for updating us (community) w/ status. Completion of HA seems
important, as is the result of the scale testing.

St.Ack

> -Vlad
>
> On Sun, Sep 10, 2017 at 6:27 AM, Josh Elser <[email protected]> wrote:
>
> > On Sat, Sep 9, 2017 at 7:04 PM, stack <[email protected]> wrote:
> > > In spite of repeated requests for eng summary of state of this
> > > feature -- summary of what is in 2.0, what is not, what the
> > > capabilities are, how well it has been tested and at what scale --
> > > all I get, when the requests are not ignored, are pointers to lists
> > > of ill-describing jiras and some pending user facing doc update.
> >
> > Yes, this is a problem. We, especially you as RM, shouldn't have
> > outstanding questions as to the quality/state of B&R.
> >
> > > For other features, mob or region server groups, I know that they
> > > have been running at scale in production for as much as a year and
> > > more. I have some confidence these items basically work. For
> > > backup/restore I have no such sense even after spending time in
> > > review and trying to use the feature.
> >
> > I can attest to the feature being tested on small clusters. I'm not
> > sure about larger than 10-node tests. If this is less a worry and more
> > a veto, let's get some criteria on the kind of testing you're looking
> > for to avoid having to rehash later.
> >
> > Do we have any kind of integration tests in the codebase now that can
> > help increase Stack's confidence?
> >
> > > As release manager, I have say over what makes it into a release.
> > > Unless the work is done to convince me that backup/restore is more
> > > than a lump of code and a few unit tests that can pass on some
> > > fellow's laptop, I am going to kick it out of branch-2. Let the
> > > feature harden more in master branch before it ships in a release.
> >
> > While it was a few months ago now, I can also attest to this being
> > more than some unit tests (I think I looked at it after I saw you last
> > down in the weeds).
> >
> > I do worry about trying to remove it at this state.
> >
> > * Do you consider the B&R code in the repository implicitly harmful?
> > Is there harm in shipping with docs capturing the concern?
> > * Trying to revert all relevant pieces from branch-2 is non-trivial.
> > * I would feel quite dejected if some feature I spent a year+ working
> > on (*not* making assertions on my perception of quality) was removed
> > from the release line it was expected to land.
> >
> > > S
> > >
> > >
> > > On Sep 8, 2017 10:59 PM, "Vladimir Rodionov" <[email protected]>
> > > wrote:
> > >
> > >> >> Have I grasped the state of things correctly, Vlad?
> > >>
> > >> Josh, the only thing which is still pending is doc update. All other
> > >> features are good to have but not blockers for 2.0 release.
> > >>
> > >> -Vlad
> > >>
> > >> On Fri, Sep 8, 2017 at 10:42 PM, Vladimir Rodionov
> > >> <[email protected]> wrote:
> > >>
> > >> > >> What testing and at what
> > >> > >> scale has testing been done?
> > >> >
> > >> > Do we have that for other features?
> > >> >
> > >> >
> > >> > On Fri, Sep 8, 2017 at 10:41 PM, Vladimir Rodionov
> > >> > <[email protected]> wrote:
> > >> >
> > >> >> >> It asks: "How do I figure what of backup/restore feature is
> > >> >> >> going to be in hbase-2.0.0?
> > >> >>
> > >> >> Hmm, wait for doc update.
> > >> >>
> > >> >>
> > >> >> On Fri, Sep 8, 2017 at 2:39 PM, Stack <[email protected]> wrote:
> > >> >>
> > >> >>> HBASE-14414 is a JIRA with a list of random-seeming issues w/
> > >> >>> non-descript summaries: "Add nonce support to
> > >> >>> TableBackupProcedure, BackupID must include backup set name,
> > >> >>> ...". The last comment in that issue is from July.
> > >> >>> It asks: "How do I figure what of backup/restore feature is
> > >> >>> going to be in hbase-2.0.0? Thanks Vladimir Rodionov
> > >> >>> <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=vrodionov>."
> > >> >>> to which there is no answer. Doc update is TODO.
> > >> >>>
> > >> >>> Where is the summary of the capability in hbase-2? What testing
> > >> >>> and at what scale has testing been done? Is this 'stable or
> > >> >>> experimental'? If I can't get basic info on this feature though
> > >> >>> I ask repeatedly, what hope does the poor old operator have?
> > >> >>>
> > >> >>> St.Ack
> > >> >>>
> > >> >>>
> > >> >>> On Fri, Sep 8, 2017 at 1:59 PM, Vladimir Rodionov
> > >> >>> <[email protected]> wrote:
> > >> >>>
> > >> >>> > HBASE-14414
> > >> >>> >
> > >> >>> > On Fri, Sep 8, 2017 at 1:14 PM, Stack <[email protected]> wrote:
> > >> >>> >
> > >> >>> > > Where do I go to get the current status of this feature?
> > >> >>> > > Looking in JIRA I see loads of issues open against backup
> > >> >>> > > including some against hbase-2.0.0 and no progress being
> > >> >>> > > made that I can discern.
> > >> >>> > >
> > >> >>> > > Thanks,
> > >> >>> > > S
> > >> >>> > >
> > >> >>> > >
> > >> >>> > > On Wed, Nov 23, 2016 at 8:52 AM, Stack <[email protected]>
> > >> >>> > > wrote:
> > >> >>> > >
> > >> >>> > > > On Tue, Nov 22, 2016 at 6:48 PM, Stack <[email protected]>
> > >> >>> > > > wrote:
> > >> >>> > > >
> > >> >>> > > >> On Tue, Nov 22, 2016 at 3:17 PM, Vladimir Rodionov
> > >> >>> > > >> <[email protected]> wrote:
> > >> >>> > > >>
> > >> >>> > > >>> >> and/or he answered most of the review feedback
> > >> >>> > > >>>
> > >> >>> > > >>> No, questions are still open, but I do not see any
> > >> >>> > > >>> blockers and we have HBASE-16940 to address these
> > >> >>> > > >>> questions.
> > >> >>> > > >>>
> > >> >>> > > >>>
> > >> >>> > > >> Agree. No blockers but stuff that should be dealt with
> > >> >>> > > >> (No one will pay me any attention once merge goes in --
> > >> >>> > > >> smile).
> > >> >>> > > >>
> > >> >>> > > >>
> > >> >>> > > > Let me clarify the above. I want review addressed before
> > >> >>> > > > merge happens. Sorry if any confusion.
> > >> >>> > > > St.Ack
> > >> >>> > > >
> > >> >>> > > >
> > >> >>> > > >> St.Ack
> > >> >>> > > >>
> > >> >>> > > >>
> > >> >>> > > >>> On Tue, Nov 22, 2016 at 3:04 PM, Devaraj Das
> > >> >>> > > >>> <[email protected]> wrote:
> > >> >>> > > >>>
> > >> >>> > > >>> > Hi Stack, hats off to you for spending so much time
> > >> >>> > > >>> > on this! Thanks! From my understanding, Vlad has
> > >> >>> > > >>> > raised follow-up jiras for the issues you raised,
> > >> >>> > > >>> > and/or he answered most of the review feedback. So,
> > >> >>> > > >>> > do you think we could do a merge vote now?
> > >> >>> > > >>> > Devaraj.
> > >> >>> > > >>> > ________________________________________
> > >> >>> > > >>> > From: Vladimir Rodionov <[email protected]>
> > >> >>> > > >>> > Sent: Monday, November 21, 2016 8:34 PM
> > >> >>> > > >>> > To: [email protected]
> > >> >>> > > >>> > Subject: Re: [DISCUSSION] Merge Backup / Restore -
> > >> >>> > > >>> > Branch HBASE-7912
> > >> >>> > > >>> >
> > >> >>> > > >>> > >> I have spent a good bit of time reviewing and
> > >> >>> > > >>> > >> testing this feature.
> > >> >>> > > >>> > >> I would like my review and concerns addressed and
> > >> >>> > > >>> > >> I'd like it to be clear how; either explicit
> > >> >>> > > >>> > >> follow-on issues, pointers to where in the patch
> > >> >>> > > >>> > >> or doc my remarks have been catered to, etc. Until
> > >> >>> > > >>> > >> then, I am against commit.
> > >> >>> > > >>> >
> > >> >>> > > >>> > Stack, mega patch review comments will be addressed
> > >> >>> > > >>> > in the dedicated JIRA: HBASE-16940
> > >> >>> > > >>> > I have opened several other JIRAs to address your
> > >> >>> > > >>> > other comments (not on review board).
> > >> >>> > > >>> >
> > >> >>> > > >>> > Details are here (end of the thread):
> > >> >>> > > >>> > https://issues.apache.org/jira/browse/HBASE-14123
> > >> >>> > > >>> >
> > >> >>> > > >>> > Let me know what else we should do to move the merge
> > >> >>> > > >>> > forward.
> > >> >>> > > >>> >
> > >> >>> > > >>> > -Vlad
> > >> >>> > > >>> >
> > >> >>> > > >>> >
> > >> >>> > > >>> > On Fri, Nov 18, 2016 at 4:54 PM, Stack
> > >> >>> > > >>> > <[email protected]> wrote:
> > >> >>> > > >>> >
> > >> >>> > > >>> > > On Fri, Nov 18, 2016 at 3:53 PM, Ted Yu
> > >> >>> > > >>> > > <[email protected]> wrote:
> > >> >>> > > >>> > >
> > >> >>> > > >>> > > > Thanks, Matteo.
> > >> >>> > > >>> > > >
> > >> >>> > > >>> > > > bq. restore is not clear if given an incremental
> > >> >>> > > >>> > > > id it will do the full restore from full up to
> > >> >>> > > >>> > > > that point or if i need to apply manually
> > >> >>> > > >>> > > > everything
> > >> >>> > > >>> > > >
> > >> >>> > > >>> > > > The restore takes into consideration the
> > >> >>> > > >>> > > > dependent backup(s).
> > >> >>> > > >>> > > > So there is no need to apply preceding backup(s)
> > >> >>> > > >>> > > > manually.
> > >> >>> > > >>> > > >
> > >> >>> > > >>> > > >
> > >> >>> > > >>> > > I ask this question on the issue. It is not clear
> > >> >>> > > >>> > > from the usage or doc how to run a restore from
> > >> >>> > > >>> > > incremental. Can you fix in doc and usage how so I
> > >> >>> > > >>> > > can be clear and try it. Currently I am stuck
> > >> >>> > > >>> > > verifying a round trip backup restore made of
> > >> >>> > > >>> > > incrementals.
> > >> >>> > > >>> > >
> > >> >>> > > >>> > > Thanks,
> > >> >>> > > >>> > > S
> > >> >>> > > >>> > >
> > >> >>> > > >>> > >
> > >> >>> > > >>> > > > On Fri, Nov 18, 2016 at 3:48 PM, Matteo Bertozzi
> > >> >>> > > >>> > > > <[email protected]> wrote:
> > >> >>> > > >>> > > >
> > >> >>> > > >>> > > > > I did one last pass to the mega patch. I don't
> > >> >>> > > >>> > > > > see anything major that should block the merge.
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > - most of the code is isolated in the backup
> > >> >>> > > >>> > > > > package
> > >> >>> > > >>> > > > > - all the backup code is client side
> > >> >>> > > >>> > > > > - there are few changes to the server side,
> > >> >>> > > >>> > > > > mainly for cleaners, wal rolling and similar
> > >> >>> > > >>> > > > > (which is ok)
> > >> >>> > > >>> > > > > - there is a good number of tests, and an
> > >> >>> > > >>> > > > > integration test
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > the code seems to have still some left overs
> > >> >>> > > >>> > > > > from the old implementation, and some stuff
> > >> >>> > > >>> > > > > needs a cleanup.
> > >> >>> > > >>> > > > > but I don't think this should be used as an
> > >> >>> > > >>> > > > > argument to block the merge. I think the guys
> > >> >>> > > >>> > > > > will keep working on this and they may also get
> > >> >>> > > >>> > > > > help of others once the patch is in master.
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > I still have my concerns about the current
> > >> >>> > > >>> > > > > limitations, but these are things already
> > >> >>> > > >>> > > > > planned for phase 3, so some of this stuff may
> > >> >>> > > >>> > > > > even be in the final 2.0.
> > >> >>> > > >>> > > > > but as long as we have a "current limitations"
> > >> >>> > > >>> > > > > section in the user guide mentioning important
> > >> >>> > > >>> > > > > stuff like the ones below, I'm ok with it.
> > >> >>> > > >>> > > > > - if you write to the table with
> > >> >>> > > >>> > > > > Durability.SKIP_WAL your data will not be in
> > >> >>> > > >>> > > > > the incremental backup
> > >> >>> > > >>> > > > > - if you bulkload files that data will not be
> > >> >>> > > >>> > > > > in the incremental backup (HBASE-14417)
> > >> >>> > > >>> > > > > - the incremental backup will contain not only
> > >> >>> > > >>> > > > > the data of the table you specified but also
> > >> >>> > > >>> > > > > the regions from other tables that are on the
> > >> >>> > > >>> > > > > same set of RSs (HBASE-14141) ...maybe a note
> > >> >>> > > >>> > > > > about security around this topic
> > >> >>> > > >>> > > > > - the incremental backup will not contain just
> > >> >>> > > >>> > > > > the "latest row" between backup A and B, but it
> > >> >>> > > >>> > > > > will also contain all the updates that occurred
> > >> >>> > > >>> > > > > in between. but the restore does not allow you
> > >> >>> > > >>> > > > > to restore up to a certain point in time, the
> > >> >>> > > >>> > > > > restore will always be up to the "latest backup
> > >> >>> > > >>> > > > > point".
> > >> >>> > > >>> > > > > - you should limit the number of "incremental"
> > >> >>> > > >>> > > > > up to N (or maybe SIZE), to avoid replay time
> > >> >>> > > >>> > > > > becoming the bottleneck.
> > >> >>> > > >>> > > > > (HBASE-14135)
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > I'll be ok even with the above not being in the
> > >> >>> > > >>> > > > > final 2.0,
> > >> >>> > > >>> > > > > but i'd like to see as blocker for the final
> > >> >>> > > >>> > > > > 2.0 (not the merge)
> > >> >>> > > >>> > > > > - the backup code moved in an hbase-backup
> > >> >>> > > >>> > > > > module
> > >> >>> > > >>> > > > > - and some more work around tools, especially
> > >> >>> > > >>> > > > > to try to unify and make simple the backup
> > >> >>> > > >>> > > > > experience (simple example: in some case there
> > >> >>> > > >>> > > > > is a backup_id argument in others a backupId
> > >> >>> > > >>> > > > > argument. or things like.. restore is not clear
> > >> >>> > > >>> > > > > if given an incremental id it will do the full
> > >> >>> > > >>> > > > > restore from full up to that point or if i need
> > >> >>> > > >>> > > > > to apply manually everything).
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > in conclusion, I think we can open a merge
> > >> >>> > > >>> > > > > vote. I'll be +1 on it, and I think we should
> > >> >>> > > >>> > > > > try to reject -1 with just a "code cleanup"
> > >> >>> > > >>> > > > > motivation, since there will still be work
> > >> >>> > > >>> > > > > going on on the code after the merge.
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > Matteo
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > On Sun, Nov 6, 2016 at 10:54 PM, Devaraj Das
> > >> >>> > > >>> > > > > <[email protected]> wrote:
> > >> >>> > > >>> > > > >
> > >> >>> > > >>> > > > > > Stack and others, anything else on the patch?
> > >> >>> > > >>> > > > > > Merge to master now?
