If tests pass with the patch (which I believe they do), let's commit the patch. Follow it up with an updated mega patch for review...
________________________________________
From: Ted Yu <yuzhih...@gmail.com>
Sent: Tuesday, October 04, 2016 6:28 PM
To: dev@hbase.apache.org
Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

Refactoring work over in HBASE-16727 is ready for review. Kindly provide your feedback.

Thanks

On Mon, Oct 3, 2016 at 3:05 PM, Andrew Purtell <apurt...@apache.org> wrote:

This sounds good to me. I'd be at least +0 as to merging the branch as long as we are not 'shelling out' to MR from master.

> All or most of the Backup/Restore operations (especially the MR job spawns) should be moved to the client.

We have a home grown backup solution at Salesforce that to a first order of approximation is this. I would like to see something like this merged.

> In the future, if someone needs to support self-service operations (any user can take a backup/restore his/her tables), we can discuss the "backup service" or something else.

I can't commit the time of the team here (smile), but we always strive to minimize the amount of local code we need to manage HBase. For example, we use VerifyReplication and other tools that ship with HBase, and we have contributed minor operational improvements as we've developed them (like the region mover and canary stuff). I suspect we will have some adoption of this tooling and further refinement insofar as it fits into a backup workflow that, at a 30kft view, uses snapshots, replication (or file shipping), and WAL replay.

On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das <d...@hortonworks.com> wrote:

Vlad, thinking about it a little more, since the master is not orchestrating the backup, let's make it dead simple as a first pass. I think we should do the following: All or most of the Backup/Restore operations (especially the MR job spawns) should be moved to the client.
Ignore security for the moment - let's live with what we have as the current "limitation" for tools that need HDFS access - they need to run as hbase (or whatever user the hbase daemons run as). Consistency/cleanup needs to be handled as well as possible - if the client fails after initiating the backup/restore, who restores consistency in the hbase:backup table, or cleans up the half-copied data in the hdfs dirs, etc.

In the future, if someone needs to support self-service operations (any user can take a backup/restore his/her tables), we can discuss the "backup service" or something else.

Folks - Stack / Andrew / Matteo / others, please speak up if you disagree with the above. Would like to get over this merge-to-master hump obviously.

________________________________________
From: Vladimir Rodionov <vladrodio...@gmail.com>
Sent: Monday, September 26, 2016 11:48 AM
To: dev@hbase.apache.org
Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

Ok, we had an internal discussion and this is what we are suggesting now:

1. We will create a separate module (hbase-backup) and move server-side code there.
2. Master and RS will be MR and backup free.
3. The code from Master will be moved into a standalone service (BackupService) for procedure orchestration, operation resume/abort and SECURITY. It means one additional process, similar to the REST/Thrift server, will be required to operate backup.

I would like to note that a separate process running under the hbase super user is required to implement security properly in a multi-tenant environment; otherwise, only the hbase super user will be allowed to operate backups.

Please let us know what you think, HBase people :?
-Vlad

On Sat, Sep 24, 2016 at 2:49 PM, Stack <st...@duboce.net> wrote:

> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
> At branch merge voting time now more eyes are getting on the design issues with dissenting opinion emerging. This is the branch merge process working as our community has designed it. Because this is the first full project review of the code and implementation I think we all have to be flexible. I see the community as trying to narrow the technical objection at issue to the smallest possible scope. It's simple: don't call out to an external execution framework we don't own from core master (and by extension regionserver) code. We had this objection before to a proposed external compaction implementation for MOB, so this should not come as a surprise. Please let me know if I have misstated this.

The above is my understanding also.

> This would seem to require a modest refactor of coordination to move invocation of MR code out from any core code path. To restate what I think is an emerging recommendation: Move cross HBase and MR coordination to a separate tool. This tool can ask the master to invoke procedures on the HBase side that do first mile export and last mile restore. (Internally the tool can also use the procedure framework for state durability, perhaps, just a thought.) Then the tool can further drive the things done with MR like shipping data off cluster or moving remote data in place and preparing it for import. These activities do not need procedure coordination and involvement of the HBase master. Only the first and last mile of the process needs atomicity within the HBase deploy.
> Please let me know if I have misstated this.

Above is my understanding of our recommendation.

St.Ack

On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:

bq. procedure gives you a retry mechanism on failure

We do need this mechanism. Take a look at the multiple steps in FullTableBackupProcedure, etc.

bq. let the user export it later when he wants

This would make supporting security more complex (user A shouldn't be exporting user B's backup). And it is not user friendly - at the time the backup request is issued, the following is specified:

    + + " BACKUP_ROOT The full root path to store the backup image,\n"
    + + " the prefix can be hdfs, webhdfs or gpfs\n"

Backup root is an integral part of the backup manifest.

Cheers

On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Ideally the export should have one job running which does the retry (on failed partition) itself.

procedure gives you a retry mechanism on failure. if you don't use that, then you don't need procedure. if you want you can start a procedure executor in a non master process (the hbase-procedure is a separate package and does not depend on master). but again, export seems a case where you don't need procedure.

like snapshot, the logic may just be: ask the master to take a backup, and let the user export it later when he wants.
so you avoid having an MR job started by the master, since people do not seem to like it.

for restore (I think that is where you use the MR splitter) you can probably just have a backup ready (already split). there is already a jira that should do that, HBASE-14135. instead of doing the split/merge operation on restore, you consolidate the backup "offline" (MR job started by the user) and then ask to restore the backup.

On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

as far as I understand the code, you don't need procedure for the export itself. the export operation is already idempotent, since you are just copying files. if the file exists and is complete (check length, checksum, ...) you can skip it, otherwise you'll send it over again.

you need the proc for taking the backup and restoring, because you want to complete the operation and end up with a consistent state across the multiple components you are updating (meta, fs, ...). but again, for export you can just run the tool over and over until the operation succeeds, and that should be ok.

Matteo

On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:

Master is involved in this discussion because currently only Master instantiates ProcedureExecutor, which runs the 3 Procedures for backup / restore.
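Matteo's description above of export as an idempotent, rerunnable copy can be sketched in a few lines. This is a minimal illustration, not code from the backup branch: it assumes local directories accessed through java.nio stand in for HDFS and the backup root, and the names IdempotentExport, exportOnce and isComplete are hypothetical. A file counts as already exported when its destination copy has the same length and SHA-256 checksum, so each pass re-sends only missing or incomplete files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local directories stand in for HDFS and the backup root.
class IdempotentExport {

  // True if dst exists and matches src in both length and SHA-256 checksum.
  static boolean isComplete(Path src, Path dst) {
    try {
      return Files.isRegularFile(dst)
          && Files.size(src) == Files.size(dst)
          && Arrays.equals(sha256(src), sha256(dst));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the whole file into memory: fine for a sketch, not for large files.
  private static byte[] sha256(Path p) throws IOException {
    try {
      return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // One export pass: copies every regular file under srcDir that is missing or
  // incomplete under dstDir, skips the rest, and returns how many it copied.
  // Rerunning after a partial failure only re-sends what is still missing.
  static int exportOnce(Path srcDir, Path dstDir) {
    try (Stream<Path> walk = Files.walk(srcDir)) {
      List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
      int copied = 0;
      for (Path src : files) {
        Path dst = dstDir.resolve(srcDir.relativize(src));
        if (isComplete(src, dst)) {
          continue; // already exported by an earlier pass
        }
        Files.createDirectories(dst.getParent());
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        copied++;
      }
      return copied;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Unchecked helpers used by the demo.
  static Path tempDir(String prefix) {
    try {
      return Files.createTempDirectory(prefix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static void write(Path p, String content) {
    try {
      Files.createDirectories(p.getParent());
      Files.write(p, content.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path src = tempDir("src");
    Path dst = tempDir("dst");
    write(src.resolve("cf/hfile1"), "alpha");
    write(src.resolve("hfile2"), "beta");
    System.out.println(exportOnce(src, dst)); // 2: nothing exported yet
    System.out.println(exportOnce(src, dst)); // 0: rerun is a no-op
    write(dst.resolve("hfile2"), "be");       // simulate a truncated copy
    System.out.println(exportOnce(src, dst)); // 1: only the bad file is re-sent
  }
}
```

Because a pass only moves the destination closer to the source, a wrapper can rerun exportOnce until a pass copies nothing, which is the "run the tool over and over until the operation succeeds" behavior described above; no procedure state is involved in this part.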
What if an optional standalone service which hosts ProcedureExecutor is used for this purpose? Would that have a better chance of giving us middle ground so that we can move this forward?

Cheers

On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:

(Moved out of the Master doing MR DISCUSSION)

> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>
>> -1 on that backup be in core hbase
>
> Not sure I understand what it means.

Sorry for the imprecision.

The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so -1 on the Master running backup/restore MR jobs, even if optional.

Master should not depend on MR. We've gone out of our way to avoid taking MR on as a dependency in the past. Seems late in the game for us to change our opinion on this. If we didn't do it for distributed log splitting, or MOB, why would we do it to support an optional backup/restore?

I have opinions on the questions below -- i.e. that Master running backup/restore is outside of the Master's charge -- but they are not worth much since I've not done much by way of review or contrib to backup/restore other than to try it as a 'user', so I'll keep them to myself until I do.
I only came out from under my shell to participate in the MR-as-dependency chat.

Thanks,
M

> 1. We are not allowed to use Master to orchestrate the whole process? We have already brought up all advantages of using Master and distributed procedures for backup and restore.
>
> Downside of moving this to a client tool is lack of fault tolerance:
> 1.1 Client won't be allowed to do any operations that can potentially affect the cluster, such as disabling splits/merges, balancer.
> 1.2 In case of client failure, who will be doing the whole rollback stuff? We are trying to make it atomic.
>
> Security is not clear.
>
> 2. We are not allowed to modify code of existing HBase core classes (what does core mean anyway)?
>
> 3. We are not allowed to create a backup system table (hbase:backup) in a system space? Only in user space? The table is global.
>
> 2. is critical. Despite the fact that 95% of the code is new, we have, of course, touched some existing HBase code.
> 3. is not that critical; of course we can move the backup system into user space.
>
> And finally, will moving backup into an external tool give us +1 from stack?
> -Vlad

On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> wrote:

> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>
>> + MR is dead
>
> Does MR know that? :)
>
> Again. With all due respect, stack - still no suggestions what should we use for "bulk data move and transformation" instead of MR?

Use whatever distributed engine suits your fancy -- MR, Spark, distributed shell -- just don't have HBase core depend on it, even optionally.

> I suggest voting first on "do we need backup in HBase"? In my opinion, some group members are still not sure about that and some will give -1 in any case. Just because ...

We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding all the API any such external tool might need to run).
St.Ack

> -Vlad

On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> wrote:

> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>
> let me try to go back to my original topic. this question was meant to be generic, and provide some rule for future code.
>
> from what I can gather, a rule that may satisfy everyone can be:
> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some cluster may not want or may have an external/uncontrolled MR setup.

+1

> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.

-1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.

+ MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
+ Master is a rats nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not clutter the task and make it harder by piling on more moving parts.

St.Ack

> Matteo

On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:

I suggest you look at Matteo's work on AssignmentManager, which is to make Master more stable.

Cheers

On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:

No, not your fault, at least, not this time :)

Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make rpc calls to HMaster. A simple change could cause a probabilistic deadlock or some strange NPEs...
That's why I'm very nervous when somebody wants to add new features or external dependencies to HMaster, especially adding more work to the startup processing...

Thanks.

2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the jdk.

Suggest pinging the reviewer on JIRA to get it moving.

bq. But the ugly code in HMaster is really a problem...

Can you be specific as to which code is ugly? Is it in the backup / restore mega patch?

Cheers

On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:

If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
But I strongly suggest that later we revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...

And for security, I have had an issue pending for a long time. Can someone help by taking a simple look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while it is still being used, and it is declared as LimitedPrivate, so I cannot change the behavior and the only way to fix it is to write another piece of ugly code...
https://issues.apache.org/jira/browse/HADOOP-13433

2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:

> If in the future, we find better ways of doing this without using MR, we can certainly consider that

Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.

-Vlad

On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:

Guys, first off apologies for bringing in the topic of MR-based compactions. But I was thinking more about the SpliceMachine approach of managing compactions in Spark, where apparently they saw a lot of benefits.
Apologies for giving you that sore throat Andrew; I really didn't mean to :-)

So on this issue, we have these on the plate:
0. Somehow not use MR but something like that
1. Run a standalone service other than master
2. Shell out from the master

I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there, and being used by HBase already for some operations.

On (1), we have to deal with a myriad of issues - HA of the server not being the least of them all. Security (kerberos authentication, another keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's substitute that (1) with the HBase Master. I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed.
It's not ideal; agreed.

Now before going to (2), let's see what the benefits are of running the backup/restore jobs from the master. I think Ted has summarized some of the issues that we need to take care of - basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of them (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue - the job needs to run as 'hbase' since it owns the data. Having the master launch the job gives it that privilege. In the (2) approach, it's hard to do some of the above management.
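The durability argument above (a backup master can pick up a job because its state was recorded in the proc WAL), and the opposing concern about a client failing midway, both come down to persisting progress after each step. A minimal sketch under stated assumptions: a local file stands in for the procedure WAL or a row in hbase:backup, and Step, ResumableDriver and SimulatedCrash are hypothetical names; this is not the HBase ProcedureV2 API. A restarted driver resumes after the last recorded step, while a step that throws triggers rollback of the steps already run, newest first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// One step of a hypothetical multi-step backup. execute must be idempotent (a crash
// between execute and the progress write re-runs it), and rollback must tolerate a
// step that only partly ran.
final class Step {
  final Runnable execute;
  final Runnable rollback;
  Step(Runnable execute, Runnable rollback) {
    this.execute = execute;
    this.rollback = rollback;
  }
}

// Stands in for the driver process dying: it is an Error, not a RuntimeException,
// so run() neither treats it as a step failure nor rolls back.
class SimulatedCrash extends Error {}

// Hypothetical driver. The state file stands in for a procedure WAL or a row in
// hbase:backup and holds the number of steps completed so far.
class ResumableDriver {
  interface IoAction<T> { T run() throws IOException; }

  static <T> T io(IoAction<T> action) {
    try {
      return action.run();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private final Path stateFile;
  private final List<Step> steps;

  ResumableDriver(Path stateFile, List<Step> steps) {
    this.stateFile = stateFile;
    this.steps = steps;
  }

  int completed() {
    return Files.exists(stateFile)
        ? Integer.parseInt(io(() -> new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8)).trim())
        : 0;
  }

  // Write a temp file, then move it over the old state (a rename on typical local
  // file systems), so a restart normally sees the old count or the new one.
  private void persist(int done) {
    Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
    io(() -> Files.write(tmp, Integer.toString(done).getBytes(StandardCharsets.UTF_8)));
    io(() -> Files.move(tmp, stateFile, StandardCopyOption.REPLACE_EXISTING));
  }

  void run() {
    for (int i = completed(); i < steps.size(); i++) {
      try {
        steps.get(i).execute.run();
      } catch (RuntimeException e) {
        // Hard failure: undo this step and every earlier one, newest first,
        // then forget the attempt so the next run starts from scratch.
        for (int j = i; j >= 0; j--) {
          steps.get(j).rollback.run();
        }
        io(() -> Files.deleteIfExists(stateFile));
        throw e;
      }
      persist(i + 1); // record progress only after the step succeeded
    }
  }

  static Path tempStateFile() {
    return io(() -> Files.createTempDirectory("backup-driver")).resolve("progress");
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    int[] attempts = {0};
    List<Step> steps = Arrays.asList(
        new Step(() -> log.add("snapshot"), () -> log.add("undo-snapshot")),
        new Step(() -> {
          if (attempts[0]++ == 0) throw new SimulatedCrash(); // first attempt "dies"
          log.add("copy");
        }, () -> log.add("undo-copy")),
        new Step(() -> log.add("update-meta"), () -> log.add("undo-meta")));
    Path state = tempStateFile();
    try {
      new ResumableDriver(state, steps).run();
    } catch (SimulatedCrash crash) {
      System.out.println("crashed after " + new ResumableDriver(state, steps).completed() + " step(s)");
    }
    new ResumableDriver(state, steps).run(); // a fresh driver resumes at the copy step
    System.out.println(log); // [snapshot, copy, update-meta]
  }
}
```

Nothing in this pattern ties it to a particular process; Matteo notes in this thread that hbase-procedure is a separate package that does not depend on the master.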
Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future, we find better ways of doing this without using MR, we can certainly consider that. But IMO we should not block this patch from getting merged.

________________________________________
From: 张铎 <palomino...@gmail.com>
Sent: Thursday, September 22, 2016 8:32 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

So what about a standalone service other than master? You can use your own procedure store in that service?

2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

An earlier implementation was client driven.
But with that approach, it is hard to resume if there is an error midway. Using Procedure V2 makes the backup / restore more robust.

Another consideration is security. It is hard to enforce security (to be implemented) for client-driven actions.

Cheers

On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

No, this misses Matteo's finer point, which is that "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?

On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

> In our production cluster, it is a common case that we just have HDFS and HBase deployed.
> If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.

So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup be a standard feature. Besides this, nothing will happen in your cluster if you are not doing backups.

This discussion (we do not want to see an M/R dependency) goes nowhere. We have asked already, at least twice, for a suggestion of another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
-Vlad

On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:

If the MR framework is not deployed in the cluster, HBase still functions normally (post merge).

In terms of build-time dependency, we have long been depending on MapReduce. Take a look at ExportSnapshot.

Cheers

On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:

In our production cluster, it is a common case that we just have HDFS and HBase deployed. If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.
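Ted's point that HBase still functions when MR is not deployed rests on MR being touched only by the backup path. A minimal sketch of that idea in Java, assuming a hypothetical helper class `MapReduceAvailability` (not part of HBase): the probe for the MapReduce client class `org.apache.hadoop.mapreduce.Job` runs only when a backup is requested, so other code paths never depend on it. It only checks the classpath, not whether an MR/YARN service is actually running.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of HBase): probes for MapReduce classes only
 * when a feature that needs them is invoked, so clusters without MR never
 * reach this check.
 */
public final class MapReduceAvailability {

    // Entry-point class of the Hadoop MapReduce client API.
    static final String MR_JOB_CLASS = "org.apache.hadoop.mapreduce.Job";

    private MapReduceAvailability() {}

    /** Returns true if the named class can be loaded; static initializers are not run. */
    public static boolean isClassPresent(String className) {
        Objects.requireNonNull(className, "className");
        try {
            Class.forName(className, false, MapReduceAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Called from the backup code path only; fails that command, not the server. */
    public static void requireMapReduce(String feature) {
        if (!isClassPresent(MR_JOB_CLASS)) {
            throw new IllegalStateException(
                    feature + " requires MapReduce, but " + MR_JOB_CLASS + " is not on the classpath");
        }
    }

    public static void main(String[] args) {
        System.out.println("MapReduce client on classpath: " + isClassPresent(MR_JOB_CLASS));
        try {
            requireMapReduce("backup");
            System.out.println("backup can proceed");
        } catch (IllegalStateException e) {
            System.out.println("backup unavailable: " + e.getMessage());
        }
    }
}
```

Loading with `initialize=false` keeps the probe cheap and side-effect free: it confirms the class exists without running its static initializers.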
2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:

To be specific, take our nice Backup/Restore feature: if we think it is not a core feature of HBase, then we could make it depend on MR and start a standalone BackupManager instance that submits MR jobs to do periodic maintenance. And if we think it is a core feature that everyone should use, then we'd better implement it without an MR dependency, like DLS.

Thanks.

2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:

I'm -1 on letting the master or RS launch MR jobs.
It is OK that some of our features depend on MR, but I think the bottom line is that we should launch the jobs from outside, manually or by other services.

2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:

Ok, got it. Well, "shelling out" is on the line I think, so a fair question.

Can this be driven by a utility derived from Tool like our other MR apps? The issue is needing the AccessController to decide if it is allowed? But nothing prevents the user from running the job manually/independently, right?
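Andrew's suggestion of a Tool-derived utility, where the user's own client process launches the job instead of the Master, could look roughly like the following. This is a stdlib-only sketch under stated assumptions: `Command` is a local stand-in shaped like Hadoop's `Tool` (`int run(String[] args)`), and `JobSubmitter`, `BackupCommand`, `RecordingSubmitter`, and the argument layout are all hypothetical, not the actual HBase backup API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a backup driver that runs in the user's own client
// process. Command is a local stand-in shaped like Hadoop's Tool
// (int run(String[] args)); it is not the Hadoop interface itself.
public class BackupDriverSketch {

    /** Stand-in for Tool: returns a process exit code, 0 on success. */
    interface Command {
        int run(String[] args) throws Exception;
    }

    /** Hypothetical seam where a real driver would configure and submit the MR job. */
    interface JobSubmitter {
        boolean submit(String jobName, List<String> tables, String targetDir);
    }

    /** Launched by the user, so it runs with the user's credentials, not the Master's. */
    static final class BackupCommand implements Command {
        private final JobSubmitter submitter;

        BackupCommand(JobSubmitter submitter) {
            this.submitter = submitter;
        }

        @Override
        public int run(String[] args) {
            // Hypothetical argument layout: <targetDir> <table> [<table> ...]
            if (args.length < 2) {
                System.err.println("usage: backup <targetDir> <table> [<table> ...]");
                return 2;
            }
            String targetDir = args[0];
            List<String> tables = Arrays.asList(Arrays.copyOfRange(args, 1, args.length));
            boolean ok = submitter.submit("backup-" + String.join(",", tables), tables, targetDir);
            return ok ? 0 : 1;
        }
    }

    /** Test double: records submissions instead of contacting a cluster. */
    static final class RecordingSubmitter implements JobSubmitter {
        final List<String> submitted = new ArrayList<>();

        @Override
        public boolean submit(String jobName, List<String> tables, String targetDir) {
            submitted.add(jobName + " -> " + targetDir);
            return true;
        }
    }

    public static void main(String[] args) {
        RecordingSubmitter submitter = new RecordingSubmitter();
        int exitCode = new BackupCommand(submitter)
                .run(new String[] {"/backups/demo", "t1", "t2"});
        System.out.println("exit=" + exitCode + " submitted=" + submitter.submitted);
    }
}
```

With the real Hadoop classes, `main` would typically be `System.exit(ToolRunner.run(conf, tool, args))`, which parses generic options such as `-D` and `-conf` before calling `run`. Because nothing runs inside the Master, the permission question Andrew raises reduces to whatever the user's own credentials allow.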
On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

Just a remark: my query was not about tools using MR (everyone, I think, is OK with those). The topic was: "are we OK with running MR jobs from Master and RS code?", since this would be the first time we do this.

Matteo

On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:

Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to be dependent on MR. MR is the right framework for such.
We should also do compactions using MR (just saying :) )
________________________________________
From: Ted Yu <yuzhih...@gmail.com>
Sent: Thursday, September 22, 2016 2:00 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

I agree - backup / restore is in the same category as import / export.

On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

Backup is extra tooling around core, in my opinion. Like import or export. Or the optional MOB tool. It's fine.
On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:

What's the latest opinion on running MR jobs from HBase (Master or RS)?

I remember that in the past there was discussion about not having MR as a direct dependency of HBase.

I think some of that discussion was around MOB, which had an MR job to compact that was later turned into a non-MR job so it could be merged. I think we had a similar discussion for log split/replay.
The latest is the new Backup feature (HBASE-7912), which runs an MR job from the master to copy data or restore data. (Backup is also "not really core", as in: if you don't use backup you'll never end up running MR jobs. But this was probably true for MOB as well, as in "if you don't enable MOB you don't need MR".)

Any thoughts? Do we have a rule that says "we don't want HBase to run MR jobs; only tools started manually by the user can do that"? Or can we start adding MR calls around without problems?
--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)