I ran the backup test suite on Linux. All tests passed; the run took 28 minutes.
> On Oct 5, 2016, at 3:18 PM, Devaraj Das <d...@hortonworks.com> wrote:
>
> If tests pass with the patch (which I believe they are), let's commit the
> patch. Follow it up with an updated mega patch for review...
>
> ________________________________________
> From: Ted Yu <yuzhih...@gmail.com>
> Sent: Tuesday, October 04, 2016 6:28 PM
> To: dev@hbase.apache.org
> Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started
> by Master or RS)
>
> Refactoring work over in HBASE-16727 is ready for review.
>
> Kindly provide your feedback.
>
> Thanks
>
>> On Mon, Oct 3, 2016 at 3:05 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>
>> This sounds good to me.
>> I'd be at least +0 as to merging the branch as long as we are not 'shelling
>> out' to MR from master.
>>
>>> All or most of the Backup/Restore operations (especially the MR job
>>> spawns) should be moved to the client.
>>
>> We have a home grown backup solution at Salesforce that to a first order of
>> approximation is this. I would like to see something like this merged.
>>
>>> In the future, if someone needs to support self-service operations (any
>>> user can take a backup/restore his/her tables), we can discuss the "backup
>>> service" or something else.
>>
>> I can't commit the time of the team here (smile), but we always strive to
>> minimize the amount of local code we need to manage HBase. For example, we
>> use VerifyReplication and other tools that ship with HBase, and we have
>> contributed minor operational improvements as we've developed them (like
>> the region mover and canary stuff). I suspect we will have some adoption of
>> this tooling and further refinement insofar it fits into a backup workflow
>> at 30kft view using snapshots, replication (or file shipping), and WAL
>> replay.
>>
>>
>>> On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das <d...@hortonworks.com> wrote:
>>>
>>> Vlad, thinking about it a little more, since the master is not
>>> orchestrating the backup, let's make it dead simple as a first pass. I
>>> think we should do the following: All or most of the Backup/Restore
>>> operations (especially the MR job spawns) should be moved to the client.
>>> Ignore security for the moment - let's live with what we have as the
>>> current "limitation" for tools that need HDFS access - they need to run as
>>> hbase (or whatever the hbase daemons runs as). Consistency/cleanup needs to
>>> be handled as well as much as possible - if the client fails after
>>> initiating the backup/restore, who restores consistency in the hbase:backup
>>> table, or cleans up the half copied data in the hdfs dirs, etc.
>>> In the future, if someone needs to support self-service operations (any
>>> user can take a backup/restore his/her tables), we can discuss the "backup
>>> service" or something else.
>>> Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
>>> with the above. Would like to get over this merge-to-master hump obviously.
>>>
>>> ________________________________________
>>> From: Vladimir Rodionov <vladrodio...@gmail.com>
>>> Sent: Monday, September 26, 2016 11:48 AM
>>> To: dev@hbase.apache.org
>>> Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
>>> started by Master or RS)
>>>
>>> Ok, we had internal discussion and this is what we are suggesting now:
>>>
>>> 1. We will create separate module (hbase-backup) and move server-side code
>>>    there.
>>> 2. Master and RS will be MR and backup free.
>>> 3. The code from Master will be moved into standalone service
>>>    (BackupService) for procedure orchestration,
>>>    operation resume/abort and SECURITY. It means - one additional
>>>    (process) similar to REST/Thrift server will be required
>>>    to operate backup.
>>>
>>> I would like to note that separate process running under hbase super user
>>> is required to implement security properly in a multi-tenant environment,
>>> otherwise, only hbase super user will be allowed to operate backups
>>>
>>> Please let us know, what do you think, HBase people :?
>>>
>>> -Vlad
>>>
>>>
>>>
>>>> On Sat, Sep 24, 2016 at 2:49 PM, Stack <st...@duboce.net> wrote:
>>>>
>>>> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
>>>> wrote:
>>>>
>>>>> At branch merge voting time now more eyes are getting on the design issues
>>>>> with dissenting opinion emerging. This is the branch merge process working
>>>>> as our community has designed it. Because this is the first full project
>>>>> review of the code and implementation I think we all have to be flexible. I
>>>>> see the community as trying to narrow the technical objection at issue to
>>>>> the smallest possible scope. It's simple: don't call out to an external
>>>>> execution framework we don't own from core master (and by extension
>>>>> regionserver) code. We had this objection before to a proposed external
>>>>> compaction implementation for MOB so should not come as a surprise.
>>>>> Please let me know if I have misstated this.
>>>> The above is my understanding also.
>>>>
>>>>
>>>>> This would seem to require a modest refactor of coordination to move
>>>>> invocation of MR code out from any core code path. To restate what I think
>>>>> is an emerging recommendation: Move cross HBase and MR coordination to a
>>>>> separate tool. This tool can ask the master to invoke procedures on the
>>>>> HBase side that do first mile export and last mile restore. (Internally the
>>>>> tool can also use the procedure framework for state durability, perhaps,
>>>>> just a thought.)
Then the tool can further drive the things done with >>> MR >>>>> like shipping data off cluster or moving remote data in place and >>>> preparing >>>>> it for import. These activities do not need procedure coordination >> and >>>>> involvement of the HBase master. Only the first and last mile of the >>>>> process needs atomicity within the HBase deploy. Please let me know >> if >>> I >>>>> have misstated this. >>>>> >>>>> >>>>> Above is my understanding of our recommendation. >>>> >>>> St.Ack >>>> >>>> >>>> >>>>>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote: >>>>>> >>>>>> bq. procedure gives you a retry mechanism on failure >>>>>> >>>>>> We do need this mechanism. Take a look at the multi-step >>>>>> in FullTableBackupProcedure, etc. >>>>>> >>>>>> bq. let the user export it later when he wants >>>>>> >>>>>> This would make supporting security more complex (user A shouldn't >> be >>>>>> exporting user B's backup). And it is not user friendly - at the >> time >>>>>> backup request is issued, the following is specified: >>>>>> >>>>>> + + " BACKUP_ROOT The full root path to store the >> backup >>>>>> image,\n" >>>>>> + + " the prefix can be hdfs, webhdfs or >>>> gpfs\n" >>>>>> >>>>>> Backup root is an integral part of backup manifest. >>>>>> >>>>>> Cheers >>>>>> >>>>>> >>>>>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi < >>>>> theo.berto...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> >>> wrote: >>>>>>>> >>>>>>>> Ideally the export should have one job running which does the >> retry >>>> (on >>>>>>>> failed partition) itself. >>>>>>> >>>>>>> procedure gives you a retry mechanism on failure. if you don't use >>>> that, >>>>>>> than you don't need procedure. >>>>>>> if you want you can start a procedure executor in a non master >>> process >>>>> (the >>>>>>> hbase-procedure is a separate package and does not depend on >>> master). 
>>>>> but >>>>>>> again, export seems a case where you don't need procedure. >>>>>>> >>>>>>> like snapshot, the logic may just be: ask the master to take a >>> backup. >>>>> and >>>>>>> let the user export it later when he wants. so you avoid having a >> MR >>>> job >>>>>>> started by the master since people does not seems to like it. >>>>>>> >>>>>>> for restore (I think that is where you use the MR splitter) you >> can >>>>>>> probably just have a backup ready (already splitted). there is >>>> already a >>>>>>> jira that should do that HBASE-14135. instead of doing the >> operation >>>> of >>>>>>> split/merge on restore. you consolidate the backup "offline" (mr >> job >>>>>>> started by the user) and then ask to restore the backup. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi < >>>>>>> theo.berto...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> as far as I understand the code, you don't need procedure for >> the >>>>>>> export >>>>>>>>> itself. >>>>>>>>> the export operation is already idempotent, since you are just >>>> copying >>>>>>>>> files. >>>>>>>>> if the file exist and is complete (check length, checksum, ...) >>> you >>>>> can >>>>>>>>> skip it, >>>>>>>>> otherwise you'll send it over again. >>>>>>>>> >>>>>>>>> you need the proc for taking the backup and restoring, >>>>>>>>> because you want to complete the operation and end up with a >>>>> consistent >>>>>>>>> state >>>>>>>>> across the multiple components you are updating (meta, fs, ...) >>>>>>>>> but again, for export you can just run the tool over and over >>> until >>>>> the >>>>>>>>> operation succeed, and that should be ok. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Matteo >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> >>>> wrote: >>>>>>>>>> >>>>>>>>>> Master is involved in this discussion because currently only >>> Master >>>>>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for >>>>>>> backup / >>>>>>>>>> restore. >>>>>>>>>> >>>>>>>>>> What if an optional standalone service which hosts >>>> ProcedureExecutor >>>>>>> is >>>>>>>>>> used for this purpose ? >>>>>>>>>> Would that have better chance of giving us middle ground so >> that >>> we >>>>>>> can >>>>>>>>>> move this forward ? >>>>>>>>>> >>>>>>>>>> Cheers >>>>>>>>>> >>>>>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> >>> wrote: >>>>>>>>>>> >>>>>>>>>>> (Moved out of the Master doing MR DISCUSSION) >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov < >>>>>>>>>>> vladrodio...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>>>> -1 on that backup be in core hbase >>>>>>>>>>>> >>>>>>>>>>>> Not sure I understand what it means. >>>>>>>>>>>> >>>>>>>>>>>> Sorry for the imprecision. >>>>>>>>>>> >>>>>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a >>>> dependency >>>>>>>> and >>>>>>>>>> so >>>>>>>>>>> -1 on the Master running backup/restore MR jobs, even if >>> optional. >>>>>>>>>>> >>>>>>>>>>> Master should not depend on MR. We've gone out of our way to >>> avoid >>>>>>>>> taking >>>>>>>>>>> MR on as dependency in the past. Seems late in the game for us >>> to >>>>>>>>> change >>>>>>>>>>> our opinion on this. If we didn't do it for distributed log >>>>>>>> splitting, >>>>>>>>> or >>>>>>>>>>> MOB, why would we do it to support an optional backup/restore? >>>>>>>>>>> >>>>>>>>>>> I have opinions on the questions below -- i.e. 
that Master >>> running >>>>>>>>>>> backup/restore is outside of the Master's charge -- but they >> are >>>>>>> not >>>>>>>>>> worth >>>>>>>>>>> much since I've not done much by way of review or contrib to >>>>>>>>>> backup/restore >>>>>>>>>>> other than to try it as a 'user' so I'll keep them to myself >>> until >>>>>>> I >>>>>>>>> do. >>>>>>>>>> I >>>>>>>>>>> only came out from under my shell to participate on the MR as >>>>>>>>> dependency >>>>>>>>>>> chat. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> M >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole >>>>>>> process? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> We >>>>>>>>>>>> have already brought up all advantages of using >>>>>>>>>>>> Master and distributed procedures for backup and restore. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Downside of moving this to client tool is lack of fault >>>>>>> tolerance: >>>>>>>>>>>> 1.1 Client won't be allowed to do any operations, that can, >>>>>>>>>> potentially >>>>>>>>>>>> affect >>>>>>>>>>>> cluster, such as disabling splits/merges, balancer. >>>>>>>>>>>> 1.2 In case of client failure who will be doing the whole >>>>>>> rollback >>>>>>>>>>> stuff? >>>>>>>>>>>> We are trying to make it atomic. >>>>>>>>>>>> >>>>>>>>>>>> Security is not clear. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2. We are not allowed to modify code of existing HBase core >>>> classes >>>>>>>>> (what >>>>>>>>>>>> does core mean anyway)? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> 3. We are not allowed to create backup system table >>>>>>> (hbase:backup) >>>>>>>>> in a >>>>>>>>>>>> system space? Only in user space? The table is global. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we >>>>>>> have >>>>>>>>>>> touched, >>>>>>>>>>>> of course some existing HBase code. >>>>>>>>>>>> 3. is not that critical, of course we can move backup system >>> into >>>>>>>>> user >>>>>>>>>>>> space. 
>>>>>>>>>>>> >>>>>>>>>>>> And finally, will moving backup into external tool give us +1 >>>>>>> from >>>>>>>>>> stack? >>>>>>>>>>>> >>>>>>>>>>>> -Vlad >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> >>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov < >>>>>>>>>>>>> vladrodio...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>>>> + MR is dead >>>>>>>>>>>>>> >>>>>>>>>>>>>> Does MR know that? :) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Again. With all due respect, stack - still no suggestions >>>>>>> what >>>>>>>>>> should >>>>>>>>>>>> we >>>>>>>>>>>>>> use for "bulk data move and transformation" instead of MR? >>>>>>>>>>>>> >>>>>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, >> Spark, >>>>>>>>>>>> distributed >>>>>>>>>>>>> shell -- just don't have HBase core depend on it, even >>>>>>>> optionally. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In >> my >>>>>>>>>>> opinion, >>>>>>>>>>>>> some >>>>>>>>>>>>>> group members still not sure about that and some will give >> -1 >>>>>>>>>>>>>> in any case. Just because ... >>>>>>>>>>>>> We could run a vote, sure. -1 on that backup be in core >> hbase >>>>>>> (+1 >>>>>>>>> on >>>>>>>>>>>> adding >>>>>>>>>>>>> all the API any such external tool might need to run). >>>>>>>>>>>>> >>>>>>>>>>>>> St.Ack >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> -Vlad >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> >>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi < >>>>>>>>>>>>>> theo.berto...@gmail.com> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> let me try to go back to my original topic. 
>>>>>>>>>>>>>>>> this question was meant to be generic, and provide some >>>>>>>> rule >>>>>>>>>> for >>>>>>>>>>>>> future >>>>>>>>>>>>>>>> code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone >>>>>>>> can >>>>>>>>>> be: >>>>>>>>>>>>>>>> - we don't want any core feature (e.g. >>>>>>>>>> compaction/log-split/log- >>>>>>>>>>>>>> reply) >>>>>>>>>>>>>>>> over MR, because some cluster may not want or may have an >>>>>>>>>>>>>>>> external/uncontrolled MR setup. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a >>>>>>>>> flag) >>>>>>>>>>> to >>>>>>>>>>>>> run >>>>>>>>>>>>>> MR >>>>>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR >>>>>>> is >>>>>>>>> not >>>>>>>>>>>>>> required. >>>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind >>>>>>> a >>>>>>>>> flag >>>>>>>>>>> or >>>>>>>>>>>>> not >>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> ever being able to launch MR jobs. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it >>>>>>> from >>>>>>>>>>>>> hbase-server >>>>>>>>>>>>>>> moving it out to be an optional module (Spark would be its >>>>>>>>> peer). >>>>>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and >> Appy >>>>>>>> are >>>>>>>>>>> busy >>>>>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets >>>>>>> not >>>>>>>>>>> clutter >>>>>>>>>>>>>> task >>>>>>>>>>>>>>> harder by piling on more moving parts. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> St.Ack >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Matteo >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu < >>>>>>>> yuzhih...@gmail.com >>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I suggest you look at Matteo's work for >>>>>>> AssignmentManager >>>>>>>>>> which >>>>>>>>>>>> is >>>>>>>>>>>>> to >>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>>> Master more stable. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 < >>>>>>>> palomino...@gmail.com >>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> No, not your fault, at lease, not this time:) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the >>>>>>>>>> sequence >>>>>>>>>>>> of >>>>>>>>>>>>>>> calls >>>>>>>>>>>>>>>>> when >>>>>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a >>>>>>> regionserver >>>>>>>>> so >>>>>>>>>> it >>>>>>>>>>>>>> extends >>>>>>>>>>>>>>>>>> HRegionServer, and the initialization of >>>>>>> HRegionServer >>>>>>>>>>>> sometimes >>>>>>>>>>>>>>> needs >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would >>>>>>> cause >>>>>>>>>>>>>> probabilistic >>>>>>>>>>>>>>>> dead >>>>>>>>>>>>>>>>>> lock or some strange NPEs... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to >>>>>>> add >>>>>>>>> new >>>>>>>>>>>>> features >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>>> add >>>>>>>>>>>>>>>>>> external dependencies to HMaster, especially add more >>>>>>>>> works >>>>>>>>>>> for >>>>>>>>>>>>> the >>>>>>>>>>>>>>>> start >>>>>>>>>>>>>>>>>> up processing... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu < >>>>>>> yuzhih...@gmail.com >>>>>>>>> : >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I read through HADOOP-13433 >>>>>>>>>>>>>>>>>>> <https://issues.apache.org/ >>>>>>> jira/browse/HADOOP-13433> >>>>>>>> - >>>>>>>>>> the >>>>>>>>>>>>> cited >>>>>>>>>>>>>>>> race >>>>>>>>>>>>>>>>>>> condition is in jdk. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it >>>>>>>> moving. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is readlly a >>>>>>>>> problem... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is >>>>>>> it >>>>>>>> in >>>>>>>>>> the >>>>>>>>>>>>>> backup >>>>>>>>>>>>>>> / >>>>>>>>>>>>>>>>>>> restore mega patch ? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 < >>>>>>>>>>> palomino...@gmail.com> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> If you guys have already implemented the feature >>>>>>> in >>>>>>>>> the >>>>>>>>>>> MR >>>>>>>>>>>>> way >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on >>>>>>>> it >>>>>>>>>> as I >>>>>>>>>>>> do >>>>>>>>>>>>>> not >>>>>>>>>>>>>>>> want >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> block the development progress. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit >>>>>>> the >>>>>>>>>>> design >>>>>>>>>>>>> and >>>>>>>>>>>>>>> see >>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>> can seperated the logic from HMaster as much as >>>>>>>>>> possible. >>>>>>>>>>>> HA >>>>>>>>>>>>> is >>>>>>>>>>>>>>>> not a >>>>>>>>>>>>>>>>>> big >>>>>>>>>>>>>>>>>>>> problem if you do not store any metada locally. >>>>>>> But >>>>>>>>> the >>>>>>>>>>>> ugly >>>>>>>>>>>>>> code >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>> HMaster is readlly a problem... 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> And for security, I have a issue pending for a >>>>>>> long >>>>>>>>>> time. >>>>>>>>>>>> Can >>>>>>>>>>>>>>>> someone >>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>> taking a simple look at it? This is what I mean, >>>>>>>> ugly >>>>>>>>>>>> code... >>>>>>>>>>>>>>>> logout >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> destroy the credentials in a subject when it is >>>>>>>> still >>>>>>>>>>> being >>>>>>>>>>>>>> used, >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the >>>>>>>>>> behivor >>>>>>>>>>>> and >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>> way >>>>>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly >>>>>>> code... >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> https://issues.apache.org/ >>>>>>> jira/browse/HADOOP-13433 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov < >>>>>>>>>>>>>>>> vladrodio...@gmail.com >>>>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of >>>>>>> doing >>>>>>>>>> this >>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>> using >>>>>>>>>>>>>>>>>>> MR, >>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>> can certainly consider that >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Our framework for distributed operations is >>>>>>>>> abstract >>>>>>>>>>> and >>>>>>>>>>>>>> allows >>>>>>>>>>>>>>>>>>>>> different implementations. MR is just one >>>>>>>>>>> implementation >>>>>>>>>>>> we >>>>>>>>>>>>>>>>> provide. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -Vlad >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das < >>>>>>>>>>>>>>>> d...@hortonworks.com >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the >>>>>>>>> topic >>>>>>>>>>> of >>>>>>>>>>>>>>> MR-based >>>>>>>>>>>>>>>>>>>>>> compactions.. 
But I was thinking more about >>>>>>> the >>>>>>>>>>>>>> SpliceMachine >>>>>>>>>>>>>>>>>>> approach >>>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>>> managing compactions in Spark where >>>>>>> apparently >>>>>>>>> they >>>>>>>>>>>> saw a >>>>>>>>>>>>>> lot >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>> benefits. >>>>>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat >>>>>>>>> Andrew; I >>>>>>>>>>>>> really >>>>>>>>>>>>>>>> didn't >>>>>>>>>>>>>>>>>>> mean >>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>> :-) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate: >>>>>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that >>>>>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master >>>>>>>>>>>>>>>>>>>>>> 2. Shell out from the master >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), >>>>>>>> and I >>>>>>>>>>> don't >>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>> it's >>>>>>>>>>>>>>>>>>> even >>>>>>>>>>>>>>>>>>>>>> worth the effort of trying to build something >>>>>>>>> when >>>>>>>>>> MR >>>>>>>>>>>> is >>>>>>>>>>>>>>>> already >>>>>>>>>>>>>>>>>>> there, >>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> being used by HBase already for some >>>>>>>> operations. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of >>>>>>>> issues - >>>>>>>>>> HA >>>>>>>>>>> of >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> server >>>>>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>>>>>> being the least of them all. Security >>>>>>> (kerberos >>>>>>>>>>>>>>> authentication, >>>>>>>>>>>>>>>>>>> another >>>>>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that >>>>>>>>>> approach >>>>>>>>>>>> is >>>>>>>>>>>>>> DOA. >>>>>>>>>>>>>>>>>> Instead >>>>>>>>>>>>>>>>>>>>> let's >>>>>>>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master. 
I >>>>>>>>>> haven't >>>>>>>>>>>> seen >>>>>>>>>>>>>> any >>>>>>>>>>>>>>>>> good >>>>>>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs >>>>>>>> if >>>>>>>>>>>> needed. >>>>>>>>>>>>>> It's >>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>>>> ideal; >>>>>>>>>>>>>>>>>>>>>> agreed. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are >>>>>>> the >>>>>>>>>>>> benefits >>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> running >>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think >>>>>>>> Ted >>>>>>>>>> has >>>>>>>>>>>>>>> summarized >>>>>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> issues that we need to take care of - >>>>>>>> basically, >>>>>>>>>> the >>>>>>>>>>>>> master >>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>> keep >>>>>>>>>>>>>>>>>>>>> track >>>>>>>>>>>>>>>>>>>>>> of running jobs, and should it fail, the >>>>>>> backup >>>>>>>>>>> master >>>>>>>>>>>>> can >>>>>>>>>>>>>>>>> continue >>>>>>>>>>>>>>>>>>>>> keeping >>>>>>>>>>>>>>>>>>>>>> track of it (since the jobId would have been >>>>>>>>>> recorded >>>>>>>>>>>> in >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> proc >>>>>>>>>>>>>>>>>>> WAL). >>>>>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed >>>>>>>>>>>> backup/restore >>>>>>>>>>>>>>>>>> processes. >>>>>>>>>>>>>>>>>>>>>> Security is another issue - the job needs to >>>>>>>> run >>>>>>>>> as >>>>>>>>>>>>> 'hbase' >>>>>>>>>>>>>>>> since >>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>> owns >>>>>>>>>>>>>>>>>>>>>> the data. Having the master launch the job >>>>>>>> makes >>>>>>>>> it >>>>>>>>>>> get >>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>> privilege. >>>>>>>>>>>>>>>>>>>>> In >>>>>>>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the >>>>>>>>> above >>>>>>>>>>>>>>> management. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is >>>>>>>>> ready >>>>>>>>>>>> from >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> overall >>>>>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review >>>>>>> is >>>>>>>>>> still >>>>>>>>>>>>>> pending >>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>>>> Matteo). >>>>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of >>>>>>> doing >>>>>>>>> this >>>>>>>>>>>>> without >>>>>>>>>>>>>>>> using >>>>>>>>>>>>>>>>>> MR, >>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't >>>>>>>> think >>>>>>>>> we >>>>>>>>>>>>> should >>>>>>>>>>>>>>>> block >>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>> patch >>>>>>>>>>>>>>>>>>>>>> from getting merged. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ________________________________________ >>>>>>>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com> >>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM >>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org >>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by >>>>>>>>> Master >>>>>>>>>>> or >>>>>>>>>>>> RS >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> So what about a standalone service other than >>>>>>>>>> master? >>>>>>>>>>>> You >>>>>>>>>>>>>> can >>>>>>>>>>>>>>>> use >>>>>>>>>>>>>>>>>>> your >>>>>>>>>>>>>>>>>>>>> own >>>>>>>>>>>>>>>>>>>>>> procedure store in that service? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu < >>>>>>>>>>> yuzhih...@gmail.com >>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> An earlier implementation was client >>>>>>> driven. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to >>>>>>> resume >>>>>>>> if >>>>>>>>>>> there >>>>>>>>>>>>> is >>>>>>>>>>>>>>>> error >>>>>>>>>>>>>>>>>>>> midway. >>>>>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / >>>>>>> restore >>>>>>>>>> more >>>>>>>>>>>>>> robust. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Another consideration is for security. It >>>>>>> is >>>>>>>>> hard >>>>>>>>>>> to >>>>>>>>>>>>>>> enforce >>>>>>>>>>>>>>>>>>> security >>>>>>>>>>>>>>>>>>>>> (to >>>>>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew >>>>>>>> Purtell < >>>>>>>>>>>>>>>>>>>>> andrew.purt...@gmail.com> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, >>>>>>> which >>>>>>>>> is >>>>>>>>>>>>>> "shelling >>>>>>>>>>>>>>>> out" >>>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why >>>>>>> not >>>>>>>>>> drive >>>>>>>>>>>>> this >>>>>>>>>>>>>>>> with a >>>>>>>>>>>>>>>>>>>> utility >>>>>>>>>>>>>>>>>>>>>>> derived from Tool? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir >>>>>>>>> Rodionov >>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>> vladrodio...@gmail.com >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a >>>>>>>> common >>>>>>>>>>> case >>>>>>>>>>>> we >>>>>>>>>>>>>>> just >>>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>>> HDFS >>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>>>> HBase deployed. >>>>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR >>>>>>> framework >>>>>>>>>>>>> (especially >>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>>>> features >>>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced >>>>>>>>>> another >>>>>>>>>>>> cost >>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> maintain. >>>>>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> So , you are not backup users in this >>>>>>>> case. 
>>>>>>>>>> Many >>>>>>>>>>>> our >>>>>>>>>>>>>>>>> customers >>>>>>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>>>>> full >>>>>>>>>>>>>>>>>>>>>>>>> stack deployed and >>>>>>>>>>>>>>>>>>>>>>>>> want see backup to be a standard >>>>>>> feature. >>>>>>>>>>> Besides >>>>>>>>>>>>>> this, >>>>>>>>>>>>>>>>>> nothing >>>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>>>>> happen >>>>>>>>>>>>>>>>>>>>>>>>> in your cluster >>>>>>>>>>>>>>>>>>>>>>>>> if you won't be doing backups. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want see M/R >>>>>>>>>>>> dependency) >>>>>>>>>>>>>> goes >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> nowhere. >>>>>>>>>>>>>>>>>>>>>> We >>>>>>>>>>>>>>>>>>>>>>>>> asked already, at least twice, to >>>>>>> suggest >>>>>>>>>>> another >>>>>>>>>>>>>>>> framework >>>>>>>>>>>>>>>>>>> (other >>>>>>>>>>>>>>>>>>>>>> than >>>>>>>>>>>>>>>>>>>>>>> M/R) >>>>>>>>>>>>>>>>>>>>>>>>> for bulk data copy with *conversion*. >>>>>>>> Still >>>>>>>>>>>> waiting >>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>> suggestions. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> -Vlad >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted >>>>>>> Yu < >>>>>>>>>>>>>>>>> yuzhih...@gmail.com >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the >>>>>>>>>> cluster, >>>>>>>>>>>>> hbase >>>>>>>>>>>>>>>> still >>>>>>>>>>>>>>>>>>>>> functions >>>>>>>>>>>>>>>>>>>>>>>>>> normally (post merge). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we >>>>>>>> have >>>>>>>>>> long >>>>>>>>>>>>> been >>>>>>>>>>>>>>>>>> depending >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at >>>>>>> ExportSnapshot. 
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Cheers >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng >>>>>>>> Chen >>>>>>>>> < >>>>>>>>>>>>>>>>>>>>> heng.chen.1...@gmail.com >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a >>>>>>>> common >>>>>>>>>>> case >>>>>>>>>>>> we >>>>>>>>>>>>>>> just >>>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>>> HDFS >>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>>>> HBase deployed. >>>>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR >>>>>>> framework >>>>>>>>>>>>> (especially >>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>>>> features >>>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced >>>>>>>>>> another >>>>>>>>>>>> cost >>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> maintain. >>>>>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 < >>>>>>>>>>>>>> palomino...@gmail.com >>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice >>>>>>>>>>>>>> Backup/Restore >>>>>>>>>>>>>>>>>> feature, >>>>>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>>>>>>>>>>>>> this is not a core feature of HBase, >>>>>>>> then >>>>>>>>>> we >>>>>>>>>>>>> could >>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>> depend >>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>>>>> MR, >>>>>>>>>>>>>>>>>>>>>>>>>>>> and start a standalone BackupManager >>>>>>>>>> instance >>>>>>>>>>>>> that >>>>>>>>>>>>>>>>> submits >>>>>>>>>>>>>>>>>> MR >>>>>>>>>>>>>>>>>>>>> jobs >>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>> do >>>>>>>>>>>>>>>>>>>>>>>>>>>> periodical maintenance job. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> And if we think this is a core feature that everyone should
>>>>>>>>>>>>>>>>>>>>>>>>>>>> use, then we'd better implement it without an MR dependency,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> like DLS.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs. It is OK
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that some of our features depend on MR, but I think the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bottom line is that we should launch the jobs from outside,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manually or by other services.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a fair question.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> other MR apps?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The issue is needing the AccessController to decide if
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> allowed? But nothing prevents the user from running the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job manually/independently, right?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (everyone i think is ok with those).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Master and RSs code?" since this will be the first time
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we do this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup /
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Restore, it's fine to be dependent on MR.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MR is the right framework for such. We should also do
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compactions using MR (just saying :) )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> import / export.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> import or export. Or the optional MOB tool. It's fine.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hbase (Master or RS)?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion about
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not having MR as a direct dependency of hbase.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which had
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a MR job to compact that was later transformed into a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> non-MR job in order to be merged. I think we had a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> similar discussion for log split/replay.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runs a MR job from the master to copy data or restore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you don't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use backup you'll not end up running MR jobs, but this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> was probably true for MOB as in "if you don't enable MOB
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you don't need MR")
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to have hbase run MR jobs, only tools started manually by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the user can do that"? or can we start adding MR calls
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> around without problems?
>>
>>
>>
>> --
>> Best regards,
>>
>> - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>