bq. don't call out to an external framework we don't own from master (or regionserver) code
So the standalone service would run out of proc - in the same vein as REST or thrift server.

Cheers

On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

> I was attempting to summarize Ted.
>
> A new maven module sounds like a good idea to me. Or we could move all the
> tools that use MR out to one. Or...
>
> The key takeaway seems to be don't call out to an external framework we
> don't own from master (or regionserver) code.
>
> > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > bq. Internally the tool can also use the procedure framework for state
> > durability
> >
> > Isn't this the standalone service I proposed this morning ?
> >
> > bq. Move cross HBase and MR coordination to a separate tool
> >
> > Where should this tool live (hbase-backup module) ?
> >
> > Thanks
> >
> > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > wrote:
> >
> >> At branch merge voting time now more eyes are getting on the design issues
> >> with dissenting opinion emerging. This is the branch merge process working
> >> as our community has designed it. Because this is the first full project
> >> review of the code and implementation I think we all have to be flexible. I
> >> see the community as trying to narrow the technical objection at issue to
> >> the smallest possible scope. It's simple: don't call out to an external
> >> execution framework we don't own from core master (and by extension
> >> regionserver) code. We had this objection before to a proposed external
> >> compaction implementation for MOB so should not come as a surprise. Please
> >> let me know if I have misstated this.
> >>
> >> This would seem to require a modest refactor of coordination to move
> >> invocation of MR code out from any core code path. To restate what I think
> >> is an emerging recommendation: Move cross HBase and MR coordination to a
> >> separate tool.
> >> This tool can ask the master to invoke procedures on the
> >> HBase side that do first mile export and last mile restore. (Internally the
> >> tool can also use the procedure framework for state durability, perhaps,
> >> just a thought.) Then the tool can further drive the things done with MR
> >> like shipping data off cluster or moving remote data in place and preparing
> >> it for import. These activities do not need procedure coordination and
> >> involvement of the HBase master. Only the first and last mile of the
> >> process needs atomicity within the HBase deploy. Please let me know if I
> >> have misstated this.
> >>
> >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>
> >>> bq. procedure gives you a retry mechanism on failure
> >>>
> >>> We do need this mechanism. Take a look at the multi-step
> >>> in FullTableBackupProcedure, etc.
> >>>
> >>> bq. let the user export it later when he wants
> >>>
> >>> This would make supporting security more complex (user A shouldn't be
> >>> exporting user B's backup). And it is not user friendly - at the time
> >>> backup request is issued, the following is specified:
> >>>
> >>> + + " BACKUP_ROOT The full root path to store the backup image,\n"
> >>> + + " the prefix can be hdfs, webhdfs or gpfs\n"
> >>>
> >>> Backup root is an integral part of backup manifest.
> >>>
> >>> Cheers
> >>>
> >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com>
> >>> wrote:
> >>>
> >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>
> >>>>> Ideally the export should have one job running which does the retry (on
> >>>>> failed partition) itself.
> >>>>>
> >>>>
> >>>> procedure gives you a retry mechanism on failure. if you don't use that,
> >>>> then you don't need procedure.
> >>>> if you want you can start a procedure executor in a non master process (the
> >>>> hbase-procedure is a separate package and does not depend on master). but
> >>>> again, export seems a case where you don't need procedure.
> >>>>
> >>>> like snapshot, the logic may just be: ask the master to take a backup. and
> >>>> let the user export it later when he wants. so you avoid having a MR job
> >>>> started by the master since people do not seem to like it.
> >>>>
> >>>> for restore (I think that is where you use the MR splitter) you can
> >>>> probably just have a backup ready (already split). there is already a
> >>>> jira that should do that HBASE-14135. instead of doing the operation of
> >>>> split/merge on restore. you consolidate the backup "offline" (mr job
> >>>> started by the user) and then ask to restore the backup.
> >>>>
> >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> as far as I understand the code, you don't need procedure for the export
> >>>>>> itself.
> >>>>>> the export operation is already idempotent, since you are just copying
> >>>>>> files.
> >>>>>> if the file exists and is complete (check length, checksum, ...) you can
> >>>>>> skip it,
> >>>>>> otherwise you'll send it over again.
> >>>>>>
> >>>>>> you need the proc for taking the backup and restoring,
> >>>>>> because you want to complete the operation and end up with a consistent
> >>>>>> state
> >>>>>> across the multiple components you are updating (meta, fs, ...)
> >>>>>> but again, for export you can just run the tool over and over until the
> >>>>>> operation succeeds, and that should be ok.
> >>>>>>
> >>>>>> Matteo
> >>>>>>
> >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Master is involved in this discussion because currently only Master
> >>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for backup /
> >>>>>>> restore.
> >>>>>>>
> >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is
> >>>>>>> used for this purpose ?
> >>>>>>> Would that have better chance of giving us middle ground so that we can
> >>>>>>> move this forward ?
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:
> >>>>>>>>
> >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
> >>>>>>>>
> >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>>> -1 on that backup be in core hbase
> >>>>>>>>>
> >>>>>>>>> Not sure I understand what it means.
> >>>>>>>>>
> >>>>>>>>> Sorry for the imprecision.
> >>>>>>>>
> >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
> >>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
> >>>>>>>>
> >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
> >>>>>>>> MR on as dependency in the past. Seems late in the game for us to change
> >>>>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
> >>>>>>>> MOB, why would we do it to support an optional backup/restore?
> >>>>>>>>
> >>>>>>>> I have opinions on the questions below -- i.e.
that Master running > >>>>>>>> backup/restore is outside of the Master's charge -- but they are > >>>> not > >>>>>>> worth > >>>>>>>> much since I've not done much by way of review or contrib to > >>>>>>> backup/restore > >>>>>>>> other than to try it as a 'user' so I'll keep them to myself until > >>>> I > >>>>>> do. > >>>>>>> I > >>>>>>>> only came out from under my shell to participate on the MR as > >>>>>> dependency > >>>>>>>> chat. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> M > >>>>>>>> > >>>>>>>> > >>>>>>>> 1. We are not allowed to use Master to orchestrate the whole > >>>> process? > >>>>>>>> > >>>>>>>> > >>>>>>>> We > >>>>>>>>> have already brought up all advantages of using > >>>>>>>>> Master and distributed procedures for backup and restore. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Downside of moving this to client tool is lack of fault > >>>> tolerance: > >>>>>>>>> 1.1 Client won't be allowed to do any operations, that can, > >>>>>>> potentially > >>>>>>>>> affect > >>>>>>>>> cluster, such as disabling splits/merges, balancer. > >>>>>>>>> 1.2 In case of client failure who will be doing the whole > >>>> rollback > >>>>>>>> stuff? > >>>>>>>>> We are trying to make it atomic. > >>>>>>>>> > >>>>>>>>> Security is not clear. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> 2. We are not allowed to modify code of existing HBase core > classes > >>>>>> (what > >>>>>>>>> does core mean anyway)? > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> 3. We are not allowed to create backup system table > >>>> (hbase:backup) > >>>>>> in a > >>>>>>>>> system space? Only in user space? The table is global. > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we > >>>> have > >>>>>>>> touched, > >>>>>>>>> of course some existing HBase code. > >>>>>>>>> 3. is not that critical, of course we can move backup system into > >>>>>> user > >>>>>>>>> space. 
> >>>>>>>>> > >>>>>>>>> And finally, will moving backup into external tool give us +1 > >>>> from > >>>>>>> stack? > >>>>>>>>> > >>>>>>>>> -Vlad > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> > >>>> wrote: > >>>>>>>>> > >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov < > >>>>>>>>>> vladrodio...@gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>>>> + MR is dead > >>>>>>>>>>> > >>>>>>>>>>> Does MR know that? :) > >>>>>>>>>>> > >>>>>>>>>>> Again. With all due respect, stack - still no suggestions > >>>> what > >>>>>>> should > >>>>>>>>> we > >>>>>>>>>>> use for "bulk data move and transformation" instead of MR? > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, > >>>>>>>>> distributed > >>>>>>>>>> shell -- just don't have HBase core depend on it, even > >>>>> optionally. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my > >>>>>>>> opinion, > >>>>>>>>>> some > >>>>>>>>>>> group members still not sure about that and some will give -1 > >>>>>>>>>>> in any case. Just because ... > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase > >>>> (+1 > >>>>>> on > >>>>>>>>> adding > >>>>>>>>>> all the API any such external tool might need to run). > >>>>>>>>>> > >>>>>>>>>> St.Ack > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> -Vlad > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> > >>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi < > >>>>>>>>>>> theo.berto...@gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> let me try to go back to my original topic. 
> >>>>>>>>>>>>> this question was meant to be generic, and provide some > >>>>> rule > >>>>>>> for > >>>>>>>>>> future > >>>>>>>>>>>>> code. > >>>>>>>>>>>>> > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone > >>>>> can > >>>>>>> be: > >>>>>>>>>>>>> - we don't want any core feature (e.g. > >>>>>>> compaction/log-split/log- > >>>>>>>>>>> reply) > >>>>>>>>>>>>> over MR, because some cluster may not want or may have an > >>>>>>>>>>>>> external/uncontrolled MR setup. > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> +1 > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a > >>>>>> flag) > >>>>>>>> to > >>>>>>>>>> run > >>>>>>>>>>> MR > >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR > >>>> is > >>>>>> not > >>>>>>>>>>> required. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind > >>>> a > >>>>>> flag > >>>>>>>> or > >>>>>>>>>> not > >>>>>>>>>>> -- > >>>>>>>>>>>> ever being able to launch MR jobs. > >>>>>>>>>>>> > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it > >>>> from > >>>>>>>>>> hbase-server > >>>>>>>>>>>> moving it out to be an optional module (Spark would be its > >>>>>> peer). > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy > >>>>> are > >>>>>>>> busy > >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets > >>>> not > >>>>>>>> clutter > >>>>>>>>>>> task > >>>>>>>>>>>> harder by piling on more moving parts. > >>>>>>>>>>>> > >>>>>>>>>>>> St.Ack > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> Matteo > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu < > >>>>> yuzhih...@gmail.com > >>>>>>> > >>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> I suggest you look at Matteo's work for > >>>> AssignmentManager > >>>>>>> which > >>>>>>>>> is > >>>>>>>>>> to > >>>>>>>>>>>>> make > >>>>>>>>>>>>>> Master more stable. 
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the sequence of calls when
> >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a regionserver so it extends
> >>>>>>>>>>>>>>> HRegionServer, and the initialization of HRegionServer sometimes needs to
> >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would cause probabilistic dead
> >>>>>>>>>>>>>>> lock or some strange NPEs...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add
> >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more works for the start
> >>>>>>>>>>>>>>> up processing...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I read through HADOOP-13433
> >>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> >>>>>>>>>>>>>>>> condition is in jdk.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ?
> >>>>>>>>>>>>>>>> Is it in the backup /
> >>>>>>>>>>>>>>>> restore mega patch ?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the
> >>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on it as I do not want to
> >>>>>>>>>>>>>>>>> block the development progress.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we
> >>>>>>>>>>>>>>>>> can separate the logic from HMaster as much as possible. HA is not a big
> >>>>>>>>>>>>>>>>> problem if you do not store any metadata locally. But the ugly code in
> >>>>>>>>>>>>>>>>> HMaster is really a problem...
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone help
> >>>>>>>>>>>>>>>>> taking a simple look at it? This is what I mean, ugly code... logout and
> >>>>>>>>>>>>>>>>> destroy the credentials in a subject when it is still being used, and
> >>>>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the behavior and the only way
> >>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly code...
> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> https://issues.apache.org/ > >>>> jira/browse/HADOOP-13433 > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov < > >>>>>>>>>>>>> vladrodio...@gmail.com > >>>>>>>>>>>>>>> : > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of > >>>> doing > >>>>>>> this > >>>>>>>>>>> without > >>>>>>>>>>>>>> using > >>>>>>>>>>>>>>>> MR, > >>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>> can certainly consider that > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Our framework for distributed operations is > >>>>>> abstract > >>>>>>>> and > >>>>>>>>>>> allows > >>>>>>>>>>>>>>>>>> different implementations. MR is just one > >>>>>>>> implementation > >>>>>>>>> we > >>>>>>>>>>>>>> provide. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> -Vlad > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das < > >>>>>>>>>>>>> d...@hortonworks.com > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the > >>>>>> topic > >>>>>>>> of > >>>>>>>>>>>> MR-based > >>>>>>>>>>>>>>>>>>> compactions.. But I was thinking more about > >>>> the > >>>>>>>>>>> SpliceMachine > >>>>>>>>>>>>>>>> approach > >>>>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>> managing compactions in Spark where > >>>> apparently > >>>>>> they > >>>>>>>>> saw a > >>>>>>>>>>> lot > >>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>> benefits. > >>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat > >>>>>> Andrew; I > >>>>>>>>>> really > >>>>>>>>>>>>> didn't > >>>>>>>>>>>>>>>> mean > >>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>> :-) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate: > >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that > >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master > >>>>>>>>>>>>>>>>>>> 2. 
Shell out from the master > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), > >>>>> and I > >>>>>>>> don't > >>>>>>>>>>> think > >>>>>>>>>>>>>> it's > >>>>>>>>>>>>>>>> even > >>>>>>>>>>>>>>>>>>> worth the effort of trying to build something > >>>>>> when > >>>>>>> MR > >>>>>>>>> is > >>>>>>>>>>>>> already > >>>>>>>>>>>>>>>> there, > >>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>> being used by HBase already for some > >>>>> operations. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of > >>>>> issues - > >>>>>>> HA > >>>>>>>> of > >>>>>>>>>> the > >>>>>>>>>>>>>> server > >>>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>>> being the least of them all. Security > >>>> (kerberos > >>>>>>>>>>>> authentication, > >>>>>>>>>>>>>>>> another > >>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that > >>>>>>> approach > >>>>>>>>> is > >>>>>>>>>>> DOA. > >>>>>>>>>>>>>>> Instead > >>>>>>>>>>>>>>>>>> let's > >>>>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master. I > >>>>>>> haven't > >>>>>>>>> seen > >>>>>>>>>>> any > >>>>>>>>>>>>>> good > >>>>>>>>>>>>>>>>> reason > >>>>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs > >>>>> if > >>>>>>>>> needed. > >>>>>>>>>>> It's > >>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>> ideal; > >>>>>>>>>>>>>>>>>>> agreed. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are > >>>> the > >>>>>>>>> benefits > >>>>>>>>>> of > >>>>>>>>>>>>>> running > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. 
I think > >>>>> Ted > >>>>>>> has > >>>>>>>>>>>> summarized > >>>>>>>>>>>>>>> some > >>>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> issues that we need to take care of - > >>>>> basically, > >>>>>>> the > >>>>>>>>>> master > >>>>>>>>>>>> can > >>>>>>>>>>>>>>> keep > >>>>>>>>>>>>>>>>>> track > >>>>>>>>>>>>>>>>>>> of running jobs, and should it fail, the > >>>> backup > >>>>>>>> master > >>>>>>>>>> can > >>>>>>>>>>>>>> continue > >>>>>>>>>>>>>>>>>> keeping > >>>>>>>>>>>>>>>>>>> track of it (since the jobId would have been > >>>>>>> recorded > >>>>>>>>> in > >>>>>>>>>>> the > >>>>>>>>>>>>> proc > >>>>>>>>>>>>>>>> WAL). > >>>>>>>>>>>>>>>>>> The > >>>>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed > >>>>>>>>> backup/restore > >>>>>>>>>>>>>>> processes. > >>>>>>>>>>>>>>>>>>> Security is another issue - the job needs to > >>>>> run > >>>>>> as > >>>>>>>>>> 'hbase' > >>>>>>>>>>>>> since > >>>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>> owns > >>>>>>>>>>>>>>>>>>> the data. Having the master launch the job > >>>>> makes > >>>>>> it > >>>>>>>> get > >>>>>>>>>>> that > >>>>>>>>>>>>>>>> privilege. > >>>>>>>>>>>>>>>>>> In > >>>>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the > >>>>>> above > >>>>>>>>>>>> management. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is > >>>>>> ready > >>>>>>>>> from > >>>>>>>>>>> the > >>>>>>>>>>>>>>> overall > >>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review > >>>> is > >>>>>>> still > >>>>>>>>>>> pending > >>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>> Matteo). > >>>>>>>>>>>>>>>>>>> If in the future, we find better ways of > >>>> doing > >>>>>> this > >>>>>>>>>> without > >>>>>>>>>>>>> using > >>>>>>>>>>>>>>> MR, > >>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't > >>>>> think > >>>>>> we > >>>>>>>>>> should > >>>>>>>>>>>>> block > >>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>> patch > >>>>>>>>>>>>>>>>>>> from getting merged. 
> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> ________________________________________ > >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com> > >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM > >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org > >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by > >>>>>> Master > >>>>>>>> or > >>>>>>>>> RS > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> So what about a standalone service other than > >>>>>>> master? > >>>>>>>>> You > >>>>>>>>>>> can > >>>>>>>>>>>>> use > >>>>>>>>>>>>>>>> your > >>>>>>>>>>>>>>>>>> own > >>>>>>>>>>>>>>>>>>> procedure store in that service? > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu < > >>>>>>>> yuzhih...@gmail.com > >>>>>>>>>> : > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> An earlier implementation was client > >>>> driven. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to > >>>> resume > >>>>> if > >>>>>>>> there > >>>>>>>>>> is > >>>>>>>>>>>>> error > >>>>>>>>>>>>>>>>> midway. > >>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / > >>>> restore > >>>>>>> more > >>>>>>>>>>> robust. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It > >>>> is > >>>>>> hard > >>>>>>>> to > >>>>>>>>>>>> enforce > >>>>>>>>>>>>>>>> security > >>>>>>>>>>>>>>>>>> (to > >>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Cheers > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew > >>>>> Purtell < > >>>>>>>>>>>>>>>>>> andrew.purt...@gmail.com> > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, > >>>> which > >>>>>> is > >>>>>>>>>>> "shelling > >>>>>>>>>>>>> out" > >>>>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. 
Why > >>>> not > >>>>>>> drive > >>>>>>>>>> this > >>>>>>>>>>>>> with a > >>>>>>>>>>>>>>>>> utility > >>>>>>>>>>>>>>>>>>>> derived from Tool? > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir > >>>>>> Rodionov > >>>>>>> < > >>>>>>>>>>>>>>>>>> vladrodio...@gmail.com > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a > >>>>> common > >>>>>>>> case > >>>>>>>>> we > >>>>>>>>>>>> just > >>>>>>>>>>>>>> have > >>>>>>>>>>>>>>>>> HDFS > >>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed. > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR > >>>> framework > >>>>>>>>>> (especially > >>>>>>>>>>>> some > >>>>>>>>>>>>>>>>> features > >>>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced > >>>>>>> another > >>>>>>>>> cost > >>>>>>>>>>> for > >>>>>>>>>>>>>>>> maintain. > >>>>>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> So , you are not backup users in this > >>>>> case. > >>>>>>> Many > >>>>>>>>> our > >>>>>>>>>>>>>> customers > >>>>>>>>>>>>>>>>> have > >>>>>>>>>>>>>>>>>>> full > >>>>>>>>>>>>>>>>>>>>>> stack deployed and > >>>>>>>>>>>>>>>>>>>>>> want see backup to be a standard > >>>> feature. > >>>>>>>> Besides > >>>>>>>>>>> this, > >>>>>>>>>>>>>>> nothing > >>>>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>>>>>>> happen > >>>>>>>>>>>>>>>>>>>>>> in your cluster > >>>>>>>>>>>>>>>>>>>>>> if you won't be doing backups. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want see M/R > >>>>>>>>> dependency) > >>>>>>>>>>> goes > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>> nowhere. 
> >>>>>>>>>>>>>>>>>>> We > >>>>>>>>>>>>>>>>>>>>>> asked already, at least twice, to > >>>> suggest > >>>>>>>> another > >>>>>>>>>>>>> framework > >>>>>>>>>>>>>>>> (other > >>>>>>>>>>>>>>>>>>> than > >>>>>>>>>>>>>>>>>>>> M/R) > >>>>>>>>>>>>>>>>>>>>>> for bulk data copy with *conversion*. > >>>>> Still > >>>>>>>>> waiting > >>>>>>>>>>> for > >>>>>>>>>>>>>>>>> suggestions. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> -Vlad > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted > >>>> Yu < > >>>>>>>>>>>>>> yuzhih...@gmail.com > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the > >>>>>>> cluster, > >>>>>>>>>> hbase > >>>>>>>>>>>>> still > >>>>>>>>>>>>>>>>>> functions > >>>>>>>>>>>>>>>>>>>>>>> normally (post merge). > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we > >>>>> have > >>>>>>> long > >>>>>>>>>> been > >>>>>>>>>>>>>>> depending > >>>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at > >>>> ExportSnapshot. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Cheers > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng > >>>>> Chen > >>>>>> < > >>>>>>>>>>>>>>>>>> heng.chen.1...@gmail.com > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a > >>>>> common > >>>>>>>> case > >>>>>>>>> we > >>>>>>>>>>>> just > >>>>>>>>>>>>>> have > >>>>>>>>>>>>>>>>> HDFS > >>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed. 
> >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR > >>>> framework > >>>>>>>>>> (especially > >>>>>>>>>>>> some > >>>>>>>>>>>>>>>>> features > >>>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced > >>>>>>> another > >>>>>>>>> cost > >>>>>>>>>>> for > >>>>>>>>>>>>>>>> maintain. > >>>>>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 < > >>>>>>>>>>> palomino...@gmail.com > >>>>>>>>>>>>> : > >>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice > >>>>>>>>>>> Backup/Restore > >>>>>>>>>>>>>>> feature, > >>>>>>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>>>> think > >>>>>>>>>>>>>>>>>>>>>>>>> this is not a core feature of HBase, > >>>>> then > >>>>>>> we > >>>>>>>>>> could > >>>>>>>>>>>> make > >>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>> depend > >>>>>>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>>>>>>>>> MR, > >>>>>>>>>>>>>>>>>>>>>>>>> and start a standalone BackupManager > >>>>>>> instance > >>>>>>>>>> that > >>>>>>>>>>>>>> submits > >>>>>>>>>>>>>>> MR > >>>>>>>>>>>>>>>>>> jobs > >>>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>> do > >>>>>>>>>>>>>>>>>>>>>>>>> periodical maintenance job. And if we > >>>>>> think > >>>>>>>>> this > >>>>>>>>>>> is a > >>>>>>>>>>>>>> core > >>>>>>>>>>>>>>>>>> feature > >>>>>>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>>>>>>> everyone should use it, then we'd > >>>>> better > >>>>>>>>>> implement > >>>>>>>>>>> it > >>>>>>>>>>>>>>> without > >>>>>>>>>>>>>>>>> MR > >>>>>>>>>>>>>>>>>>>>>>>>> dependency, like DLS. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 < > >>>>>>>>>>> palomino...@gmail.com > >>>>>>>>>>>>> : > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> I‘m -1 on let master or rs launch MR > >>>>>> jobs. 
> >>>>>>>> It > >>>>>>>>> is > >>>>>>>>>>> OK > >>>>>>>>>>>>> that > >>>>>>>>>>>>>>>> some > >>>>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>> our > >>>>>>>>>>>>>>>>>>>>>>>>>> features depend on MR but I think > >>>> the > >>>>>>> bottom > >>>>>>>>>> line > >>>>>>>>>>> is > >>>>>>>>>>>>>> that > >>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>> should > >>>>>>>>>>>>>>>>>>>>>>>> launch > >>>>>>>>>>>>>>>>>>>>>>>>>> the jobs from outside manually or by > >>>>>> other > >>>>>>>>>>> services. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew > >>>>>> Purtell < > >>>>>>>>>>>>>>>>>>> andrew.purt...@gmail.com > >>>>>>>>>>>>>>>>>>>>> : > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is > >>>> on > >>>>>> the > >>>>>>>>> line > >>>>>>>>>> I > >>>>>>>>>>>>> think, > >>>>>>>>>>>>>>> so > >>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>> fair > >>>>>>>>>>>>>>>>>>>>>>>>>>> question. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility > >>>>> derived > >>>>>>>> from > >>>>>>>>>> Tool > >>>>>>>>>>>>> like > >>>>>>>>>>>>>>> our > >>>>>>>>>>>>>>>>>> other > >>>>>>>>>>>>>>>>>>> MR > >>>>>>>>>>>>>>>>>>>>>>>> apps? > >>>>>>>>>>>>>>>>>>>>>>>>>>> The issue is needing the > >>>>>> AccessController > >>>>>>>> to > >>>>>>>>>>> decide > >>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>> allowed? > >>>>>>>>>>>>>>>>>>> But > >>>>>>>>>>>>>>>>>>>>>>>> nothing > >>>>>>>>>>>>>>>>>>>>>>>>>>> prevents the user from running the > >>>>> job > >>>>>>>>>>>>>>>>> manually/independently, > >>>>>>>>>>>>>>>>>>>> right? > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, > >>>> Matteo > >>>>>>>>> Bertozzi < > >>>>>>>>>>>>>>>>>>>>>>>> theo.berto...@gmail.com> > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. 
my query was not > >>>>> about > >>>>>>>> tools > >>>>>>>>>>> using > >>>>>>>>>>>> MR > >>>>>>>>>>>>>>>>>> (everyone i > >>>>>>>>>>>>>>>>>>>>>>>> think > >>>>>>>>>>>>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>>>>>>>>> ok with those). > >>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok > >>>> with > >>>>>>>> running > >>>>>>>>>> MR > >>>>>>>>>>>> jobs > >>>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>> Master > >>>>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>> RSs > >>>>>>>>>>>>>>>>>>>>>>>>>>>> code?" since this will be the > >>>> first > >>>>>> time > >>>>>>>> we > >>>>>>>>> do > >>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, > >>>>>>> Devaraj > >>>>>>>>> Das > >>>>>>>>>> < > >>>>>>>>>>>>>>>>>>>>>>> d...@hortonworks.com> > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like > >>>>>>>>>> ExportSnapshot > >>>>>>>>>>> / > >>>>>>>>>>>>>>> Backup / > >>>>>>>>>>>>>>>>>>>> Restore, > >>>>>>>>>>>>>>>>>>>>>>>> it's > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> fine to be dependent on MR. MR is > >>>>> the > >>>>>>>> right > >>>>>>>>>>>>> framework > >>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>> such. 
>>>> We should also do compactions using MR (just saying :) )
>>>> ________________________________________
>>>> From: Ted Yu <yuzhih...@gmail.com>
>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>> To: dev@hbase.apache.org
>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>
>>>> I agree - backup / restore is in the same category as import / export.
>>>>
>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell
>>>> <andrew.purt...@gmail.com> wrote:
>>>>
>>>>> Backup is extra tooling around core in my opinion. Like import or
>>>>> export. Or the optional MOB tool. It's fine.
>>>>>
>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> What's the latest opinion around running MR jobs from hbase (Master
>>>>>> or RS)?
>>>>>>
>>>>>> I remember in the past that there was discussion about not having MR
>>>>>> as a direct dependency of hbase.
>>>>>>
>>>>>> I think some of the discussion was around MOB, which had a MR job to
>>>>>> compact that was later transformed into a non-MR job in order to be
>>>>>> merged. I think we had a similar discussion for log split/replay.
>>>>>>
>>>>>> the latest is the new Backup feature (HBASE-7912), which runs a MR
>>>>>> job from the master to copy data or restore data.
>>>>>> (backup is also "not really core" as in.. if you don't use backup
>>>>>> you'll not end up running MR jobs, but this was probably true for MOB
>>>>>> as in "if you don't enable MOB you don't need MR")
>>>>>>
>>>>>> any thoughts? do we have a rule that says "we don't want to have
>>>>>> hbase run MR jobs, only tools started manually by the user can do
>>>>>> that"? or can we start adding MR calls around without problems?