How would security be enforced on the client side?

On Sat, Sep 24, 2016 at 12:21 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> >> The standalone service so far
>
> 1, 2, 3 can be done in client side as well. Are you going to implement HA for the service? If not, service can fail and will require clean up/repair on restart. The same can be done with a client-side tool (in repair mode)
>
> -1 for the separate service. KISS rules. If community want us to remove MR/Backup from the core we will move it into separate sub-project and implement this as a client-driven tool set.
>
> -Vlad
>
> On Sat, Sep 24, 2016 at 12:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > The standalone service so far seems to be middle ground having the following advantages:
> >
> > 1. utilization of existing proc V2 framework for fault tolerance
> > 2. friendliness to security support to be implemented in the next phase - security is hard to enforce from client side
> > 3. not introducing MR calls in master or region servers
> >
> > Cheers
> >
> > On Sat, Sep 24, 2016 at 11:26 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> >
> > > >> So the standalone service would run out of proc - in the same vein as REST or thrift server.
> > >
> > > Ted, running separate process/service to coordinate backups is not a good idea. We have already a lot of them.
> > >
> > > On Sat, Sep 24, 2016 at 11:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > bq. don't call out to an external framework we don't own from master (or regionserver) code
> > > >
> > > > So the standalone service would run out of proc - in the same vein as REST or thrift server.
> > > >
> > > > Cheers
> > > >
> > > > On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > >
> > > > > I was attempting to summarize Ted.
> > > > >
> > > > > A new maven module sounds like a good idea to me. Or we could move all the tools that use MR out to one. Or...
> > > > >
> > > > > The key takeaway seems to be don't call out to an external framework we don't own from master (or regionserver) code.
> > > > >
> > > > > > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >
> > > > > > bq. Internally the tool can also use the procedure framework for state durability
> > > > > >
> > > > > > Isn't this the standalone service I proposed this morning ?
> > > > > >
> > > > > > bq. Move cross HBase and MR coordination to a separate tool
> > > > > >
> > > > > > Where should this tool live (hbase-backup module) ?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > > > >
> > > > > >> At branch merge voting time now more eyes are getting on the design issues with dissenting opinion emerging. This is the branch merge process working as our community has designed it. Because this is the first full project review of the code and implementation I think we all have to be flexible. I see the community as trying to narrow the technical objection at issue to the smallest possible scope. It's simple: don't call out to an external execution framework we don't own from core master (and by extension regionserver) code. We had this objection before to a proposed external compaction implementation for MOB so should not come as a surprise. Please let me know if I have misstated this.
> > > > > >>
> > > > > >> This would seem to require a modest refactor of coordination to move invocation of MR code out from any core code path. To restate what I think is an emerging recommendation: Move cross HBase and MR coordination to a separate tool. This tool can ask the master to invoke procedures on the HBase side that do first mile export and last mile restore. (Internally the tool can also use the procedure framework for state durability, perhaps, just a thought.) Then the tool can further drive the things done with MR like shipping data off cluster or moving remote data in place and preparing it for import. These activities do not need procedure coordination and involvement of the HBase master. Only the first and last mile of the process needs atomicity within the HBase deploy. Please let me know if I have misstated this.
> > > > > >>
> > > > > >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> bq. procedure gives you a retry mechanism on failure
> > > > > >>>
> > > > > >>> We do need this mechanism. Take a look at the multi-step in FullTableBackupProcedure, etc.
> > > > > >>>
> > > > > >>> bq. let the user export it later when he wants
> > > > > >>>
> > > > > >>> This would make supporting security more complex (user A shouldn't be exporting user B's backup). And it is not user friendly - at the time backup request is issued, the following is specified:
> > > > > >>>
> > > > > >>> + + " BACKUP_ROOT The full root path to store the backup image,\n"
> > > > > >>> + + " the prefix can be hdfs, webhdfs or gpfs\n"
> > > > > >>>
> > > > > >>> Backup root is an integral part of backup manifest.
> > > > > >>>
> > > > > >>> Cheers
> > > > > >>>
> > > > > >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > > > > >>>
> > > > > >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>> Ideally the export should have one job running which does the retry (on failed partition) itself.
> > > > > >>>>
> > > > > >>>> procedure gives you a retry mechanism on failure. if you don't use that, then you don't need procedure.
> > > > > >>>> if you want you can start a procedure executor in a non master process (the hbase-procedure is a separate package and does not depend on master). but again, export seems a case where you don't need procedure.
> > > > > >>>>
> > > > > >>>> like snapshot, the logic may just be: ask the master to take a backup. and let the user export it later when he wants. so you avoid having a MR job started by the master since people does not seems to like it.
> > > > > >>>>
> > > > > >>>> for restore (I think that is where you use the MR splitter) you can probably just have a backup ready (already splitted). there is already a jira that should do that HBASE-14135. instead of doing the operation of split/merge on restore. you consolidate the backup "offline" (mr job started by the user) and then ask to restore the backup.
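
Matteo's argument in this part of the thread is that export needs only a retry, not a procedure, because copying files is idempotent: a destination file that already exists and is complete can be skipped, and the tool can be rerun until it succeeds. A minimal sketch of that idea follows. It assumes a local filesystem via `java.nio.file` as a stand-in for HDFS and uses CRC32 as the completeness check; the class `IdempotentExport` and its method names are hypothetical and are not part of HBase or of the backup patch under discussion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

// Hypothetical helper, not part of HBase: copies a file only when the
// destination is missing or incomplete, so the export can be rerun safely.
final class IdempotentExport {

    // CRC32 of a file's contents, used here as a simple completeness check.
    static long checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                crc.update(buffer, 0, read);
            }
        }
        return crc.getValue();
    }

    // True when dst already holds a complete copy of src (same length, same checksum).
    static boolean isComplete(Path src, Path dst) throws IOException {
        return Files.isRegularFile(dst)
                && Files.size(src) == Files.size(dst)
                && checksum(src) == checksum(dst);
    }

    // Returns true if the file was copied, false if it was skipped.
    static boolean exportFile(Path src, Path dst) throws IOException {
        if (isComplete(src, dst)) {
            return false;
        }
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    // Reruns the export until it succeeds or the attempts run out.
    static boolean exportWithRetry(Path src, Path dst, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return exportFile(src, dst);
            } catch (IOException e) {
                last = e; // retrying is safe: a partial file fails isComplete and is overwritten
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("export-demo");
        Path src = dir.resolve("src.bin");
        Path dst = dir.resolve("dst.bin");
        Files.write(src, "example bytes".getBytes());
        System.out.println("copied: " + exportWithRetry(src, dst, 3));
        System.out.println("copied: " + exportWithRetry(src, dst, 3));
    }
}
```

The retry loop carries no state between attempts, which is the point being made: nothing has to be persisted for export, whereas taking the backup and restoring it touch several components and are where a procedure is still wanted.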
> > > > > >>>>
> > > > > >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>>> as far as I understand the code, you don't need procedure for the export itself.
> > > > > >>>>>> the export operation is already idempotent, since you are just copying files.
> > > > > >>>>>> if the file exist and is complete (check length, checksum, ...) you can skip it, otherwise you'll send it over again.
> > > > > >>>>>>
> > > > > >>>>>> you need the proc for taking the backup and restoring, because you want to complete the operation and end up with a consistent state across the multiple components you are updating (meta, fs, ...)
> > > > > >>>>>> but again, for export you can just run the tool over and over until the operation succeed, and that should be ok.
> > > > > >>>>>>
> > > > > >>>>>> Matteo
> > > > > >>>>>>
> > > > > >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> Master is involved in this discussion because currently only Master instantiates ProcedureExecutor which runs the 3 Procedures for backup / restore.
> > > > > >>>>>>>
> > > > > >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is used for this purpose ?
> > > > > >>>>>>> Would that have better chance of giving us middle ground so that we can move this forward ?
> > > > > >>>>>>>
> > > > > >>>>>>> Cheers
> > > > > >>>>>>>
> > > > > >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>>>> -1 on that backup be in core hbase
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Not sure I understand what it means.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Sorry for the imprecision.
> > > > > >>>>>>>>
> > > > > >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so -1 on the Master running backup/restore MR jobs, even if optional.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking MR on as dependency in the past. Seems late in the game for us to change our opinion on this. If we didn't do it for distributed log splitting, or MOB, why would we do it to support an optional backup/restore?
> > > > > >>>>>>>>
> > > > > >>>>>>>> I have opinions on the questions below -- i.e. that Master running backup/restore is outside of the Master's charge -- but they are not worth much since I've not done much by way of review or contrib to backup/restore other than to try it as a 'user' so I'll keep them to myself until I do. I only came out from under my shell to participate on the MR as dependency chat.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks,
> > > > > >>>>>>>> M
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process?
> > > > > >>>>>>>>
> > > > > >>>>>>>>> We have already brought up all advantages of using Master and distributed procedures for backup and restore.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Downside of moving this to client tool is lack of fault tolerance:
> > > > > >>>>>>>>> 1.1 Client won't be allowed to do any operations, that can, potentially affect cluster, such as disabling splits/merges, balancer.
> > > > > >>>>>>>>> 1.2 In case of client failure who will be doing the whole rollback stuff? We are trying to make it atomic.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Security is not clear.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what does core mean anyway)?
> > > > > >>>>>>>>
> > > > > >>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a system space? Only in user space? The table is global.
> > > > > >>>>>>>>
> > > > > >>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we have touched, of course some existing HBase code.
> > > > > >>>>>>>>> 3. is not that critical, of course we can move backup system into user space.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> And finally, will moving backup into external tool give us +1 from stack?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> -Vlad
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>>>> + MR is dead
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Does MR know that? :)
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Again. With all due respect, stack - still no suggestions what should we use for "bulk data move and transformation" instead of MR?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed shell -- just don't have HBase core depend on it, even optionally.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some group members still not sure about that and some will give -1 in any case. Just because ...
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding all the API any such external tool might need to run).
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> St.Ack
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> -Vlad
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> let me try to go back to my original topic.
> > > > > >>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future code.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> > > > > >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-reply) over MR, because some cluster may not want or may have an external/uncontrolled MR setup.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> +1
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server moving it out to be an optional module (Spark would be its peer).
> > > > > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Lets not clutter task harder by piling on more moving parts.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> St.Ack
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Matteo
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager which is to make Master more stable.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Cheers
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make rpc calls to HMaster. A simple change would cause probabilistic dead lock or some strange NPEs...
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially add more works for the start up processing...
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thanks.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in jdk.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is it in the backup / restore mega patch ?
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Cheers
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> And for security, I have a issue pending for a long time. Can someone help taking a simple look at it? This is what I mean, ugly code... logout and destroy the credentials in a subject when it is still being used, and declared as LimitPrivacy so I can not change the behavior and the only way to fix it is to write another piece of ugly code...
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we can certainly consider that
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> -Vlad
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based compactions.. But I was thinking more about the SpliceMachine approach of managing compactions in Spark where apparently they saw a lot of benefits. Apologies for giving you that sore throat Andrew; I really didn't mean to :-)
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > > > > >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
> > > > > >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > > > > >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there, and being used by HBase already for some operations.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not being the least of them all. Security (kerberos authentication, another keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's substitute that (1) with the HBase Master. I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed. It's not ideal; agreed.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running the backup/restore jobs from the master. I think Ted has summarized some of the issues that we need to take care of - basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of it (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue - the job needs to run as 'hbase' since it owns the data. Having the master launch the job makes it get that privilege. In the (2) approach, it's hard to do some of the above management.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future, we find better ways of doing this without using MR, we can certainly consider that. But IMO don't think we should block this patch from getting merged.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> ________________________________________
> > > > > >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com>
> > > > > >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > > > > >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > > > >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own procedure store in that service?
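
Devaraj's point about the backup master picking up a job whose jobId was recorded in the proc WAL, and the suggestion of a standalone service with its own procedure store, both rest on the same mechanism: write the state down durably before acting, so that whoever takes over can replay it. The sketch below illustrates only that mechanism. `JobLog` and its record format are hypothetical; it is a plain append-only text file and does not reproduce the Procedure V2 WAL format or any HBase API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical append-only job log; a stand-in for a real procedure store.
final class JobLog {
    private final Path file;

    JobLog(Path file) {
        this.file = file;
    }

    // Appends one record, using SYNC so the write reaches storage before returning.
    private void append(String record) throws IOException {
        Files.write(file, (record + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
    }

    // Called before the job is handed to the execution framework.
    void recordStarted(String jobId) throws IOException {
        append("STARTED " + jobId);
    }

    // Called once the job has finished and any cleanup is done.
    void recordFinished(String jobId) throws IOException {
        append("FINISHED " + jobId);
    }

    // Replays the log and returns the jobs that were started but never
    // finished, in the order they were started.
    Set<String> unfinishedJobs() throws IOException {
        Set<String> open = new LinkedHashSet<>();
        if (!Files.exists(file)) {
            return open;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.startsWith("STARTED ")) {
                open.add(line.substring("STARTED ".length()));
            } else if (line.startsWith("FINISHED ")) {
                open.remove(line.substring("FINISHED ".length()));
            }
        }
        return open;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempDirectory("joblog-demo").resolve("jobs.log");
        JobLog active = new JobLog(log);
        active.recordStarted("job_1");
        active.recordStarted("job_2");
        active.recordFinished("job_1");
        // A successor process opens the same file and sees what is still pending.
        System.out.println(new JobLog(log).unfinishedJobs());
    }
}
```

Whether such a log lives in the master or in a separate service is exactly what the thread is debating; the sketch takes no side and leaves out HA and security, which are the costs raised against the standalone option.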
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error midway. Using Procedure V2 makes the backup / restore more robust.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce security (to be implemented) for client driven actions.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Cheers
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?
>
> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>
> > In our production cluster, it is a common case that we just have HDFS and HBase deployed. If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.
>
> So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup as a standard feature. Besides this, nothing will happen in your cluster if you won't be doing backups.
>
> This discussion (we do not want to see an M/R dependency) is going nowhere. We have already asked, at least twice, for suggestions of another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> If MR framework is not deployed in the cluster, hbase still functions normally (post merge).
>
> In terms of build time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot.
>
> Cheers
>
> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:
>
> In our production cluster, it is a common case that we just have HDFS and HBase deployed. If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.
>
> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
>
> To be specific, for example, our nice Backup/Restore feature: if we think this is not a core feature of HBase, then we could make it depend on MR, and start a standalone BackupManager instance that submits MR jobs to do periodic maintenance jobs. And if we think this is a core feature that everyone should use, then we'd better implement it without MR dependency, like DLS.
>
> Thanks.
>
> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>
> I'm -1 on letting master or rs launch MR jobs.
> It is OK that some of our features depend on MR but I think the bottom line is that we should launch the jobs from outside manually or by other services.
>
> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
>
> Ok, got it. Well "shelling out" is on the line I think, so a fair question.
>
> Can this be driven by a utility derived from Tool like our other MR apps? The issue is needing the AccessController to decide if allowed?
> But nothing prevents the user from running the job manually/independently, right?
>
> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>
> just a remark. my query was not about tools using MR (everyone i think is ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs code?"
> since this will be the first time we do this
>
> Matteo
>
> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
>
> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to be dependent on MR. MR is the right framework for such.
> We should also do compactions using MR (just saying :) )
> ________________________________________
> From: Ted Yu <yuzhih...@gmail.com>
> Sent: Thursday, September 22, 2016 2:00 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> I agree - backup / restore is in the same category as import / export.
>
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
> Backup is extra tooling around core in my opinion. Like import or export. Or the optional MOB tool. It's fine.
>
> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>
> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>
> I remember in the past that there was discussion about not having MR as a direct dependency of hbase.
>
> I think some of the discussion was around MOB, which had an MR job to compact that was later transformed into a non-MR job to be merged. I think we had a similar discussion for log split/replay.
>
> the latest is the new Backup feature (HBASE-7912), that runs a MR job from the master to copy data or restore data.
> (backup is also "not really core" as in..
> if you don't use backup you'll not end up running MR jobs, but this was probably true for MOB as in "if you don't enable MOB you don't need MR")
>
> any thoughts? do we have a rule that says "we don't want to have hbase run MR jobs, only tools started manually by the user can do that". or can we start adding MR calls around without problems?