If you guys have already implemented the feature the MR way and the patch is ready to land on master, I'm -0 on it, as I do not want to block development progress.
But I strongly suggest that we revisit the design later and see whether we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally, but the ugly code in HMaster really is a problem...

And for security, I have had an issue pending for a long time. Can someone help take a quick look at it? This is what I mean by ugly code: it logs out and destroys the credentials in a subject while they are still being used, and it is declared as LimitedPrivate, so I cannot change the behavior and the only way to fix it is to write another piece of ugly code...
https://issues.apache.org/jira/browse/HADOOP-13433

2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:

> >> If in the future, we find better ways of doing this without using MR, we
> >> can certainly consider that
>
> Our framework for distributed operations is abstract and allows
> different implementations. MR is just one implementation we provide.
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
>
> > Guys, first off apologies for bringing in the topic of MR-based
> > compactions.. But I was thinking more about the SpliceMachine approach of
> > managing compactions in Spark where apparently they saw a lot of benefits.
> > Apologies for giving you that sore throat Andrew; I really didn't mean to
> > :-)
> >
> > So on this issue, we have these on the plate:
> > 0. Somehow not use MR but something like that
> > 1. Run a standalone service other than master
> > 2. Shell out from the master
> >
> > I don't think we have a good answer to (0), and I don't think it's even
> > worth the effort of trying to build something when MR is already there, and
> > being used by HBase already for some operations.
> >
> > On (1), we have to deal with a myriad of issues - HA of the server not
> > being the least of them all. Security (kerberos authentication, another
> > keytab to manage, etc. etc. etc.). IMO, that approach is DOA.
> > Instead let's substitute that (1) with the HBase Master. I haven't seen
> > any good reason why the HBase master shouldn't launch MR jobs if needed.
> > It's not ideal; agreed.
> >
> > Now before going to (2), let's see what are the benefits of running the
> > backup/restore jobs from the master. I think Ted has summarized some of the
> > issues that we need to take care of - basically, the master can keep track
> > of running jobs, and should it fail, the backup master can continue keeping
> > track of it (since the jobId would have been recorded in the proc WAL). The
> > master can also do cleanup, etc. of failed backup/restore processes.
> > Security is another issue - the job needs to run as 'hbase' since it owns
> > the data. Having the master launch the job makes it get that privilege. In
> > the (2) approach, it's hard to do some of the above management.
> >
> > Guys, just to reiterate, the patch as such is ready from the overall
> > design/arch point of view (maybe code review is still pending from Matteo).
> > If in the future, we find better ways of doing this without using MR, we
> > can certainly consider that. But IMO don't think we should block this patch
> > from getting merged.
> >
> > ________________________________________
> > From: 张铎 <palomino...@gmail.com>
> > Sent: Thursday, September 22, 2016 8:32 PM
> > To: dev@hbase.apache.org
> > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >
> > So what about a standalone service other than master? You can use your own
> > procedure store in that service?
> >
> > 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> >
> > > An earlier implementation was client driven.
> > >
> > > But with that approach, it is hard to resume if there is error midway.
> > > Using Procedure V2 makes the backup / restore more robust.
> > >
> > > Another consideration is for security. It is hard to enforce security (to
> > > be implemented) for client driven actions.
> > >
> > > Cheers
> > >
> > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > >
> > > > No, this misses Matteo's finer point, which is "shelling out" from the
> > > > master directly to run MR is a first. Why not drive this with a utility
> > > > derived from Tool?
> > > >
> > > >> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > > >>
> > > >>>> In our production cluster, it is a common case we just have HDFS and
> > > >>>> HBase deployed.
> > > >>>> If our Master/RS depend on MR framework (especially some features we
> > > >>>> have not used at all), it introduced another cost for maintain. I
> > > >>>> don't think it is a good idea.
> > > >>
> > > >> So , you are not backup users in this case. Many our customers have
> > > >> full stack deployed and want see backup to be a standard feature.
> > > >> Besides this, nothing will happen in your cluster if you won't be
> > > >> doing backups.
> > > >>
> > > >> This discussion (we do not want see M/R dependency) goes to nowhere. We
> > > >> asked already, at least twice, to suggest another framework (other than
> > > >> M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
> > > >>
> > > >> -Vlad
> > > >>
> > > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >>>
> > > >>> If MR framework is not deployed in the cluster, hbase still functions
> > > >>> normally (post merge).
> > > >>>
> > > >>> In terms of build time dependency, we have long been depending on
> > > >>> mapreduce. Take a look at ExportSnapshot.
> > > >>>
> > > >>> Cheers
> > > >>>
> > > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> In our production cluster, it is a common case we just have HDFS and
> > > >>>> HBase deployed.
> > > >>>> If our Master/RS depend on MR framework (especially some features we
> > > >>>> have not used at all), it introduced another cost for maintain. I
> > > >>>> don't think it is a good idea.
> > > >>>>
> > > >>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
> > > >>>>> To be specific, for example, our nice Backup/Restore feature, if we
> > > >>>>> think this is not a core feature of HBase, then we could make it
> > > >>>>> depend on MR, and start a standalone BackupManager instance that
> > > >>>>> submits MR jobs to do periodical maintenance job. And if we think this
> > > >>>>> is a core feature that everyone should use it, then we'd better
> > > >>>>> implement it without MR dependency, like DLS.
> > > >>>>>
> > > >>>>> Thanks.
> > > >>>>>
> > > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
> > > >>>>>
> > > >>>>>> I'm -1 on let master or rs launch MR jobs. It is OK that some of our
> > > >>>>>> features depend on MR but I think the bottom line is that we should
> > > >>>>>> launch the jobs from outside manually or by other services.
> > > >>>>>>
> > > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
> > > >>>>>>
> > > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> > > >>>>>>> question.
> > > >>>>>>>
> > > >>>>>>> Can this be driven by a utility derived from Tool like our other MR
> > > >>>>>>> apps? The issue is needing the AccessController to decide if
> > > >>>>>>> allowed? But nothing prevents the user from running the job
> > > >>>>>>> manually/independently, right?
> > > >>>>>>>
> > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>> just a remark. my query was not about tools using MR (everyone i
> > > >>>>>>>> think is ok with those).
> > > >>>>>>>> the topic was about: "are we ok with running MR jobs from Master
> > > >>>>>>>> and RSs code?" since this will be the first time we do this
> > > >>>>>>>>
> > > >>>>>>>> Matteo
> > > >>>>>>>>
> > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > > >>>>>>>>> it's fine to be dependent on MR. MR is the right framework for
> > > >>>>>>>>> such. We should also do compactions using MR (just saying :) )
> > > >>>>>>>>> ________________________________________
> > > >>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
> > > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > >>>>>>>>> To: dev@hbase.apache.org
> > > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > >>>>>>>>>
> > > >>>>>>>>> I agree - backup / restore is in the same category as import /
> > > >>>>>>>>> export.
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like import
> > > >>>>>>>>>> or export. Or the optional MOB tool. It's fine.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
> > > >>>>>>>>>>> (Master or RS)?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I remember in the past that there was discussion about not
> > > >>>>>>>>>>> having MR has direct dependency of hbase.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I think some of discussion where around MOB that had a MR job
> > > >>>>>>>>>>> to compact, that later was transformed in a non-MR job to be
> > > >>>>>>>>>>> merged, I think we had a similar discussion for log split/replay.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a
> > > >>>>>>>>>>> MR job from the master to copy data or restore data.
> > > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use
> > > >>>>>>>>>>> backup you'll not end up running MR jobs, but this was probably
> > > >>>>>>>>>>> true for MOB as in "if you don't enable MOB you don't need MR")
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> any thoughts? do we a rule that says "we don't want to have
> > > >>>>>>>>>>> hbase run MR jobs, only tool started manually by the user can
> > > >>>>>>>>>>> do that". or can we start adding MR calls around without
> > > >>>>>>>>>>> problems?