An earlier implementation was client-driven, but with that approach it is hard to resume if there is an error midway. Using Procedure V2 makes backup/restore more robust.
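To make the resumability point concrete, here is a minimal Java sketch. It assumes a multi-step backup whose progress is persisted on the server side after each step. The names `ResumableBackupSketch`, `BackupState`, and `StateStore`, and the step names, are hypothetical. This is not the actual ProcedureV2 API, and the in-memory `StateStore` stands in for a durable procedure store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the HBase ProcedureV2 API: it only illustrates why a
// server-side procedure that persists its progress can resume after a failure.
public class ResumableBackupSketch {

    // Hypothetical backup steps, executed in declaration order.
    public enum BackupState { SNAPSHOT_TABLES, COPY_DATA, UPDATE_BACKUP_META, DONE }

    // Stand-in for a durable procedure store; an in-memory map is used here.
    public static class StateStore {
        private final Map<String, BackupState> persisted = new HashMap<>();

        // A procedure with no saved state starts from the first step.
        public BackupState load(String procId) {
            return persisted.getOrDefault(procId, BackupState.SNAPSHOT_TABLES);
        }

        public void save(String procId, BackupState next) {
            persisted.put(procId, next);
        }
    }

    // Runs the remaining steps starting from the last persisted state.
    // failAt simulates a crash right before that step runs (null = no crash).
    // Every step that actually runs is appended to log.
    public static boolean run(String procId, StateStore store,
                              BackupState failAt, List<BackupState> log) {
        BackupState state = store.load(procId);
        while (state != BackupState.DONE) {
            if (state == failAt) {
                return false; // simulated crash; completed steps are already persisted
            }
            log.add(state); // placeholder for the real work of this step
            BackupState next = BackupState.values()[state.ordinal() + 1];
            store.save(procId, next); // persist progress before advancing
            state = next;
        }
        return true;
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        List<BackupState> log = new ArrayList<>();
        boolean first = run("backup-1", store, BackupState.UPDATE_BACKUP_META, log);
        boolean second = run("backup-1", store, null, log); // resumes, does not restart
        System.out.println(first + " " + second + " " + log);
    }
}
```

Running `main` prints `false true [SNAPSHOT_TABLES, COPY_DATA, UPDATE_BACKUP_META]`. The second run picks up at the step where the first one stopped, so each step runs once. The work of a step happens before its next state is saved. A crash between those two points would re-run that step on resume, so each step in a design like this needs to be idempotent. In a client-driven flow, the equivalent progress would live in the client process.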
Another consideration is security. It is hard to enforce security (to be implemented) for client-driven actions.

Cheers

> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
> No, this misses Matteo's finer point, which is that "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?
>
> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>
>>>> In our production cluster, it is a common case that we just have HDFS and HBase deployed.
>>>> If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another cost to maintain. I don't think it is a good idea.
>>
>> So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup be a standard feature. Besides this, nothing will happen in your cluster if you are not doing backups.
>>
>> This discussion (we do not want to see an M/R dependency) goes nowhere. We asked already, at least twice, to suggest another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
>>
>> -Vlad
>>
>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> If the MR framework is not deployed in the cluster, hbase still functions normally (post merge).
>>>
>>> In terms of build-time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot.
>>>
>>> Cheers
>>>
>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:
>>>
>>>> In our production cluster, it is a common case that we just have HDFS and HBase deployed.
>>>> If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another cost to maintain. I don't think it is a good idea.
>>>>
>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
>>>>> To be specific, take our nice Backup/Restore feature: if we think this is not a core feature of HBase, then we could make it depend on MR and start a standalone BackupManager instance that submits MR jobs to do periodic maintenance. And if we think this is a core feature that everyone should use, then we'd better implement it without an MR dependency, like DLS.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>>>>>
>>>>>> I'm -1 on letting master or rs launch MR jobs. It is OK that some of our features depend on MR, but I think the bottom line is that we should launch the jobs from outside, manually or by other services.
>>>>>>
>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
>>>>>>
>>>>>>> Ok, got it. Well, "shelling out" is on the line I think, so a fair question.
>>>>>>>
>>>>>>> Can this be driven by a utility derived from Tool like our other MR apps? The issue is needing the AccessController to decide if allowed? But nothing prevents the user from running the job manually/independently, right?
>>>>>>>
>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> just a remark. my query was not about tools using MR (everyone i think is ok with those).
>>>>>>>> the topic was about: "are we ok with running MR jobs from Master and RSs code?" since this will be the first time we do this.
>>>>>>>>
>>>>>>>> Matteo
>>>>>>>>
>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to be dependent on MR. MR is the right framework for such. We should also do compactions using MR (just saying :) )
>>>>>>>>> ________________________________________
>>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>>>>>>> To: dev@hbase.apache.org
>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>>
>>>>>>>>> I agree - backup / restore is in the same category as import / export.
>>>>>>>>>
>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or export. Or the optional MOB tool. It's fine.
>>>>>>>>>>
>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>>>>>>>>>>>
>>>>>>>>>>> I remember in the past that there was discussion about not having MR as a direct dependency of hbase.
>>>>>>>>>>>
>>>>>>>>>>> I think some of the discussion was around MOB, which had an MR job to compact that was later transformed into a non-MR job to be merged. I think we had a similar discussion for log split/replay.
>>>>>>>>>>>
>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), which runs an MR job from the master to copy data or restore data.
>>>>>>>>>>> (backup is also "not really core" as in.. if you don't use backup you'll not end up running MR jobs, but this was probably true for MOB as in "if you don't enable MOB you don't need MR")
>>>>>>>>>>>
>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have hbase run MR jobs, only tools started manually by the user can do that"? or can we start adding MR calls around without problems?