2016-09-23 12:38 GMT+08:00 Devaraj Das <d...@hortonworks.com>:

> Guys, first off apologies for bringing in the topic of MR-based
> compactions.. But I was thinking more about the SpliceMachine approach of
> managing compactions in Spark where apparently they saw a lot of benefits.
> Apologies for giving you that sore throat Andrew; I really didn't mean to
> :-)
>
> So on this issue, we have these on the plate:
> 0. Somehow not use MR but something like that
> 1. Run a standalone service other than master
> 2. Shell out from the master
>
> I don't think we have a good answer to (0), and I don't think it's even
> worth the effort of trying to build something when MR is already there, and
> being used by HBase already for some operations.
>
> On (1), we have to deal with a myriad of issues - HA of the server not
> being the least of them all. Security (kerberos authentication, another
> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
> substitute that (1) with the HBase Master. I haven't seen any good reason
> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
> agreed.
>

We have already put lots of stuff in HMaster, especially on startup; we
even need to run the initialization in a background thread to let the
master come up quickly, and that causes lots of races. I do not think it is
a good idea to keep messing up the code there.
For MR, as I said above, the problem is configuration. I do not want to
restart HMaster if I just want to tune the config of a backup MR job. Yes,
you could introduce new shell commands to do that, change the job config,
change the YARN cluster, and persist the change somewhere, maybe ZooKeeper?
Oh no...


> Now before going to (2), let's see what are the benefits of running the
> backup/restore jobs from the master. I think Ted has summarized some of the
> issues that we need to take care of - basically, the master can keep track
> of running jobs, and should it fail, the backup master can continue keeping
> track of it (since the jobId would have been recorded in the proc WAL). The
> master can also do cleanup, etc. of failed backup/restore processes.
> Security is another issue - the job needs to run as 'hbase' since it owns
> the data. Having the master launch the job makes it get that privilege. In
> the (2) approach, it's hard to do some of the above management.
>
> Guys, just to reiterate, the patch as such is ready from the overall
> design/arch point of view (maybe code review is still pending from Matteo).
> If in the future, we find better ways of doing this without using MR, we
> can certainly consider that. But IMO don't think we should block this patch
> from getting merged.
>
> ________________________________________
> From: 张铎 <palomino...@gmail.com>
> Sent: Thursday, September 22, 2016 8:32 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> So what about a standalone service other than master? You can use your own
> procedure store in that service?
>
> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
> > An earlier implementation was client driven.
> >
> > But with that approach, it is hard to resume if there is an error midway.
> > Using Procedure V2 makes the backup / restore more robust.
> >
> > Another consideration is for security. It is hard to enforce security (to
> > be implemented) for client driven actions.
> >
> > Cheers
> >
> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > >
> > > > No, this misses Matteo's finer point, which is that "shelling out" from the
> > > > master directly to run MR is a first. Why not drive this with a utility
> > > > derived from Tool?
> > >
> > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > >
> > > >>>> In our production cluster, it is common that we have only HDFS and
> > > >>>> HBase deployed.
> > > >>>> If our Master/RS depends on the MR framework (especially for features we
> > > >>>> do not use at all), it introduces another maintenance cost. I
> > > >>>> don't think it is a good idea.
> > >>
> > > >> So, you are not backup users in this case. Many of our customers have the
> > > >> full stack deployed and want to see backup become a standard feature.
> > > >> Besides this, nothing will happen in your cluster if you are not doing
> > > >> backups.
> > > >>
> > > >> This discussion (we do not want to see an M/R dependency) is going nowhere.
> > > >> We have already asked, at least twice, for suggestions of another framework
> > > >> (other than M/R) for bulk data copy with *conversion*. Still waiting for
> > > >> suggestions.
> > >>
> > >> -Vlad
> > >>
> > >>
> > >>
> > >>
> > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>
> > >>> If MR framework is not deployed in the cluster, hbase still functions
> > >>> normally (post merge).
> > >>>
> > >>> In terms of build time dependency, we have long been depending on
> > >>> mapreduce. Take a look at ExportSnapshot.
> > >>>
> > >>> Cheers
> > >>>
> > > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:
> > >>>
> > > >>>> In our production cluster, it is common that we have only HDFS and
> > > >>>> HBase deployed.
> > > >>>> If our Master/RS depends on the MR framework (especially for features we
> > > >>>> do not use at all), it introduces another maintenance cost. I
> > > >>>> don't think it is a good idea.
> > >>>>
> > >>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
> > > >>>>> To be specific, take our nice Backup/Restore feature as an example. If
> > > >>>>> we think this is not a core feature of HBase, then we could make it
> > > >>>>> depend on MR, and start a standalone BackupManager instance that submits
> > > >>>>> MR jobs to do periodic maintenance. And if we think this is a core
> > > >>>>> feature that everyone should use, then we'd better implement it without
> > > >>>>> an MR dependency, like DLS.
> > >>>>>
> > >>>>> Thanks.
> > >>>>>
> > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
> > >>>>>
> > > >>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK that some of
> > > >>>>>> our features depend on MR, but I think the bottom line is that we should
> > > >>>>>> launch the jobs from outside, manually or via other services.
> > >>>>>>
> > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <
> andrew.purt...@gmail.com
> > >:
> > >>>>>>
> > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> > >>>>>>> question.
> > >>>>>>>
> > > >>>>>>> Can this be driven by a utility derived from Tool like our other MR
> > > >>>>>>> apps?
> > > >>>>>>> The issue is needing the AccessController to decide if allowed? But
> > > >>>>>>> nothing prevents the user from running the job manually/independently,
> > > >>>>>>> right?
> > >>>>>>>
> > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > >>>>>>>>
> > > >>>>>>>> just a remark. my query was not about tools using MR (everyone i think
> > > >>>>>>>> is ok with those).
> > > >>>>>>>> the topic was about: "are we ok with running MR jobs from Master and RSs
> > > >>>>>>>> code?" since this will be the first time we do this
> > >>>>>>>>
> > >>>>>>>> Matteo
> > >>>>>>>>
> > >>>>>>>>
> > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > >>>>>>>>>
> > > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> > > >>>>>>>>> fine to be dependent on MR. MR is the right framework for such. We should
> > > >>>>>>>>> also do compactions using MR (just saying :) )
> > >>>>>>>>> ________________________________________
> > >>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
> > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > >>>>>>>>> To: dev@hbase.apache.org
> > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >>>>>>>>>
> > > >>>>>>>>> I agree - backup / restore is in the same category as import / export.
> > >>>>>>>>>
> > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > >>>>>>>>>
> > > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or export.
> > > >>>>>>>>>> Or the optional MOB tool. It's fine.
> > >>>>>>>>>>
> > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
> > >>>>>>>>>>>
> > > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase (Master or
> > > >>>>>>>>>>> RS)?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I remember in the past that there was discussion about not having MR
> > > >>>>>>>>>>> as a direct dependency of hbase.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I think some of the discussion was around MOB, which had a MR job to
> > > >>>>>>>>>>> compact that was later transformed into a non-MR job in order to be
> > > >>>>>>>>>>> merged. I think we had a similar discussion for log split/replay.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
> > > >>>>>>>>>>> from the master to copy data or restore data.
> > > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use backup
> > > >>>>>>>>>>> you'll not end up running MR jobs, but this was probably true for MOB
> > > >>>>>>>>>>> as in "if you don't enable MOB you don't need MR")
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have hbase
> > > >>>>>>>>>>> run MR jobs, only tools started manually by the user can do that"? or
> > > >>>>>>>>>>> can we start adding MR calls around without problems?
> > >>>
> >
>
