That is the point, Matteo.

Put another way, nothing stops a user from deploying a custom procedure or
custom coprocessor that calls out to an MR job.
The optional feature should satisfy some basic requirements, e.g. no impact
if not deployed or used, and limited impact if used.
This can be achieved with isolated dynamic loading of the extra configuration
(YARN), with calls that neither block nor occupy the server handlers, or with
a separate handler.
The impact would mostly be on the overall cluster resources. In this sense,
it makes no difference whether you use another standalone server or a command
tool.
ExportSnapshot could then be moved to the server as well.

Also, thinking about it at a higher level: it is probably beneficial to
allow HBase to call out to an external framework to do computation. It can
be thought of as a UDF, a distributed UDF.
The execution of this UDF happens in a totally separate address space, and
you only need to poll the status.  This would be like a dream in a
traditional database.
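
To make the "separate address space, only poll the status" idea concrete,
here is a minimal, language-neutral sketch in Python. It is only an
illustration and not HBase code: the function names (submit_external_job,
poll_status, wait_for) are hypothetical, and a short-lived child process
stands in for the MR/YARN job.

```python
import subprocess
import sys
import time


def submit_external_job(args):
    """Start the job in a separate process and return immediately.

    The caller is never blocked on the job itself; it only keeps a handle.
    """
    return subprocess.Popen(args)


def poll_status(job):
    """Non-blocking status check: RUNNING, SUCCEEDED, or FAILED."""
    code = job.poll()  # None while the child process is still running
    if code is None:
        return "RUNNING"
    return "SUCCEEDED" if code == 0 else "FAILED"


def wait_for(job, interval=0.05, timeout=30.0):
    """Poll until the job finishes; kill it and report FAILED on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = poll_status(job)
        if status != "RUNNING":
            return status
        time.sleep(interval)
    job.kill()
    job.wait()
    return "FAILED"


# Hypothetical stand-in for an external job: a child Python process that
# sleeps briefly and exits with status 0.
job = submit_external_job([sys.executable, "-c", "import time; time.sleep(0.2)"])
final_status = wait_for(job)
```

The caller holds only a job handle and a status, so a crash or heavy load
in the job cannot take the server process down with it. In a real
deployment the handle would presumably be something durable, such as a job
id, so that a backup master could resume polling.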

My 2 cents.

Jerry


On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com>
wrote:

> let me try to go back to my original topic.
> this question was meant to be generic, and provide some rule for future
> code.
>
> from what I can gather, a rule that may satisfy everyone can be:
>  - we don't want any core feature (e.g. compaction/log-split/log-replay)
> over MR, because some cluster may not want or may have an
> external/uncontrolled MR setup.
>  - we allow non-core features (e.g. features enabled by a flag) to run MR
> jobs from hbase, because unless you use the feature, MR is not required.
>
> Matteo
>
>
> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > I suggest you look at Matteo's work on AssignmentManager, which is to
> > make Master more stable.
> >
> > Cheers
> >
> > On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
> >
> > > > No, not your fault, at least, not this time :)
> > >
> > > > Why do I call the code ugly? Can you simply tell me the sequence of
> > > > calls when starting up the HMaster? HMaster is also a regionserver, so
> > > > it extends HRegionServer, and the initialization of HRegionServer
> > > > sometimes needs to make RPC calls to HMaster. A simple change could
> > > > cause a probabilistic deadlock or some strange NPEs...
> > >
> > > > That's why I'm very nervous when somebody wants to add new features or
> > > > external dependencies to HMaster, especially when that adds more work
> > > > to the startup processing...
> > >
> > > Thanks.
> > >
> > > 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >
> > > > I read through HADOOP-13433
> > > > <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited
> race
> > > > condition is in jdk.
> > > >
> > > > Suggest pinging the reviewer on JIRA to get it moving.
> > > >
> > > > > bq. But the ugly code in HMaster is really a problem...
> > > >
> > > > Can you be specific as to which code is ugly ? Is it in the backup /
> > > > restore mega patch ?
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
> > > >
> > > > > If you guys have already implemented the feature in the MR way and
> > the
> > > > > patch is ready for landing on master, I'm a -0 on it as I do not
> want
> > > to
> > > > > block the development progress.
> > > > >
> > > > > > But I strongly suggest that we later revisit the design and see if
> > > > > > we can separate the logic from HMaster as much as possible. HA is
> > > > > > not a big problem if you do not store any metadata locally. But the
> > > > > > ugly code in HMaster is really a problem...
> > > > >
> > > > > > And for security, I have had an issue pending for a long time. Can
> > > > > > someone help take a quick look at it? This is what I mean by ugly
> > > > > > code... it logs out and destroys the credentials in a subject while
> > > > > > it is still being used, and it is declared as LimitPrivacy, so I
> > > > > > cannot change the behavior, and the only way to fix it is to write
> > > > > > another piece of ugly code...
> > > > >
> > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > >
> > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <
> vladrodio...@gmail.com
> > >:
> > > > >
> > > > > > >> If in the future, we find better ways of doing this without
> > using
> > > > MR,
> > > > > we
> > > > > > can certainly consider that
> > > > > >
> > > > > > Our framework for distributed operations is abstract and allows
> > > > > > different implementations. MR is just one implementation we
> > provide.
> > > > > >
> > > > > > -Vlad
> > > > > >
> > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> d...@hortonworks.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Guys, first off apologies for bringing in the topic of MR-based
> > > > > > > compactions.. But I was thinking more about the SpliceMachine
> > > > approach
> > > > > of
> > > > > > > managing compactions in Spark where apparently they saw a lot
> of
> > > > > > benefits.
> > > > > > > Apologies for giving you that sore throat Andrew; I really
> didn't
> > > > mean
> > > > > to
> > > > > > > :-)
> > > > > > >
> > > > > > > So on this issue, we have these on the plate:
> > > > > > > 0. Somehow not use MR but something like that
> > > > > > > 1. Run a standalone service other than master
> > > > > > > 2. Shell out from the master
> > > > > > >
> > > > > > > I don't think we have a good answer to (0), and I don't think
> > it's
> > > > even
> > > > > > > worth the effort of trying to build something when MR is
> already
> > > > there,
> > > > > > and
> > > > > > > being used by HBase already for some operations.
> > > > > > >
> > > > > > > On (1), we have to deal with a myriad of issues - HA of the
> > server
> > > > not
> > > > > > > being the least of them all. Security (kerberos authentication,
> > > > another
> > > > > > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA.
> > > Instead
> > > > > > let's
> > > > > > > substitute that (1) with the HBase Master. I haven't seen any
> > good
> > > > > reason
> > > > > > > why the HBase master shouldn't launch MR jobs if needed. It's
> not
> > > > > ideal;
> > > > > > > agreed.
> > > > > > >
> > > > > > > Now before going to (2), let's see what are the benefits of
> > running
> > > > the
> > > > > > > backup/restore jobs from the master. I think Ted has summarized
> > > some
> > > > of
> > > > > > the
> > > > > > > issues that we need to take care of - basically, the master can
> > > keep
> > > > > > track
> > > > > > > of running jobs, and should it fail, the backup master can
> > continue
> > > > > > keeping
> > > > > > > track of it (since the jobId would have been recorded in the
> proc
> > > > WAL).
> > > > > > The
> > > > > > > master can also do cleanup, etc. of failed backup/restore
> > > processes.
> > > > > > > Security is another issue - the job needs to run as 'hbase'
> since
> > > it
> > > > > owns
> > > > > > > the data. Having the master launch the job makes it get that
> > > > privilege.
> > > > > > In
> > > > > > > the (2) approach, it's hard to do some of the above management.
> > > > > > >
> > > > > > > Guys, just to reiterate, the patch as such is ready from the
> > > overall
> > > > > > > design/arch point of view (maybe code review is still pending
> > from
> > > > > > Matteo).
> > > > > > > If in the future, we find better ways of doing this without
> using
> > > MR,
> > > > > we
> > > > > > > can certainly consider that. But IMO don't think we should
> block
> > > this
> > > > > > patch
> > > > > > > from getting merged.
> > > > > > >
> > > > > > > ________________________________________
> > > > > > > From: 张铎 <palomino...@gmail.com>
> > > > > > > Sent: Thursday, September 22, 2016 8:32 PM
> > > > > > > To: dev@hbase.apache.org
> > > > > > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > > > >
> > > > > > > So what about a standalone service other than master? You can
> use
> > > > your
> > > > > > own
> > > > > > > procedure store in that service?
> > > > > > >
> > > > > > > 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > > > > > >
> > > > > > > > An earlier implementation was client driven.
> > > > > > > >
> > > > > > > > But with that approach, it is hard to resume if there is
> error
> > > > > midway.
> > > > > > > > Using Procedure V2 makes the backup / restore more robust.
> > > > > > > >
> > > > > > > > Another consideration is for security. It is hard to enforce
> > > > security
> > > > > > (to
> > > > > > > > be implemented) for client driven actions.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <
> > > > > > andrew.purt...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > No, this misses Matteo's finer point, which is "shelling
> out"
> > > > from
> > > > > > the
> > > > > > > > master directly to run MR is a first. Why not drive this
> with a
> > > > > utility
> > > > > > > > derived from Tool?
> > > > > > > > >
> > > > > > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <
> > > > > > vladrodio...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > >>>> In our production cluster, it is a common case that we just
> > > > > > > > > >>>> have HDFS and HBase deployed.
> > > > > > > > > >>>> If our Master/RS depend on the MR framework (especially for
> > > > > > > > > >>>> some features we have not used at all), it introduces
> > > > > > > > > >>>> another maintenance cost. I don't think it is a good idea.
> > > > > > > > >>
> > > > > > > > > >> So, you are not backup users in this case. Many of our
> > > > > > > > > >> customers have the full stack deployed and want to see backup
> > > > > > > > > >> become a standard feature. Besides this, nothing will happen
> > > > > > > > > >> in your cluster if you are not doing backups.
> > > > > > > > >>
> > > > > > > > > >> This discussion (we do not want to see an M/R dependency) is
> > > > > > > > > >> going nowhere. We have already asked, at least twice, for
> > > > > > > > > >> suggestions of another framework (other than M/R) for bulk
> > > > > > > > > >> data copy with *conversion*. Still waiting for suggestions.
> > > > > > > > >>
> > > > > > > > >> -Vlad
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <
> > yuzhih...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>> If MR framework is not deployed in the cluster, hbase
> still
> > > > > > functions
> > > > > > > > >>> normally (post merge).
> > > > > > > > >>>
> > > > > > > > >>> In terms of build time dependency, we have long been
> > > depending
> > > > on
> > > > > > > > >>> mapreduce. Take a look at ExportSnapshot.
> > > > > > > > >>>
> > > > > > > > >>> Cheers
> > > > > > > > >>>
> > > > > > > > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <
> > > > > > heng.chen.1...@gmail.com
> > > > > > > >
> > > > > > > > >>> wrote:
> > > > > > > > >>>
> > > > > > > > > >>>> In our production cluster, it is a common case that we just
> > > > > > > > > >>>> have HDFS and HBase deployed.
> > > > > > > > > >>>> If our Master/RS depend on the MR framework (especially for
> > > > > > > > > >>>> some features we have not used at all), it introduces
> > > > > > > > > >>>> another maintenance cost. I don't think it is a good idea.
> > > > > > > > >>>>
> > > > > > > > >>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
> > > > > > > > > >>>>> To be specific, take our nice Backup/Restore feature as an
> > > > > > > > > >>>>> example. If we think this is not a core feature of HBase,
> > > > > > > > > >>>>> then we could make it depend on MR and start a standalone
> > > > > > > > > >>>>> BackupManager instance that submits MR jobs to do periodic
> > > > > > > > > >>>>> maintenance. And if we think this is a core feature that
> > > > > > > > > >>>>> everyone should use, then we'd better implement it without
> > > > > > > > > >>>>> an MR dependency, like DLS.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Thanks.
> > > > > > > > >>>>>
> > > > > > > > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
> > > > > > > > >>>>>
> > > > > > > > > >>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK
> > > > > > > > > >>>>>> that some of our features depend on MR, but I think the
> > > > > > > > > >>>>>> bottom line is that we should launch the jobs from outside,
> > > > > > > > > >>>>>> manually or by other services.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <
> > > > > > > andrew.purt...@gmail.com
> > > > > > > > >:
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>> Ok, got it. Well "shelling out" is on the line I
> think,
> > > so
> > > > a
> > > > > > fair
> > > > > > > > >>>>>>> question.
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Can this be driven by a utility derived from Tool
> like
> > > our
> > > > > > other
> > > > > > > MR
> > > > > > > > >>>> apps?
> > > > > > > > >>>>>>> The issue is needing the AccessController to decide
> if
> > > > > allowed?
> > > > > > > But
> > > > > > > > >>>> nothing
> > > > > > > > >>>>>>> prevents the user from running the job
> > > > > manually/independently,
> > > > > > > > right?
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
> > > > > > > > >>>> theo.berto...@gmail.com>
> > > > > > > > >>>>>>> wrote:
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> just a remark. my query was not about tools using MR
> > > > > > (everyone i
> > > > > > > > >>>> think
> > > > > > > > >>>>>>> is
> > > > > > > > >>>>>>>> ok with those).
> > > > > > > > >>>>>>>> the topic was about: "are we ok with running MR jobs
> > > from
> > > > > > Master
> > > > > > > > >>> and
> > > > > > > > >>>> RSs
> > > > > > > > >>>>>>>> code?" since this will be the first time we do this
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> Matteo
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
> > > > > > > > >>> d...@hortonworks.com>
> > > > > > > > >>>>>>> wrote:
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> Very much agree; for tools like ExportSnapshot /
> > > Backup /
> > > > > > > > Restore,
> > > > > > > > >>>> it's
> > > > > > > > >>>>>>>>> fine to be dependent on MR. MR is the right
> framework
> > > for
> > > > > > such.
> > > > > > > > We
> > > > > > > > >>>>>>> should
> > > > > > > > >>>>>>>>> also do compactions using MR (just saying :) )
> > > > > > > > >>>>>>>>> ________________________________________
> > > > > > > > >>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
> > > > > > > > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > > > > > > >>>>>>>>> To: dev@hbase.apache.org
> > > > > > > > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master
> > or
> > > RS
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> I agree - backup / restore is in the same category
> as
> > > > > import
> > > > > > /
> > > > > > > > >>>> export.
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> > > > > > > > >>>>>>> andrew.purt...@gmail.com>
> > > > > > > > >>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>>> Backup is extra tooling around core in my opinion.
> > > Like
> > > > > > import
> > > > > > > > or
> > > > > > > > >>>>>>> export.
> > > > > > > > >>>>>>>>>> Or the optional MOB tool. It's fine.
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
> > > > > > > > >>>> mberto...@apache.org>
> > > > > > > > >>>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>> What's the latest opinion around running MR jobs
> > from
> > > > > hbase
> > > > > > > > >>>> (Master
> > > > > > > > >>>>>>> or
> > > > > > > > >>>>>>>>>> RS)?
> > > > > > > > >>>>>>>>>>>
> > > > > > > > > >>>>>>>>>>> I remember that in the past there was discussion about
> > > > > > > > > >>>>>>>>>>> not having MR as a direct dependency of hbase.
> > > > > > > > >>>>>>>>>>>
> > > > > > > > > >>>>>>>>>>> I think some of the discussion was around MOB, which had
> > > > > > > > > >>>>>>>>>>> an MR job to compact that was later transformed into a
> > > > > > > > > >>>>>>>>>>> non-MR job in order to be merged. I think we had a
> > > > > > > > > >>>>>>>>>>> similar discussion for log split/replay.
> > > > > > > > >>>>>>>>>>>
> > > > > > > > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), which
> > > > > > > > > >>>>>>>>>>> runs an MR job from the master to copy data or restore
> > > > > > > > > >>>>>>>>>>> data.
> > > > > > > > > >>>>>>>>>>> (backup is also "not really core", as in: if you don't
> > > > > > > > > >>>>>>>>>>> use backup you'll not end up running MR jobs. but this
> > > > > > > > > >>>>>>>>>>> was probably true for MOB too, as in "if you don't enable
> > > > > > > > > >>>>>>>>>>> MOB you don't need MR")
> > > > > > > > >>>>>>>>>>>
> > > > > > > > > >>>>>>>>>>> any thoughts? do we have a rule that says "we don't want
> > > > > > > > > >>>>>>>>>>> to have hbase run MR jobs, only tools started manually by
> > > > > > > > > >>>>>>>>>>> the user can do that"? or can we start adding MR calls
> > > > > > > > > >>>>>>>>>>> around without problems?
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
