Just wanted to add one argument for doing this the Master way:

Client-based backup/restore is very hard (if possible at all) to make fully
fault tolerant. If the client fails abruptly halfway, some system data will be
broken and the cluster will never return to its original state. For example,
we disable splits/merges and the balancer during full backup and restore. A
failed client will leave the cluster in that state (splits/merges disabled).
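
To make the failure mode concrete, here is a minimal sketch in plain Java. It
is not HBase code: Cluster, StepLog, and every method name are hypothetical
stand-ins, and the "abrupt failure" is modeled as an early return. The
assumption is only that a server-side driver records each step durably
(similar in spirit to the proc WAL mentioned in this thread), so whoever takes
over can see the unfinished operation and undo it.

```java
import java.util.ArrayList;
import java.util.List;

public class BackupRecoverySketch {

    /** Hypothetical stand-in for the cluster-wide switches; not an HBase API. */
    public static class Cluster {
        public boolean splitsMergesEnabled = true;
        public boolean balancerEnabled = true;
    }

    /** Hypothetical stand-in for a durable step log that survives a crash. */
    public static class StepLog {
        public final List<String> steps = new ArrayList<>();
    }

    /** Client-driven: nothing outside the client knows the switches were flipped. */
    public static void clientDrivenBackup(Cluster c, boolean dieMidway) {
        c.splitsMergesEnabled = false;
        c.balancerEnabled = false;
        if (dieMidway) {
            return; // models an abrupt death (e.g. kill -9): no cleanup code runs
        }
        c.splitsMergesEnabled = true;
        c.balancerEnabled = true;
    }

    /** Server-driven: the step is recorded before it takes effect. */
    public static void loggedBackup(Cluster c, StepLog log, boolean dieMidway) {
        log.steps.add("SWITCHES_DISABLED");
        c.splitsMergesEnabled = false;
        c.balancerEnabled = false;
        if (dieMidway) {
            return; // same abrupt death, but the log entry above survives
        }
        c.splitsMergesEnabled = true;
        c.balancerEnabled = true;
        log.steps.add("DONE");
    }

    /** Run by whoever takes over: undo an unfinished backup found in the log. */
    public static void recover(Cluster c, StepLog log) {
        boolean unfinished = log.steps.contains("SWITCHES_DISABLED")
                && !log.steps.contains("DONE")
                && !log.steps.contains("ROLLED_BACK");
        if (unfinished) {
            c.splitsMergesEnabled = true;
            c.balancerEnabled = true;
            log.steps.add("ROLLED_BACK");
        }
    }

    public static void main(String[] args) {
        Cluster a = new Cluster();
        clientDrivenBackup(a, true);
        System.out.println("client-driven, after crash: splits/merges enabled = "
                + a.splitsMergesEnabled + ", balancer enabled = " + a.balancerEnabled);

        Cluster b = new Cluster();
        StepLog log = new StepLog();
        loggedBackup(b, log, true);
        recover(b, log);
        System.out.println("logged, after crash and recovery: splits/merges enabled = "
                + b.splitsMergesEnabled + ", balancer enabled = " + b.balancerEnabled);
    }
}
```

In the client-driven case there is no record anywhere that the switches need
to be restored, which is exactly the stuck state described above.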

-Vlad

On Thu, Sep 22, 2016 at 9:53 PM, Vladimir Rodionov <vladrodio...@gmail.com>
wrote:

> >> If in the future, we find better ways of doing this without using MR,
> we can certainly consider that
>
> Our framework for distributed operations is abstract and allows
> different implementations. MR is just one implementation we provide.
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
>
>> Guys, first off apologies for bringing in the topic of MR-based
>> compactions.. But I was thinking more about the SpliceMachine approach of
>> managing compactions in Spark where apparently they saw a lot of benefits.
>> Apologies for giving you that sore throat Andrew; I really didn't mean to
>> :-)
>>
>> So on this issue, we have these on the plate:
>> 0. Somehow not use MR but something like that
>> 1. Run a standalone service other than master
>> 2. Shell out from the master
>>
>> I don't think we have a good answer to (0), and I don't think it's even
>> worth the effort of trying to build something when MR is already there, and
>> being used by HBase already for some operations.
>>
>> On (1), we have to deal with a myriad of issues - HA of the server not
>> being the least of them all. Security (kerberos authentication, another
>> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
>> substitute that (1) with the HBase Master. I haven't seen any good reason
>> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
>> agreed.
>>
>> Now before going to (2), let's see what are the benefits of running the
>> backup/restore jobs from the master. I think Ted has summarized some of the
>> issues that we need to take care of - basically, the master can keep track
>> of running jobs, and should it fail, the backup master can continue keeping
>> track of it (since the jobId would have been recorded in the proc WAL). The
>> master can also do cleanup, etc. of failed backup/restore processes.
>> Security is another issue - the job needs to run as 'hbase' since it owns
>> the data. Having the master launch the job makes it get that privilege. In
>> the (2) approach, it's hard to do some of the above management.
>>
>> Guys, just to reiterate, the patch as such is ready from the overall
>> design/arch point of view (maybe code review is still pending from Matteo).
>> If in the future, we find better ways of doing this without using MR, we
>> can certainly consider that. But IMO don't think we should block this patch
>> from getting merged.
>>
>> ________________________________________
>> From: 张铎 <palomino...@gmail.com>
>> Sent: Thursday, September 22, 2016 8:32 PM
>> To: dev@hbase.apache.org
>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>
>> So what about a standalone service other than master? You can use your own
>> procedure store in that service?
>>
>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>
>> > An earlier implementation was client driven.
>> >
>> > But with that approach, it is hard to resume if there is error midway.
>> > Using Procedure V2 makes the backup / restore more robust.
>> >
>> > Another consideration is for security. It is hard to enforce security (to
>> > be implemented) for client driven actions.
>> >
>> > Cheers
>> >
>> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com>
>> > > wrote:
>> > >
>> > > No, this misses Matteo's finer point, which is "shelling out" from the
>> > master directly to run MR is a first. Why not drive this with a utility
>> > derived from Tool?
>> > >
>> > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com>
>> > > wrote:
>> > >
>> > >>>> In our production cluster,  it is a common case we just have HDFS
>> and
>> > >>>> HBase deployed.
>> > >>>> If our Master/RS depend on MR framework (especially some features
>> we
>> > >>>> have not used at all),  it introduced another cost for maintain.  I
>> > >>>> don't think it is a good idea.
>> > >>
>> > >> So , you are not backup users in this case. Many our customers have
>> full
>> > >> stack deployed and
>> > >> want see backup to be a standard feature. Besides this, nothing will
>> > happen
>> > >> in your cluster
>> > >> if you won't be doing backups.
>> > >>
>> > >> This discussion (we do not want see M/R dependency) goes to nowhere.
>> We
>> > >> asked already, at least twice, to suggest another framework (other
>> than
>> > M/R)
>> > >> for bulk data copy with *conversion*. Still waiting for suggestions.
>> > >>
>> > >> -Vlad
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com>
>> wrote:
>> > >>>
>> > >>> If MR framework is not deployed in the cluster, hbase still
>> functions
>> > >>> normally (post merge).
>> > >>>
>> > >>> In terms of build time dependency, we have long been depending on
>> > >>> mapreduce. Take a look at ExportSnapshot.
>> > >>>
>> > >>> Cheers
>> > >>>
>> > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <
>> heng.chen.1...@gmail.com>
>> > >>> wrote:
>> > >>>
>> > >>>> In our production cluster,  it is a common case we just have HDFS
>> and
>> > >>>> HBase deployed.
>> > >>>> If our Master/RS depend on MR framework (especially some features
>> we
>> > >>>> have not used at all),  it introduced another cost for maintain.  I
>> > >>>> don't think it is a good idea.
>> > >>>>
>> > >>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
>> > >>>>> To be specific, for example, our nice Backup/Restore feature, if
>> we
>> > >>> think
>> > >>>>> this is not a core feature of HBase, then we could make it depend
>> on
>> > >>> MR,
>> > >>>>> and start a standalone BackupManager instance that submits MR
>> jobs to
>> > >>> do
>> > >>>>> periodical maintenance job. And if we think this is a core feature
>> > that
>> > >>>>> everyone should use it, then we'd better implement it without MR
>> > >>>>> dependency, like DLS.
>> > >>>>>
>> > >>>>> Thanks.
>> > >>>>>
>> > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>> > >>>>>
>> > >>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of
>> our
>> > >>>>>> features depend on MR but I think the bottom line is that we
>> should
>> > >>>> launch
>> > >>>>>> the jobs from outside manually or by other services.
>> > >>>>>>
>> > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <
>> andrew.purt...@gmail.com
>> > >:
>> > >>>>>>
>> > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a
>> fair
>> > >>>>>>> question.
>> > >>>>>>>
>> > >>>>>>> Can this be driven by a utility derived from Tool like our
>> other MR
>> > >>>> apps?
>> > >>>>>>> The issue is needing the AccessController to decide if allowed?
>> But
>> > >>>> nothing
>> > >>>>>>> prevents the user from running the job manually/independently,
>> > right?
>> > >>>>>>>
>> > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>> > >>>> theo.berto...@gmail.com>
>> > >>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>> just a remark. my query was not about tools using MR (everyone
>> i
>> > >>>> think
>> > >>>>>>> is
>> > >>>>>>>> ok with those).
>> > >>>>>>>> the topic was about: "are we ok with running MR jobs from
>> Master
>> > >>> and
>> > >>>> RSs
>> > >>>>>>>> code?" since this will be the first time we do this
>> > >>>>>>>>
>> > >>>>>>>> Matteo
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
>> > >>> d...@hortonworks.com>
>> > >>>>>>> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup /
>> > Restore,
>> > >>>> it's
>> > >>>>>>>>> fine to be dependent on MR. MR is the right framework for
>> such.
>> > We
>> > >>>>>>> should
>> > >>>>>>>>> also do compactions using MR (just saying :) )
>> > >>>>>>>>> ________________________________________
>> > >>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
>> > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>> > >>>>>>>>> To: dev@hbase.apache.org
>> > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> > >>>>>>>>>
>> > >>>>>>>>> I agree - backup / restore is in the same category as import /
>> > >>>> export.
>> > >>>>>>>>>
>> > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
>> > >>>>>>> andrew.purt...@gmail.com>
>> > >>>>>>>>> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like
>> import
>> > or
>> > >>>>>>> export.
>> > >>>>>>>>>> Or the optional MOB tool. It's fine.
>> > >>>>>>>>>>
>> > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
>> > >>>> mberto...@apache.org>
>> > >>>>>>>>>> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
>> > >>>> (Master
>> > >>>>>>> or
>> > >>>>>>>>>> RS)?
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I remember in the past that there was discussion about not
>> > >>> having
>> > >>>> MR
>> > >>>>>>>>> has
>> > >>>>>>>>>>> direct dependency of hbase.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I think some of discussion where around MOB that had a MR
>> job
>> > to
>> > >>>>>>>>> compact,
>> > >>>>>>>>>>> that later was transformed in a non-MR job to be merged, I
>> > think
>> > >>>> we
>> > >>>>>>>>> had a
>> > >>>>>>>>>>> similar discussion for log split/replay.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that
>> runs a
>> > >>> MR
>> > >>>> job
>> > >>>>>>>>>> from
>> > >>>>>>>>>>> the master to copy data or restore data.
>> > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use
>> > >>> backup
>> > >>>>>>>>> you'll
>> > >>>>>>>>>>> not end up running MR jobs, but this was probably true for
>> MOB
>> > >>> as
>> > >>>> in
>> > >>>>>>>>> "if
>> > >>>>>>>>>>> you don't enable MOB you don't need MR")
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> any thoughts? do we a rule that says "we don't want to have
>> > >>> hbase
>> > >>>> run
>> > >>>>>>>>> MR
>> > >>>>>>>>>>> jobs, only tool started manually by the user can do that".
>> or
>> > >>> can
>> > >>>> we
>> > >>>>>>>>>> start
>> > >>>>>>>>>>> adding MR calls around without problems?
>> > >>>
>> >
>>
>
>
