+1 for the simplified approach.
if most of the backup code is on the client side, it should be easy to move
that to a backup module later if people ask. but for now, I'd say stick with
hbase-server if that is easier.

Matteo


On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das <d...@hortonworks.com> wrote:

> Vlad, thinking about it a little more, since the master is not
> orchestrating the backup, let's make it dead simple as a first pass. I
> think we should do the following: All or most of the Backup/Restore
> operations (especially the MR job spawns) should be moved to the client.
> Ignore security for the moment - let's live with what we have as the
> current "limitation" for tools that need HDFS access - they need to run as
> hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to
> be handled as well as much as possible - if the client fails after
> initiating the backup/restore, who restores consistency in the hbase:backup
> table, or cleans up the half copied data in the hdfs dirs, etc.
> In the future, if someone needs to support self-service operations (any
> user can take a backup/restore his/her tables), we can discuss the "backup
> service" or something else.
> Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
> with the above. Would like to get over this merge-to-master hump obviously.
>
> ________________________________________
> From: Vladimir Rodionov <vladrodio...@gmail.com>
> Sent: Monday, September 26, 2016 11:48 AM
> To: dev@hbase.apache.org
> Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
> started by Master or RS)
>
> Ok, we had an internal discussion and this is what we are suggesting now:
>
> 1. We will create a separate module (hbase-backup) and move server-side code
>    there.
> 2. Master and RS will be MR and backup free.
> 3. The code from Master will be moved into a standalone service
>    (BackupService) for procedure orchestration, operation resume/abort
>    and SECURITY. This means one additional process, similar to the
>    REST/Thrift server, will be required to operate backup.
>
> I would like to note that a separate process running as the hbase super user
> is required to implement security properly in a multi-tenant environment;
> otherwise, only the hbase super user will be allowed to operate backups.
>
> Please let us know, what do you think, HBase people :?
>
> -Vlad
>
>
>
> On Sat, Sep 24, 2016 at 2:49 PM, Stack <st...@duboce.net> wrote:
>
> > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > wrote:
> >
> > > At branch merge voting time now more eyes are getting on the design
> > > issues with dissenting opinion emerging. This is the branch merge
> > > process working as our community has designed it. Because this is the
> > > first full project review of the code and implementation I think we all
> > > have to be flexible. I see the community as trying to narrow the
> > > technical objection at issue to the smallest possible scope. It's
> > > simple: don't call out to an external execution framework we don't own
> > > from core master (and by extension regionserver) code. We had this
> > > objection before to a proposed external compaction implementation for
> > > MOB so should not come as a surprise. Please let me know if I have
> > > misstated this.
> > >
> > >
> > The above is my understanding also.
> >
> >
> > > This would seem to require a modest refactor of coordination to move
> > > invocation of MR code out from any core code path. To restate what I
> > > think is an emerging recommendation: Move cross HBase and MR
> > > coordination to a separate tool. This tool can ask the master to invoke
> > > procedures on the HBase side that do first mile export and last mile
> > > restore. (Internally the tool can also use the procedure framework for
> > > state durability, perhaps, just a thought.) Then the tool can further
> > > drive the things done with MR like shipping data off cluster or moving
> > > remote data in place and preparing it for import. These activities do
> > > not need procedure coordination and involvement of the HBase master.
> > > Only the first and last mile of the process needs atomicity within the
> > > HBase deploy. Please let me know if I have misstated this.
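
One way to picture the state durability mentioned above is a small runner that records each completed step, so that a rerun after a crash resumes where it stopped. The sketch below is only an illustration in Python, not the HBase procedure framework: `run_steps`, the JSON state file, and the step names in the usage note are all hypothetical.

```python
import json
import os


def run_steps(steps, state_path):
    """Run named steps in order, persisting each completed step name.

    steps: list of (name, callable) pairs. A rerun with the same
    state_path skips steps already recorded as done.
    Returns the names of the steps executed in this call.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)

    executed = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run
        action()  # may raise; the state file still lists the earlier steps
        done.append(name)
        # Write to a temp file and rename, so the state file is never
        # left half-written if the process dies mid-write.
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, state_path)
        executed.append(name)
    return executed
```

A tool built this way could list, say, a "first-mile" step, an "export" step, and a "last-mile" step; if the export step raises, the next invocation skips the first step and retries from the export. This only works if each step is safe to retry after a partial run.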
> > >
> > >
> > Above is my understanding of our recommendation.
> >
> > St.Ack
> >
> >
> >
> > > > On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > bq. procedure gives you a retry mechanism on failure
> > > >
> > > > We do need this mechanism. Take a look at the multi-step
> > > > in FullTableBackupProcedure, etc.
> > > >
> > > > bq. let the user export it later when he wants
> > > >
> > > > This would make supporting security more complex (user A shouldn't be
> > > > exporting user B's backup). And it is not user friendly - at the time
> > > > backup request is issued, the following is specified:
> > > >
> > > > +          + " BACKUP_ROOT     The full root path to store the backup
> > > > image,\n"
> > > > +          + "                 the prefix can be hdfs, webhdfs or
> > gpfs\n"
> > > >
> > > > Backup root is an integral part of backup manifest.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <
> > > theo.berto...@gmail.com>
> > > > wrote:
> > > >
> > > >>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com>
> wrote:
> > > >>>
> > > >>> Ideally the export should have one job running which does the retry
> > (on
> > > >>> failed partition) itself.
> > > >>>
> > > >>
> > > > >> procedure gives you a retry mechanism on failure. if you don't use
> > > > >> that, then you don't need procedure.
> > > > >> if you want you can start a procedure executor in a non master
> > > > >> process (the hbase-procedure is a separate package and does not
> > > > >> depend on master). but again, export seems a case where you don't
> > > > >> need procedure.
> > > > >>
> > > > >> like snapshot, the logic may just be: ask the master to take a
> > > > >> backup. and let the user export it later when he wants. so you avoid
> > > > >> having a MR job started by the master since people do not seem to
> > > > >> like it.
> > > > >>
> > > > >> for restore (I think that is where you use the MR splitter) you can
> > > > >> probably just have a backup ready (already split). there is already
> > > > >> a jira that should do that, HBASE-14135: instead of doing the
> > > > >> operation of split/merge on restore, you consolidate the backup
> > > > >> "offline" (mr job started by the user) and then ask to restore the
> > > > >> backup.
> > > >>
> > > >>
> > > >>>
> > > >>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <
> > > >> theo.berto...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > > >>>> as far as I understand the code, you don't need procedure for the
> > > > >>>> export itself.
> > > > >>>> the export operation is already idempotent, since you are just
> > > > >>>> copying files.
> > > > >>>> if the file exists and is complete (check length, checksum, ...)
> > > > >>>> you can skip it,
> > > > >>>> otherwise you'll send it over again.
> > > > >>>>
> > > > >>>> you need the proc for taking the backup and restoring,
> > > > >>>> because you want to complete the operation and end up with a
> > > > >>>> consistent state
> > > > >>>> across the multiple components you are updating (meta, fs, ...)
> > > > >>>> but again, for export you can just run the tool over and over until
> > > > >>>> the operation succeeds, and that should be ok.
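
The skip-if-complete export described above can be sketched as follows. This is a minimal illustration in Python over local files, not HBase's export code: `export_files`, `is_complete`, and `file_digest` are hypothetical names, and SHA-256 simply stands in for whatever checksum a real tool would compare.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_complete(src, dst):
    """True if dst exists and matches src in length and checksum."""
    if not os.path.isfile(dst):
        return False
    if os.path.getsize(src) != os.path.getsize(dst):
        return False  # cheap length check before hashing
    return file_digest(src) == file_digest(dst)


def export_files(src_dir, dst_dir):
    """Copy every file in src_dir to dst_dir, skipping complete copies.

    Returns the sorted list of file names copied in this run.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch ignores subdirectories
        dst = os.path.join(dst_dir, name)
        if is_complete(src, dst):
            continue  # already exported, nothing to do
        shutil.copyfile(src, dst)  # overwrites a partial or stale copy
        copied.append(name)
    return copied
```

Because a rerun only copies files that are missing or incomplete, calling `export_files` repeatedly until it returns an empty list is safe. That is the sense in which the export needs no procedure.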
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Matteo
> > > >>>>
> > > >>>>
> > > >>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com>
> > wrote:
> > > >>>>>
> > > >>>>> Master is involved in this discussion because currently only
> Master
> > > >>>>> instantiates ProcedureExecutor which runs the 3 Procedures for
> > > >> backup /
> > > >>>>> restore.
> > > >>>>>
> > > >>>>> What if an optional standalone service which hosts
> > ProcedureExecutor
> > > >> is
> > > >>>>> used for this purpose ?
> > > >>>>> Would that have better chance of giving us middle ground so that
> we
> > > >> can
> > > >>>>> move this forward ?
> > > >>>>>
> > > >>>>> Cheers
> > > >>>>>
> > > >>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net>
> wrote:
> > > >>>>>>
> > > >>>>>> (Moved out of the Master doing MR DISCUSSION)
> > > >>>>>>
> > > >>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <
> > > >>>>>> vladrodio...@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>>>> -1 on that backup be in core hbase
> > > >>>>>>>
> > > >>>>>>> Not sure I understand what it means.
> > > >>>>>>>
> > > >>>>>>> Sorry for the imprecision.
> > > >>>>>>
> > > >>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a
> > dependency
> > > >>> and
> > > >>>>> so
> > > >>>>>> -1 on the Master running backup/restore MR jobs, even if
> optional.
> > > >>>>>>
> > > >>>>>> Master should not depend on MR. We've gone out of our way to
> avoid
> > > >>>> taking
> > > >>>>>> MR on as dependency in the past. Seems late in the game for us
> to
> > > >>>> change
> > > >>>>>> our opinion on this. If we didn't do it for distributed log
> > > >>> splitting,
> > > >>>> or
> > > >>>>>> MOB, why would we do it to support an optional backup/restore?
> > > >>>>>>
> > > >>>>>> I have opinions on the questions below -- i.e. that Master
> running
> > > >>>>>> backup/restore is outside of the Master's charge -- but they are
> > > >> not
> > > >>>>> worth
> > > >>>>>> much since I've not done much by way of review or contrib to
> > > >>>>> backup/restore
> > > >>>>>> other than to try it as a 'user' so I'll keep them to myself
> until
> > > >> I
> > > >>>> do.
> > > >>>>> I
> > > >>>>>> only came out from under my shell to participate on the MR as
> > > >>>> dependency
> > > >>>>>> chat.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> M
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> 1. We are not allowed to use Master to orchestrate the whole
> > > >> process?
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> We
> > > >>>>>>> have already brought up all advantages of using
> > > >>>>>>>   Master and distributed procedures for backup and restore.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Downside of moving this to client tool is lack of fault
> > > >> tolerance:
> > > >>>>>>> 1.1 Client won't be allowed to do any operations, that can,
> > > >>>>> potentially
> > > >>>>>>> affect
> > > >>>>>>> cluster, such as disabling splits/merges, balancer.
> > > >>>>>>> 1.2 In case of client failure who will be doing the whole
> > > >> rollback
> > > >>>>>> stuff?
> > > >>>>>>> We are trying to make it atomic.
> > > >>>>>>>
> > > >>>>>>> Security is not clear.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> 2. We are not allowed to modify code of existing HBase core
> > classes
> > > >>>> (what
> > > >>>>>>> does core mean anyway)?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> 3. We are not allowed to create backup system table
> > > >> (hbase:backup)
> > > >>>> in a
> > > >>>>>>> system space? Only in user space? The table is global.
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we
> > > >> have
> > > >>>>>> touched,
> > > >>>>>>> of course some existing HBase code.
> > > >>>>>>> 3. is not that critical, of course we can move backup system
> into
> > > >>>> user
> > > >>>>>>> space.
> > > >>>>>>>
> > > >>>>>>> And finally, will moving backup into external tool give us +1
> > > >> from
> > > >>>>> stack?
> > > >>>>>>>
> > > >>>>>>> -Vlad
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net>
> > > >> wrote:
> > > >>>>>>>
> > > >>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <
> > > >>>>>>>> vladrodio...@gmail.com>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>>>> + MR is dead
> > > >>>>>>>>>
> > > >>>>>>>>> Does MR know that? :)
> > > >>>>>>>>>
> > > >>>>>>>>> Again. With all due respect, stack - still no suggestions
> > > >> what
> > > >>>>> should
> > > >>>>>>> we
> > > >>>>>>>>> use for "bulk data move and transformation" instead of MR?
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark,
> > > >>>>>>> distributed
> > > >>>>>>>> shell -- just don't have HBase core depend on it, even
> > > >>> optionally.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my
> > > >>>>>> opinion,
> > > >>>>>>>> some
> > > >>>>>>>>> group members still not sure about that and some will give -1
> > > >>>>>>>>> in any case. Just because ...
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase
> > > >> (+1
> > > >>>> on
> > > >>>>>>> adding
> > > >>>>>>>> all the API any such external tool might need to run).
> > > >>>>>>>>
> > > >>>>>>>> St.Ack
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> -Vlad
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net>
> > > >>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <
> > > >>>>>>>>> theo.berto...@gmail.com>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> let me try to go back to my original topic.
> > > >>>>>>>>>>> this question was meant to be generic, and provide some
> > > >>> rule
> > > >>>>> for
> > > >>>>>>>> future
> > > >>>>>>>>>>> code.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> from what I can gather, a rule that may satisfy everyone
> > > >>> can
> > > >>>>> be:
> > > > >>>>>>>>>>> - we don't want any core feature (e.g.
> > > > >>>>>>>>>>> compaction/log-split/log-replay) over MR, because some
> > > > >>>>>>>>>>> cluster may not want or may have an external/uncontrolled
> > > > >>>>>>>>>>> MR setup.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> +1
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>> - we allow non-core features (e.g. features enabled by a
> > > >>>> flag)
> > > >>>>>> to
> > > >>>>>>>> run
> > > >>>>>>>>> MR
> > > >>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
> > > >> is
> > > >>>> not
> > > >>>>>>>>> required.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind
> > > >> a
> > > >>>> flag
> > > >>>>>> or
> > > >>>>>>>> not
> > > >>>>>>>>> --
> > > >>>>>>>>>> ever being able to launch MR jobs.
> > > >>>>>>>>>>
> > > > >>>>>>>>>> + MR is dead. We should be busy working hard to undo it from
> > > > >>>>>>>>>> hbase-server, moving it out to be an optional module (Spark
> > > > >>>>>>>>>> would be its peer).
> > > > >>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy are
> > > > >>>>>>>>>> busy working hard on moving it up on to a new foundation. Let's
> > > > >>>>>>>>>> not make the task harder by piling on more moving parts.
> > > >>>>>>>>>>
> > > >>>>>>>>>> St.Ack
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Matteo
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > > >>> yuzhih...@gmail.com
> > > >>>>>
> > > >>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> I suggest you look at Matteo's work for
> > > >> AssignmentManager
> > > >>>>> which
> > > >>>>>>> is
> > > >>>>>>>> to
> > > >>>>>>>>>>> make
> > > >>>>>>>>>>>> Master more stable.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > > >>> palomino...@gmail.com
> > > >>>>>
> > > >>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
> > > >>>>> sequence
> > > >>>>>>> of
> > > >>>>>>>>>> calls
> > > >>>>>>>>>>>> when
> > > >>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> > > >> regionserver
> > > >>>> so
> > > >>>>> it
> > > >>>>>>>>> extends
> > > >>>>>>>>>>>>> HRegionServer, and the initialization of
> > > >> HRegionServer
> > > >>>>>>> sometimes
> > > >>>>>>>>>> needs
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
> > > >> cause
> > > >>>>>>>>> probabilistic
> > > >>>>>>>>>>> dead
> > > >>>>>>>>>>>>> lock or some strange NPEs...
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> > > >> add
> > > >>>> new
> > > >>>>>>>> features
> > > >>>>>>>>>> or
> > > >>>>>>>>>>>> add
> > > >>>>>>>>>>>>> external dependencies to HMaster, especially add more
> > > >>>> works
> > > >>>>>> for
> > > >>>>>>>> the
> > > >>>>>>>>>>> start
> > > >>>>>>>>>>>>> up processing...
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
> > > >> yuzhih...@gmail.com
> > > >>>> :
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I read through HADOOP-13433
> > > > >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> -
> > > > >>>>>>>>>>>>>> the cited race condition is in jdk.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
> > > >>> moving.
> > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is
> > > >> it
> > > >>> in
> > > >>>>> the
> > > >>>>>>>>> backup
> > > >>>>>>>>>> /
> > > >>>>>>>>>>>>>> restore mega patch ?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
> > > >>>>>> palomino...@gmail.com>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> If you guys have already implemented the feature
> > > >> in
> > > >>>> the
> > > >>>>>> MR
> > > >>>>>>>> way
> > > >>>>>>>>>> and
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on
> > > >>> it
> > > >>>>> as I
> > > >>>>>>> do
> > > >>>>>>>>> not
> > > >>>>>>>>>>> want
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> block the development progress.
> > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> But I strongly suggest that later we revisit the
> > > > >>>>>>>>>>>>>>> design and see if we can separate the logic from
> > > > >>>>>>>>>>>>>>> HMaster as much as possible. HA is not a big problem
> > > > >>>>>>>>>>>>>>> if you do not store any metadata locally. But the
> > > > >>>>>>>>>>>>>>> ugly code in HMaster is really a problem...
> > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> And for security, I have an issue pending for a long
> > > > >>>>>>>>>>>>>>> time. Can someone help by taking a quick look at it?
> > > > >>>>>>>>>>>>>>> This is what I mean by ugly code... it logs out and
> > > > >>>>>>>>>>>>>>> destroys the credentials in a subject while it is
> > > > >>>>>>>>>>>>>>> still being used, and it is declared as
> > > > >>>>>>>>>>>>>>> LimitedPrivate so I can not change the behavior, and
> > > > >>>>>>>>>>>>>>> the only way to fix it is to write another piece of
> > > > >>>>>>>>>>>>>>> ugly code...
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <
> > > >>>>>>>>>>> vladrodio...@gmail.com
> > > >>>>>>>>>>>>> :
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> If in the future, we find better ways of
> > > >> doing
> > > >>>>> this
> > > >>>>>>>>> without
> > > >>>>>>>>>>>> using
> > > >>>>>>>>>>>>>> MR,
> > > >>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>> can certainly consider that
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Our framework for distributed operations is
> > > >>>> abstract
> > > >>>>>> and
> > > >>>>>>>>> allows
> > > >>>>>>>>>>>>>>>> different implementations. MR is just one
> > > >>>>>> implementation
> > > >>>>>>> we
> > > >>>>>>>>>>>> provide.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> -Vlad
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> > > >>>>>>>>>>> d...@hortonworks.com
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the
> > > >>>> topic
> > > >>>>>> of
> > > >>>>>>>>>> MR-based
> > > >>>>>>>>>>>>>>>>> compactions.. But I was thinking more about
> > > >> the
> > > >>>>>>>>> SpliceMachine
> > > >>>>>>>>>>>>>> approach
> > > >>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>> managing compactions in Spark where
> > > >> apparently
> > > >>>> they
> > > >>>>>>> saw a
> > > >>>>>>>>> lot
> > > >>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>> benefits.
> > > >>>>>>>>>>>>>>>>> Apologies for giving you that sore throat
> > > >>>> Andrew; I
> > > >>>>>>>> really
> > > >>>>>>>>>>> didn't
> > > >>>>>>>>>>>>>> mean
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> :-)
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > > >>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
> > > >>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > > >>>>>>>>>>>>>>>>> 2. Shell out from the master
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I don't think we have a good answer to (0),
> > > >>> and I
> > > >>>>>> don't
> > > >>>>>>>>> think
> > > >>>>>>>>>>>> it's
> > > >>>>>>>>>>>>>> even
> > > >>>>>>>>>>>>>>>>> worth the effort of trying to build something
> > > >>>> when
> > > >>>>> MR
> > > >>>>>>> is
> > > >>>>>>>>>>> already
> > > >>>>>>>>>>>>>> there,
> > > >>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>> being used by HBase already for some
> > > >>> operations.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of
> > > >>> issues -
> > > >>>>> HA
> > > >>>>>> of
> > > >>>>>>>> the
> > > >>>>>>>>>>>> server
> > > >>>>>>>>>>>>>> not
> > > >>>>>>>>>>>>>>>>> being the least of them all. Security
> > > >> (kerberos
> > > >>>>>>>>>> authentication,
> > > >>>>>>>>>>>>>> another
> > > >>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that
> > > >>>>> approach
> > > >>>>>>> is
> > > >>>>>>>>> DOA.
> > > >>>>>>>>>>>>> Instead
> > > >>>>>>>>>>>>>>>> let's
> > > >>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master. I
> > > >>>>> haven't
> > > >>>>>>> seen
> > > >>>>>>>>> any
> > > >>>>>>>>>>>> good
> > > >>>>>>>>>>>>>>> reason
> > > >>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs
> > > >>> if
> > > >>>>>>> needed.
> > > >>>>>>>>> It's
> > > >>>>>>>>>>> not
> > > >>>>>>>>>>>>>>> ideal;
> > > >>>>>>>>>>>>>>>>> agreed.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Now before going to (2), let's see what are
> > > >> the
> > > >>>>>>> benefits
> > > >>>>>>>> of
> > > >>>>>>>>>>>> running
> > > >>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think
> > > >>> Ted
> > > >>>>> has
> > > >>>>>>>>>> summarized
> > > >>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>> issues that we need to take care of -
> > > >>> basically,
> > > >>>>> the
> > > >>>>>>>> master
> > > >>>>>>>>>> can
> > > >>>>>>>>>>>>> keep
> > > >>>>>>>>>>>>>>>> track
> > > >>>>>>>>>>>>>>>>> of running jobs, and should it fail, the
> > > >> backup
> > > >>>>>> master
> > > >>>>>>>> can
> > > >>>>>>>>>>>> continue
> > > >>>>>>>>>>>>>>>> keeping
> > > >>>>>>>>>>>>>>>>> track of it (since the jobId would have been
> > > >>>>> recorded
> > > >>>>>>> in
> > > >>>>>>>>> the
> > > >>>>>>>>>>> proc
> > > >>>>>>>>>>>>>> WAL).
> > > >>>>>>>>>>>>>>>> The
> > > >>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed
> > > >>>>>>> backup/restore
> > > >>>>>>>>>>>>> processes.
> > > >>>>>>>>>>>>>>>>> Security is another issue - the job needs to
> > > >>> run
> > > >>>> as
> > > >>>>>>>> 'hbase'
> > > >>>>>>>>>>> since
> > > >>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>> owns
> > > >>>>>>>>>>>>>>>>> the data. Having the master launch the job
> > > >>> makes
> > > >>>> it
> > > >>>>>> get
> > > >>>>>>>>> that
> > > >>>>>>>>>>>>>> privilege.
> > > >>>>>>>>>>>>>>>> In
> > > >>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the
> > > >>>> above
> > > >>>>>>>>>> management.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is
> > > >>>> ready
> > > >>>>>>> from
> > > >>>>>>>>> the
> > > >>>>>>>>>>>>> overall
> > > >>>>>>>>>>>>>>>>> design/arch point of view (maybe code review
> > > >> is
> > > >>>>> still
> > > >>>>>>>>> pending
> > > >>>>>>>>>>>> from
> > > >>>>>>>>>>>>>>>> Matteo).
> > > >>>>>>>>>>>>>>>>> If in the future, we find better ways of
> > > >> doing
> > > >>>> this
> > > >>>>>>>> without
> > > >>>>>>>>>>> using
> > > >>>>>>>>>>>>> MR,
> > > >>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't
> > > >>> think
> > > >>>> we
> > > >>>>>>>> should
> > > >>>>>>>>>>> block
> > > >>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>> patch
> > > >>>>>>>>>>>>>>>>> from getting merged.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> ________________________________________
> > > >>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com>
> > > >>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > > >>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > >>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by
> > > >>>> Master
> > > >>>>>> or
> > > >>>>>>> RS
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> So what about a standalone service other than
> > > >>>>> master?
> > > >>>>>>> You
> > > >>>>>>>>> can
> > > >>>>>>>>>>> use
> > > >>>>>>>>>>>>>> your
> > > >>>>>>>>>>>>>>>> own
> > > >>>>>>>>>>>>>>>>> procedure store in that service?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <
> > > >>>>>> yuzhih...@gmail.com
> > > >>>>>>>> :
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> An earlier implementation was client
> > > >> driven.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> But with that approach, it is hard to
> > > >> resume
> > > >>> if
> > > >>>>>> there
> > > >>>>>>>> is
> > > >>>>>>>>>>> error
> > > >>>>>>>>>>>>>>> midway.
> > > >>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup /
> > > >> restore
> > > >>>>> more
> > > >>>>>>>>> robust.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Another consideration is for security. It
> > > >> is
> > > >>>> hard
> > > >>>>>> to
> > > >>>>>>>>>> enforce
> > > >>>>>>>>>>>>>> security
> > > >>>>>>>>>>>>>>>> (to
> > > >>>>>>>>>>>>>>>>>> be implemented) for client driven actions.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew
> > > >>> Purtell <
> > > >>>>>>>>>>>>>>>> andrew.purt...@gmail.com>
> > > >>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point,
> > > >> which
> > > >>>> is
> > > >>>>>>>>> "shelling
> > > >>>>>>>>>>> out"
> > > >>>>>>>>>>>>>> from
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why
> > > >> not
> > > >>>>> drive
> > > >>>>>>>> this
> > > >>>>>>>>>>> with a
> > > >>>>>>>>>>>>>>> utility
> > > >>>>>>>>>>>>>>>>>> derived from Tool?
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir
> > > >>>> Rodionov
> > > >>>>> <
> > > >>>>>>>>>>>>>>>> vladrodio...@gmail.com
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case
> > > > >>>>>>>>>>>>>>>>>>>>>> that we just have HDFS and HBase deployed.
> > > > >>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework
> > > > >>>>>>>>>>>>>>>>>>>>>> (especially for some features we have not used
> > > > >>>>>>>>>>>>>>>>>>>>>> at all), it introduces another maintenance
> > > > >>>>>>>>>>>>>>>>>>>>>> cost. I don't think it is a good idea.
> > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many
> > > > >>>>>>>>>>>>>>>>>>>> of our customers have the full stack deployed
> > > > >>>>>>>>>>>>>>>>>>>> and want to see backup as a standard feature.
> > > > >>>>>>>>>>>>>>>>>>>> Besides this, nothing will happen in your
> > > > >>>>>>>>>>>>>>>>>>>> cluster if you are not doing backups.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R
> > > > >>>>>>>>>>>>>>>>>>>> dependency) is going nowhere. We asked already,
> > > > >>>>>>>>>>>>>>>>>>>> at least twice, for suggestions of another
> > > > >>>>>>>>>>>>>>>>>>>> framework (other than M/R) for bulk data copy
> > > > >>>>>>>>>>>>>>>>>>>> with *conversion*. Still waiting for
> > > > >>>>>>>>>>>>>>>>>>>> suggestions.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> -Vlad
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted
> > > >> Yu <
> > > >>>>>>>>>>>> yuzhih...@gmail.com
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the
> > > >>>>> cluster,
> > > >>>>>>>> hbase
> > > >>>>>>>>>>> still
> > > >>>>>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>>>>>>>> normally (post merge).
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we
> > > >>> have
> > > >>>>> long
> > > >>>>>>>> been
> > > >>>>>>>>>>>>> depending
> > > >>>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at
> > > >> ExportSnapshot.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng
> > > >>> Chen
> > > >>>> <
> > > >>>>>>>>>>>>>>>> heng.chen.1...@gmail.com
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case
> > > > >>>>>>>>>>>>>>>>>>>>>> that we just have HDFS and HBase deployed.
> > > > >>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework
> > > > >>>>>>>>>>>>>>>>>>>>>> (especially for some features we have not used
> > > > >>>>>>>>>>>>>>>>>>>>>> at all), it introduces another maintenance
> > > > >>>>>>>>>>>>>>>>>>>>>> cost. I don't think it is a good idea.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <
> > > >>>>>>>>> palomino...@gmail.com
> > > >>>>>>>>>>> :
> > > > >>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice
> > > > >>>>>>>>>>>>>>>>>>>>>>> Backup/Restore feature: if we think this is
> > > > >>>>>>>>>>>>>>>>>>>>>>> not a core feature of HBase, then we could
> > > > >>>>>>>>>>>>>>>>>>>>>>> make it depend on MR, and start a standalone
> > > > >>>>>>>>>>>>>>>>>>>>>>> BackupManager instance that submits MR jobs
> > > > >>>>>>>>>>>>>>>>>>>>>>> to do periodic maintenance. And if we think
> > > > >>>>>>>>>>>>>>>>>>>>>>> this is a core feature that everyone should
> > > > >>>>>>>>>>>>>>>>>>>>>>> use, then we'd better implement it without an
> > > > >>>>>>>>>>>>>>>>>>>>>>> MR dependency, like DLS.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Thanks.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK
> > > >>>>>>>>>>>>>>>>>>>>>>>> that some of our features depend on MR but I think the
> > > >>>>>>>>>>>>>>>>>>>>>>>> bottom line is that we should launch the jobs from outside
> > > >>>>>>>>>>>>>>>>>>>>>>>> manually or by other services.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so
> > > >>>>>>>>>>>>>>>>>>>>>>>>> a fair question.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our
> > > >>>>>>>>>>>>>>>>>>>>>>>>> other MR apps? The issue is needing the AccessController
> > > >>>>>>>>>>>>>>>>>>>>>>>>> to decide if allowed? But nothing prevents the user from
> > > >>>>>>>>>>>>>>>>>>>>>>>>> running the job manually/independently, right?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> (everyone i think is ok with those).
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> from Master and RSs code?" since this will be the
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> first time we do this
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup /
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Restore, it's fine to be dependent on MR. MR is the right
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> framework for such. We should also do compactions using
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> MR (just saying :) )
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> import / export.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> import or export. Or the optional MOB tool. It's fine.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs from
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> hbase (Master or RS)?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion about
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> not having MR as a direct dependency of hbase.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which had
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> a MR job to compact that was later transformed into a
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> non-MR job to be merged. I think we had a similar
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion for log split/replay.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> runs a MR job from the master to copy data or restore
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> data.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you don't
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> use backup you'll not end up running MR jobs, but this
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> was probably true for MOB as in "if you don't enable
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MOB you don't need MR")
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to have hbase run MR jobs, only tools started manually
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> by the user can do that". or can we start adding MR
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> calls around without problems?
