Once you are in the game of coordinating large-scale tasks with
distribution, fault tolerance, etc., MR is the way to go, short of
implementing a similar framework inside HBase. Things like exporting
snapshots, distcp, or backups (which use these) must use such a
framework.

The issue of the master launching MR jobs came up in the review around that
time, and we concluded that it was fine, since backups by definition require
such a framework.

Enis

On Thu, Sep 22, 2016 at 4:32 PM, Devaraj Das <d...@hortonworks.com> wrote:

> Not practical to do those tools without MR, JM. We should be using the
> right framework for the use cases at hand. MR fits this really well.
> JM, when you say "if we can do without MR, then, why not?", do you have a
> framework in mind that performs/scales as well as MR? Curious.
> ________________________________________
> From: Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> Sent: Thursday, September 22, 2016 4:29 PM
> To: dev
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> Well, I'm just not using those features ;) But was hoping for the MOBs ;)
> My point is, if we can do it without MR, then, why not? )
>
> 2016-09-22 19:25 GMT-04:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>
> > Forgot WALPlayer :)
> >
> > -Vlad
> >
> > On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov <
> vladrodio...@gmail.com
> > >
> > wrote:
> >
> > > > >> and
> > > > >> backups too, but don't want to bother having to install and configure YARN
> > > > >> just for that, as well as removing resources from HBase to give it to
> > >
> > > Any suggestions on how to do bulk data move with transformation from/to
> > > HBase cluster w/o MapReduce?
> > >
> > > > Opposition to M/R does not make sense imo, since we have a lot of tools
> > > > in HBase which depend on MapReduce:
> > >
> > > CountRows
> > > CountCells
> > > Import
> > > Export
> > > ImportTsv
> > > ExportTsv
> > > CopyTable
> > > VerifyReplication
> > > ExportSnapshot
> > >
> > > and new backup create/restore of course.
> > >
> > >
> > > -Vlad
> > >
> > >
> > >
> > >
> > > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org> wrote:
> > >
> > >> My 2¢: I have a strong preference for NOT having a dependency on MR
> > > >> anywhere :( I run my HBase cluster without YARN. Just HBase and HDFS. I
> > >> like
> > >> all the features that we built. Would love to be able to use MOBs and
> > >> backups too, but don't want to bother having to install and configure
> > YARN
> > >> just for that, as well as removing resources from HBase to give it to
> > >> yarn....
> > >>
> > >> JMS
> > >>
> > >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi <theo.berto...@gmail.com>:
> > >>
> > >> > just a remark. my query was not about tools using MR (everyone i
> think
> > >> is
> > >> > ok with those).
> > >> > the topic was about: "are we ok with running MR jobs from Master and
> > RSs
> > >> > code?" since this will be the first time we do this
> > >> >
> > >> > Matteo
> > >> >
> > >> >
> > >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com>
> > >> wrote:
> > >> >
> > >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > >> it's
> > >> > > fine to be dependent on MR. MR is the right framework for such. We
> > >> should
> > >> > > also do compactions using MR (just saying :) )
> > >> > > ________________________________________
> > >> > > From: Ted Yu <yuzhih...@gmail.com>
> > >> > > Sent: Thursday, September 22, 2016 2:00 PM
> > >> > > To: dev@hbase.apache.org
> > >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >> > >
> > >> > > I agree - backup / restore is in the same category as import /
> > export.
> > >> > >
> > >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> > >> > andrew.purt...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Backup is extra tooling around core in my opinion. Like import
> or
> > >> > export.
> > >> > > > Or the optional MOB tool. It's fine.
> > >> > > >
> > >> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
> > >> mberto...@apache.org>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > What's the latest opinion around running MR jobs from hbase
> > >> (Master
> > >> > or
> > >> > > > RS)?
> > >> > > > >
> > > >> > > > > I remember in the past that there was discussion about not having MR
> > > >> > > > > as a direct dependency of hbase.
> > >> > > > >
> > > >> > > > > I think some of the discussion was around MOB, which had an MR job to
> > > >> > > > > compact that was later transformed into a non-MR job in order to be
> > > >> > > > > merged. I think we had a similar discussion for log split/replay.
> > >> > > > >
> > >> > > > > the latest is the new Backup feature (HBASE-7912), that runs a
> > MR
> > >> job
> > >> > > > from
> > >> > > > > the master to copy data or restore data.
> > >> > > > > (backup is also "not really core" as in.. if you don't use
> > backup
> > >> > > you'll
> > >> > > > > not end up running MR jobs, but this was probably true for MOB
> > as
> > >> in
> > >> > > "if
> > >> > > > > you don't enable MOB you don't need MR")
> > >> > > > >
> > > >> > > > > any thoughts? do we have a rule that says "we don't want to have hbase
> > > >> > > > > run MR jobs, only tools started manually by the user can do that"? or can
> > > >> > > > > we start adding MR calls around without problems?
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
