Once you are in the game of coordinating large-scale tasks with distribution, fault tolerance, etc., MR is the way to go, short of implementing a similar framework inside HBase. Things like exporting snapshots, distcp, or backups (which use these) must use such a framework.
The issue about master launching MR jobs came up in the review around that time, and we concluded that it was fine since backups by definition require such a framework.

Enis

On Thu, Sep 22, 2016 at 4:32 PM, Devaraj Das <d...@hortonworks.com> wrote:

> Not practical to do those tools without MR, JM. We should be using the
> right framework for the use cases in hand. MR fits this really well.
> JM, when you say "if we can do without MR, then, why not?", do you have a
> framework in mind that performs/scales as well as MR? Curious.
> ________________________________________
> From: Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> Sent: Thursday, September 22, 2016 4:29 PM
> To: dev
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> Well, I'm just not using those features ;) But was hoping for the MOBs ;)
> My point is, if we can do it without MR, then, why not? )
>
> 2016-09-22 19:25 GMT-04:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>
> > Forgot WALPlayer :)
> >
> > -Vlad
> >
> > On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> > wrote:
> >
> > > >> and
> > > >> backups too, but don't want to bother having to install and configure YARN
> > > >> just for that, as well as removing resources from HBase to give it to
> > >
> > > Any suggestions on how to do bulk data move with transformation from/to
> > > HBase cluster w/o MapReduce?
> > >
> > > Opposition to M/R does not make sense imo, since we have a lot of tools
> > > in HBase which depend on MapReduce:
> > >
> > > CountRows
> > > CountCells
> > > Import
> > > Export
> > > ImportTsv
> > > ExportTsv
> > > CopyTable
> > > VerifyReplication
> > > ExportSnapshot
> > >
> > > and new backup create/restore of course.
> > >
> > > -Vlad
> > >
> > > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> > >
> > >> My 2¢: I have a strong preference for NOT having a dependency on MR
> > >> anywhere :( I run my HBase cluster without YARN. Just HBase and HDFS. I like
> > >> all the features that we built. Would love to be able to use MOBs and
> > >> backups too, but don't want to bother having to install and configure YARN
> > >> just for that, as well as removing resources from HBase to give it to
> > >> yarn....
> > >>
> > >> JMS
> > >>
> > >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi <theo.berto...@gmail.com>:
> > >>
> > >> > just a remark. my query was not about tools using MR (everyone i think is
> > >> > ok with those).
> > >> > the topic was about: "are we ok with running MR jobs from Master and RSs
> > >> > code?" since this will be the first time we do this
> > >> >
> > >> > Matteo
> > >> >
> > >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > >> >
> > >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> > >> > > fine to be dependent on MR. MR is the right framework for such. We should
> > >> > > also do compactions using MR (just saying :) )
> > >> > > ________________________________________
> > >> > > From: Ted Yu <yuzhih...@gmail.com>
> > >> > > Sent: Thursday, September 22, 2016 2:00 PM
> > >> > > To: dev@hbase.apache.org
> > >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >> > >
> > >> > > I agree - backup / restore is in the same category as import / export.
> > >> > >
> > >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Backup is extra tooling around core in my opinion. Like import or export.
> > >> > > > Or the optional MOB tool. It's fine.
> > >> > > >
> > >> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
> > >> > > > >
> > >> > > > > What's the latest opinion around running MR jobs from hbase (Master or RS)?
> > >> > > > >
> > >> > > > > I remember in the past that there was discussion about not having MR as a
> > >> > > > > direct dependency of hbase.
> > >> > > > >
> > >> > > > > I think some of the discussion was around MOB that had a MR job to compact,
> > >> > > > > that later was transformed in a non-MR job to be merged, I think we had a
> > >> > > > > similar discussion for log split/replay.
> > >> > > > >
> > >> > > > > the latest is the new Backup feature (HBASE-7912), that runs a MR job from
> > >> > > > > the master to copy data or restore data.
> > >> > > > > (backup is also "not really core" as in.. if you don't use backup you'll
> > >> > > > > not end up running MR jobs, but this was probably true for MOB as in "if
> > >> > > > > you don't enable MOB you don't need MR")
> > >> > > > >
> > >> > > > > any thoughts? do we have a rule that says "we don't want to have hbase run MR
> > >> > > > > jobs, only tools started manually by the user can do that". or can we start
> > >> > > > > adding MR calls around without problems?