On Wed, Jul 22, 2009 at 12:07 AM, bharath vissapragada < [email protected]> wrote:
> That means we have to stick to the principle of MR whenever we require
> efficient data processing ..
> but MapReduce cannot offer solutions to general database problems, I guess!
>

I'd recommend you read up on the papers on MR, BigTable, and some of the
latest stuff on HadoopDB etc. That'll give you clarity.

> On Wed, Jul 22, 2009 at 12:34 PM, Amandeep Khurana <[email protected]> wrote:
>
> > On Wed, Jul 22, 2009 at 12:01 AM, bharath vissapragada <[email protected]> wrote:
> >
> > > Suppose I write non-MR code using the Java API such that it involves
> > > processing huge data (100s of GBs) .. then is there an overhead of
> > > fetching data (such a huge amount) from other machines ..?
> >
> > Of course. Network and I/O overheads definitely plague processing large
> > datasets.
> >
> > > On Wed, Jul 22, 2009 at 12:24 PM, Amandeep Khurana <[email protected]> wrote:
> > >
> > > > HBase is meant to store large tables. The intention is to store data
> > > > in a way that's more scalable as compared to traditional database
> > > > systems. Now, HBase is built over Hadoop and has the option of being
> > > > used as the data store for MR jobs. However, that's not the only
> > > > purpose.
> > > >
> > > > In all data storage systems (except embedded databases), you would
> > > > have to fetch data to where computation has to be performed. The
> > > > whole MR design philosophy is to take the code to the data and
> > > > execute it as close to where the data is stored as possible.
> > > >
> > > > On Tue, Jul 21, 2009 at 11:48 PM, bharath vissapragada <[email protected]> wrote:
> > > >
> > > > > That means .. it is not very useful to write Java code (using the
> > > > > API) .. because anyway it is not using the real power of Hadoop
> > > > > (distributed processing); instead it has the overhead of fetching
> > > > > data from other machines, right?
> > > > >
> > > > > On Wed, Jul 22, 2009 at 12:12 PM, Amandeep Khurana <[email protected]> wrote:
> > > > >
> > > > > > Yes.. Only if you use MR. If you are writing your own code, it'll
> > > > > > pull the records to the place where you run the code.
> > > > > >
> > > > > > On Tue, Jul 21, 2009 at 11:39 PM, Fernando Padilla <[email protected]> wrote:
> > > > > >
> > > > > > > That is if you use Hadoop MapReduce, right? Not if you simply
> > > > > > > access HBase through a standard API (like Java)?
> > > > > > >
> > > > > > > On 7/21/09 9:49 PM, Amandeep Khurana wrote:
> > > > > > >
> > > > > > > > Bharath,
> > > > > > > >
> > > > > > > > The processing is done as local to the RS as possible. The
> > > > > > > > first attempt is at doing it local on the same node. If
> > > > > > > > that's not possible, it's done on the same rack.
> > > > > > > >
> > > > > > > > -ak
> > > > > > > >
> > > > > > > > On Tue, Jul 21, 2009 at 9:43 PM, bharath vissapragada <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I have one simple doubt about HBase.
> > > > > > > > >
> > > > > > > > > Suppose I use a scanner to iterate through all the rows in
> > > > > > > > > the HBase table and process the data corresponding to those
> > > > > > > > > rows. Is the processing of that data done locally on the
> > > > > > > > > region server on which that particular region is located,
> > > > > > > > > or is the data transferred over the network so that all the
> > > > > > > > > processing is done on the single machine on which that
> > > > > > > > > script runs?
> > > > > > > > >
> > > > > > > > > thanks
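
To make the trade-off discussed in this thread concrete, here is a minimal toy model in Python. It is only a sketch and does not use the HBase or Hadoop APIs: `Region`, `client_side_total`, `locality_aware_total`, and the 8-byte partial-result size are all hypothetical names and assumptions made up for illustration. It also assumes perfect locality, whereas the thread notes that MR only attempts node-local and then rack-local placement. The model compares how many bytes cross the network when every row is pulled to one client (scanner-style access) against shipping only one small partial result per region (MR-style access).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: each per-region partial result is one 8-byte integer.
PARTIAL_RESULT_BYTES = 8


@dataclass
class Region:
    host: str          # machine serving this region (hypothetical name)
    rows: List[bytes]  # row payloads held by the region


def client_side_total(regions: List[Region], client_host: str) -> Tuple[int, int]:
    """Scanner-style access: rows are shipped to the client, then processed there."""
    total = 0
    network_bytes = 0
    for region in regions:
        for row in region.rows:
            if region.host != client_host:
                network_bytes += len(row)  # the whole row crosses the network
            total += len(row)              # processing happens on the client
    return total, network_bytes


def locality_aware_total(regions: List[Region]) -> Tuple[int, int]:
    """MR-style access: each region is processed on the host that stores it."""
    total = 0
    network_bytes = 0
    for region in regions:
        partial = sum(len(row) for row in region.rows)  # computed next to the data
        network_bytes += PARTIAL_RESULT_BYTES           # only the partial result moves
        total += partial
    return total, network_bytes


if __name__ == "__main__":
    regions = [
        Region("rs1", [b"x" * 1000] * 3),
        Region("rs2", [b"y" * 1000] * 2),
    ]
    print(client_side_total(regions, client_host="client"))
    print(locality_aware_total(regions))
```

Both functions compute the same total payload size; only the amount of data moved differs. With realistic table sizes (100s of GBs, as in the question above) the gap between the two network figures is what the "take the code to the data" argument is about.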
