Suppose I write non-MR code using the Java API that involves processing
huge data (100s of GBs). Is there then an overhead of fetching such a huge
amount of data from other machines?

On Wed, Jul 22, 2009 at 12:24 PM, Amandeep Khurana <[email protected]> wrote:

> HBase is meant to store large tables. The intention is to store data in a
> way that's more scalable than traditional database systems. Now,
> HBase is built over Hadoop and has the option of being used as the data
> store for MR jobs. However, that's not the only purpose.
>
> In all data storage systems (except embedded databases), you would have to
> fetch data to where computation has to be performed. The whole MR design
> philosophy is to take the code to the data and execute it as close to where
> the data is stored as possible.
>
>
> On Tue, Jul 21, 2009 at 11:48 PM, bharath vissapragada <
> [email protected]> wrote:
>
> > That means it is not very useful to write Java code (using the API),
> > because it does not use the real power of Hadoop (distributed
> > processing) and instead has the overhead of fetching data from other
> > machines, right?
> >
> > On Wed, Jul 22, 2009 at 12:12 PM, Amandeep Khurana <[email protected]>
> > wrote:
> >
> > > Yes.. Only if you use MR. If you are writing your own code, it'll pull
> > > the records to the place where you run the code.
> > >
> > > On Tue, Jul 21, 2009 at 11:39 PM, Fernando Padilla <[email protected]
> > > >wrote:
> > >
> > > > > That is if you use Hadoop MapReduce, right? Not if you simply access
> > > > > HBase through a standard API (like Java)?
> > > >
> > > >
> > > >
> > > > On 7/21/09 9:49 PM, Amandeep Khurana wrote:
> > > >
> > > >> Bharath,
> > > >>
> > > > >> The processing is done as local to the RS as possible. The first
> > > > >> attempt is at doing it locally on the same node. If that's not
> > > > >> possible, it's done on the same rack.
> > > >>
> > > >> -ak
> > > >>
> > > >>
> > > >> On Tue, Jul 21, 2009 at 9:43 PM, bharath vissapragada<
> > > >> [email protected]>  wrote:
> > > >>
> > > >>  Hi all,
> > > >>>
> > > > >>> I have one simple doubt about HBase.
> > > > >>>
> > > > >>> Suppose I use a scanner to iterate through all the rows in an
> > > > >>> HBase table and process the data corresponding to those rows. Is the
> > > > >>> processing of that data done locally on the region server where that
> > > > >>> particular region is located, or is it transferred over the network so
> > > > >>> that all the processing is done on the single machine on which the
> > > > >>> script runs?
> > > >>>
> > > >>> thanks
> > > >>>
> > > >>>
> > > >>
> > >
> >
>
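
The trade-off discussed in this thread (a plain client scan pulls every row to
the machine running the code, while MR tries to run the code next to the data)
can be sketched with a toy model. This is not HBase API code: the Region class,
the host names, the row sizes, and the fixed per-task result size are all
hypothetical, and the data-local case assumes the best outcome where every task
lands on the node holding its region.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanLocalitySketch {

    /** Hypothetical stand-in for one region served by one host. */
    static final class Region {
        final String host;
        final long[] rowSizesBytes;

        Region(String host, long[] rowSizesBytes) {
            this.host = host;
            this.rowSizesBytes = rowSizesBytes;
        }

        long totalBytes() {
            long sum = 0;
            for (long size : rowSizesBytes) {
                sum += size;
            }
            return sum;
        }
    }

    /** Client-side scan: rows of every region on another host cross the network. */
    static long bytesMovedClientSide(List<Region> regions, String clientHost) {
        long moved = 0;
        for (Region region : regions) {
            if (!region.host.equals(clientHost)) {
                moved += region.totalBytes();
            }
        }
        return moved;
    }

    /**
     * Data-local processing (assumed best case): one task per region runs on
     * the region's own host, so only a small partial result leaves each host.
     */
    static long bytesMovedDataLocal(List<Region> regions, long resultBytesPerTask) {
        return regions.size() * resultBytesPerTask;
    }

    /** Made-up sample: three regions of 600 bytes each on three hosts. */
    static List<Region> sampleRegions() {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("rs1", new long[] {100, 200, 300}));
        regions.add(new Region("rs2", new long[] {100, 200, 300}));
        regions.add(new Region("rs3", new long[] {100, 200, 300}));
        return regions;
    }

    public static void main(String[] args) {
        List<Region> regions = sampleRegions();
        // The client runs on rs1, so only the regions on rs2 and rs3 are remote.
        System.out.println("client-side scan: " + bytesMovedClientSide(regions, "rs1") + " bytes");
        System.out.println("data-local tasks: " + bytesMovedDataLocal(regions, 8) + " bytes");
    }
}
```

With these made-up numbers the client-side scan moves 1200 bytes and the
data-local run moves 24. The gap grows with table size, which is the overhead
the question about 100s of GBs is asking about.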
