Yes, very clear. Thank you.
On Thu, Feb 5, 2009 at 9:57 PM, Jonathan Gray <[email protected]> wrote:

> The more map files in a region, the slower your scanning will be, because
> you are actually scanning each one.
>
> Recent row updates will not hurt you too badly, because you always have a
> scanner open on the Memcache (and results in memory are obviously the
> fastest to retrieve). But you'll always pay a search cost for each Mapfile
> that makes up the region you're scanning.
>
> Each region is defined by [startKey,endKey). Each region is made up of an
> in-memory map (Memcache) and 0 to N HDFS files (Mapfiles). Each of these
> is individually lexicographically sorted. Scanning the table involves
> scanning every file in the region. Major compactions combine all files
> into one.
>
> Is that clear?
>
> JG
>
> > -----Original Message-----
> > From: Slava Gorelik [mailto:[email protected]]
> > Sent: Thursday, February 05, 2009 11:33 AM
> > To: [email protected]
> > Subject: Re: Few questions
> >
> > Thank you for the quick response. So, you wrote:
> >
> > HBase is efficient at retrieving rows in a range because rows are
> > sorted in lexicographical order.
> >
> > My question is: is it still efficient when rows are within the range
> > but in different map files (as in the case of a row update)?
> > And another question: is a map file lexicographically sorted? There is
> > no sorting of data between map files in the same region, is that
> > correct?
> >
> > Best Regards,
> > Slava.
> >
> > On Thu, Feb 5, 2009 at 8:20 PM, Jonathan Gray <[email protected]>
> > wrote:
> >
> > > Answers inline.
> > >
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:[email protected]]
> > > > Sent: Thursday, February 05, 2009 9:21 AM
> > > > To: [email protected]
> > > > Subject: Few questions
> > > >
> > > > Hi to all.
> > > > I have a few questions to ask:
> > > >
> > > > 1) Is it possible to bring specific columns from the same row
> > > > within one round trip (some method that takes a list of column
> > > > names and returns a RowResult)?
> > >
> > > http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/client/HTable.html#getRow(byte[],%20byte[][])
> > >
> > > HTable.getRow(byte [] row, byte [][] columns)
> > >
> > > Ex: byte [][] columns = {"family:column1".getBytes(),
> > >     "family:column2".getBytes()};
> > >
> > > > 2) Does key size have any implications for HBase performance?
> > >
> > > There are some implications, but as far as I know nothing that
> > > significant. Most users have keys on the order of 10s or 100s of
> > > bytes, and I've never seen a large difference between them. Of
> > > course, the smaller the key, the smaller the payload to store and
> > > transfer.
> > >
> > > > 3) Somewhere, I don't remember where, I read that HBase is very
> > > > fast and efficient at retrieving rows in the range between two
> > > > given keys, is that correct?
> > > > If yes, how is it implemented? I assume that data in a mapfile is
> > > > sorted by key (as I inserted the rows), but what happens when I
> > > > update a specific row? Because in HBase everything is an insert, I
> > > > guess the updated row will (probably) be stored in a different map
> > > > file than the original row, is that correct? If yes, how can fast
> > > > and efficient retrieval of rows in the range between two keys be
> > > > promised, since in this case it could mean retrieving rows from
> > > > different map files?
> > >
> > > HBase is efficient at retrieving rows in a range because rows are
> > > sorted in lexicographical order.
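To see why lexicographic sorting makes range retrieval cheap, here is a small self-contained Java sketch. This is illustrative only, not HBase code: a sorted map (standing in for one sorted Mapfile) can answer a [startKey, endKey) query by seeking to the start key and reading forward, never touching rows outside the range.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RangeScanSketch {
    public static void main(String[] args) {
        // Rows kept sorted by key, as inside a single Mapfile.
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("row-001", "a");
        rows.put("row-005", "b");
        rows.put("row-010", "c");
        rows.put("row-999", "d");

        // [startKey, endKey) range query: start is inclusive, end exclusive,
        // matching the region boundary convention described above.
        SortedMap<String, String> range = rows.subMap("row-001", "row-010");
        for (String key : range.keySet()) {
            System.out.println(key);
        }
        // Prints row-001 and row-005; row-010 (exclusive end) and row-999
        // are never visited.
    }
}
```

The cost is one seek plus a forward read proportional to the number of rows returned, which is why sorted storage makes range queries fast regardless of table size.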
> > > Check out the HBase architecture wiki page section on HRegionServer
> > > (http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion).
> > >
> > > Writes in HBase are first stored in an in-memory structure called the
> > > Memcache. This is periodically flushed to an HDFS Mapfile. A single
> > > region in HBase is made up of one Memcache and 0 to N Mapfiles.
> > >
> > > So a scanner in HBase is really the merge of a number of scanners:
> > > one open on the Memcache (recent writes), and one open on each
> > > flushed-out Mapfile.
> > >
> > > Hope that helps.
> > >
> > > JG
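The "merge of a number of scanners" described in the thread can be sketched as a k-way merge over several individually sorted sources, using a priority queue that always pops the smallest current head. This is a minimal standalone illustration of the technique, not HBase internals; the source names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class ScannerMergeSketch {
    public static void main(String[] args) {
        // One in-memory store plus two flushed files; each list is
        // individually sorted, as each Memcache/Mapfile is.
        List<Iterator<String>> scanners = Arrays.asList(
            Arrays.asList("row-002", "row-007").iterator(),   // "Memcache"
            Arrays.asList("row-001", "row-005").iterator(),   // "Mapfile 1"
            Arrays.asList("row-003", "row-006").iterator());  // "Mapfile 2"

        // The heap holds the current head key of each scanner.
        PriorityQueue<Map.Entry<String, Iterator<String>>> heap =
            new PriorityQueue<>(Map.Entry.comparingByKey());
        for (Iterator<String> it : scanners) {
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }

        // Repeatedly pop the smallest head and refill from its scanner.
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<String, Iterator<String>> head = heap.poll();
            merged.add(head.getKey());
            Iterator<String> it = head.getValue();
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        System.out.println(merged);
        // All rows come out in global sorted order even though they live
        // in three separate, individually sorted sources.
    }
}
```

This also shows where the per-Mapfile cost JG mentions comes from: every source contributes a head to the merge, so more un-compacted files means more work per scan. A real merge would additionally prefer the newest value when the same row key appears in multiple sources (the updated-row case Slava asks about), which this sketch omits.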
