RE: Few questions

Jonathan Gray Thu, 05 Feb 2009 10:19:12 -0800

Answers inline.

> -----Original Message-----
> From: Slava Gorelik [mailto:[email protected]]
> Sent: Thursday, February 05, 2009 9:21 AM
> To: [email protected]
> Subject: Few questions
> 
> Hi to All.
> 
> I have a few questions to ask:
> 
> 1) Is it possible to bring specific columns from the same row within 1
> round trip (some method that takes list of column names and return
> rowresult) ?


http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/client/HTable.html#getRow(byte[],%20byte[][])

HTable.getRow(byte [] row, byte [][] columns)

Ex: byte [][] columns = {"family:column1".getBytes(),
"family:column2".getBytes()};


> 2) Is key size has any implications on HBase performance?

There are some implications, but as far as I know nothing significant.
Most users have keys on the order of 10s or 100s of bytes, and I've never
seen a large difference between them.  Of course, the smaller the key, the
smaller the payload to store and transfer.


> 3) Somewhere, i don't remember where, I read that HBase know very fast
> and efficient to retrieve rows in the range between 2 given keys, is it
> correct ?
>    If yes, how it's implemented ? I suggest that data in mapfile is
> sorted by key (when i inserted the rows), but what happened when i
> updated the specific row, i guess because in HBase everything is
> insert , it means that updated row will be stored (probably) in
> different map file than original row, is it correct ? If yes, how can
> be promised efficient and fast retrieval of rows in the range between
> 2 keys, in this case it could be retrieval of rows from different map
> files.
 

Yes.  HBase is efficient at retrieving rows in a range because rows are
sorted in lexicographical order.
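
As a rough illustration of why sorted keys make range retrieval cheap, here
is a minimal sketch that uses a plain java.util.TreeMap as a stand-in for a
sorted row store.  It does not use the HBase API; the class name, the method
names, the sample row keys, and the half-open [startRow, stopRow) range are
all assumptions made for the example.  Java String ordering is used here,
which matches byte-wise ordering for ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted row store; not part of HBase.
class RangeScanSketch {

    // Returns the row keys in [startRow, stopRow), in sorted order.
    static List<String> scan(TreeMap<String, String> rows,
                             String startRow, String stopRow) {
        // subMap is a view over one contiguous slice of the sorted map,
        // so the scan never looks at rows outside the requested range.
        return new ArrayList<>(rows.subMap(startRow, stopRow).keySet());
    }

    // Sample data (made up for illustration).
    static TreeMap<String, String> sampleRows() {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("row1", "a");
        rows.put("row2", "b");
        rows.put("row3", "c");
        rows.put("row10", "d");
        return rows;
    }

    public static void main(String[] args) {
        // "row10" sorts between "row1" and "row2" in lexicographical order.
        System.out.println(scan(sampleRows(), "row1", "row3"));
    }
}
```

Note that "row10" falls inside the range from "row1" to "row3".  If numeric
order matters for range scans, zero-padding the key (for example "row01",
"row02", "row10") is one common way to make the two orders agree.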

Check out the HBase architecture wiki page section on HRegionServer
(http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion).

Writes in HBase are first stored in an in-memory structure called the
Memcache.  The Memcache is periodically flushed to a Mapfile in HDFS.  A
single region in HBase is made up of one Memcache and 0 to N Mapfiles.

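So a scanner in HBase is really the merge of a number of scanners: one open
on the Memcache (recent writes) and one open on each flushed-out Mapfile.
Because each of those sources is sorted by key, the merged result still
comes back in row order, even when an update landed in a different Mapfile
than the original write.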
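
The merge described above can be sketched as a k-way merge over sorted
sources.  This is only an illustration under simplifying assumptions, not
HBase's actual scanner code: each source is modeled as a java.util.TreeMap,
the sources are passed newest first (Memcache at index 0), and when the same
row appears in more than one source the newest source simply wins.  The
class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical model of a region scan; not part of HBase.
class MergedScanSketch {

    // One open scanner over a single sorted source.
    static final class Cursor {
        final Iterator<Map.Entry<String, String>> it;
        final int age;                       // 0 = newest source
        Map.Entry<String, String> current;   // null once exhausted

        Cursor(TreeMap<String, String> source, String start, String stop, int age) {
            this.it = source.subMap(start, stop).entrySet().iterator();
            this.age = age;
            advance();
        }

        void advance() {
            current = it.hasNext() ? it.next() : null;
        }
    }

    // Merges all sources over [start, stop); result iterates in row order.
    static Map<String, String> scan(List<TreeMap<String, String>> sourcesNewestFirst,
                                    String start, String stop) {
        // Order cursors by current row key, then by age so that the
        // newest source holding a given row is polled first.
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> {
            int byKey = a.current.getKey().compareTo(b.current.getKey());
            return byKey != 0 ? byKey : Integer.compare(a.age, b.age);
        });
        for (int age = 0; age < sourcesNewestFirst.size(); age++) {
            Cursor c = new Cursor(sourcesNewestFirst.get(age), start, stop, age);
            if (c.current != null) {
                heap.add(c);
            }
        }

        Map<String, String> result = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            // Older copies of an already-emitted row are skipped.
            result.putIfAbsent(c.current.getKey(), c.current.getValue());
            c.advance();
            if (c.current != null) {
                heap.add(c);
            }
        }
        return result;
    }
}
```

For example, with a Memcache holding an updated "row2" and two older files
holding "row1", "row3" and the original "row2", "row4", a scan over the
whole range yields row1 through row4 in order, with the updated value for
"row2".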


Hope that helps.

JG
