Answers inline.

> -----Original Message-----
> From: Slava Gorelik [mailto:[email protected]]
> Sent: Thursday, February 05, 2009 9:21 AM
> To: [email protected]
> Subject: Few questions
>
> Hi to All.
>
> I have a few questions to ask:
>
> 1) Is it possible to bring specific columns from the same row within 1
> round trip (some method that takes list of column names and return
> rowresult) ?

http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/client/HTable.html#getRow(byte[],%20byte[][])

HTable.getRow(byte [] row, byte [][] columns)

Ex: byte [][] columns = {"family:column1".getBytes(), "family:column2".getBytes()};

> 2) Is key size has any implications on HBase performance?

There are some implications, but as far as I know nothing that significant. Most users have keys on the order of 10s or 100s of bytes, and I've never seen a large difference between them. Of course, the smaller the key, the smaller the payload to store and transfer.

> 3) Somewhere, i don't remember where, I read that HBase know very fast
> and efficient to retrieve rows in the range between 2 given keys, is it
> correct ?
> If yes, how it's implemented ? I suggest that data in mapfile is sorted
> by key (when i inserted the rows), but what happened when i updated
> the specific row, i guess because in HBase everything is insert , it
> means that updated row will be stored (probably) in different map file
> than original row, is it correct ? If yes, how can be promised efficient
> and fast retrieval of rows in the range between 2 keys, in this case it
> could be retrieval of rows from different map files.

HBase is efficient at retrieving rows in a range because rows are sorted in lexicographical order. Check out the HBase architecture wiki page section on HRegionServer (http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion).

Writes in HBase are first stored in an in-memory structure called the Memcache. This is periodically flushed to an HDFS MapFile. A single region in HBase is made up of one Memcache and 0 to N MapFiles.

So a scanner in HBase is really the merge of a number of scanners: one open to the Memcache (recent writes), and one open to each flushed-out MapFile. Each of those sources is sorted by row key, so the merge still returns rows in key order even when an updated row lives in a different place than the original.

Hope that helps.

JG
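
P.S. To make the merged-scanner idea concrete, here is a minimal sketch in plain Java. It is not HBase code: MergedScanSketch, Cursor, and scan are hypothetical names, sorted in-memory maps stand in for the Memcache and the flushed MapFiles, and String keys stand in for byte [] row keys. It also assumes the newest store should win when the same row appears in more than one place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Illustrative model only; none of these names are HBase classes.
class MergedScanSketch {

    // One open "scanner" over a single sorted store, positioned on its current entry.
    private static final class Cursor {
        final int age; // 0 = Memcache stand-in (newest), larger = older file
        final Iterator<Map.Entry<String, String>> it;
        Map.Entry<String, String> current;

        Cursor(int age, Iterator<Map.Entry<String, String>> it) {
            this.age = age;
            this.it = it;
            this.current = it.next(); // caller has already checked hasNext()
        }

        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }
    }

    // Returns "row=value" for rows in [startRow, stopRow), in key order.
    // stores.get(0) is the newest store; later entries are assumed older.
    static List<String> scan(List<NavigableMap<String, String>> stores,
                             String startRow, String stopRow) {
        // Order cursors by current row key, then by age so the newest copy comes first.
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> {
            int c = a.current.getKey().compareTo(b.current.getKey());
            return c != 0 ? c : Integer.compare(a.age, b.age);
        });
        for (int i = 0; i < stores.size(); i++) {
            Iterator<Map.Entry<String, String>> it =
                stores.get(i).subMap(startRow, true, stopRow, false).entrySet().iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(i, it));
            }
        }

        List<String> out = new ArrayList<>();
        String lastRow = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            String row = c.current.getKey();
            if (!row.equals(lastRow)) { // older copies of the same row are skipped
                out.add(row + "=" + c.current.getValue());
                lastRow = row;
            }
            if (c.advance()) {
                heap.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> memcache = new TreeMap<>();
        memcache.put("row2", "updated");

        NavigableMap<String, String> mapFile = new TreeMap<>();
        mapFile.put("row1", "a");
        mapFile.put("row2", "original");
        mapFile.put("row3", "c");

        // prints [row1=a, row2=updated]
        System.out.println(scan(Arrays.asList(memcache, mapFile), "row1", "row3"));
    }
}
```

The point of the sketch is that each source only has to be sorted on its own. The merge reads them side by side, so an update that sits in a different place than the original row does not force a full search of every file.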
