Yes, very clear. Thank you.
On Thu, Feb 5, 2009 at 9:57 PM, Jonathan Gray <[email protected]> wrote:

> The more map files in a region, the slower your scanning will be, because
> you are actually scanning each one.
>
> Recent row updates will not hurt you too badly, because you always have a
> scanner open on the Memcache (and results in memory are obviously the
> fastest to retrieve). But you'll always pay a search cost for each Mapfile
> that makes up the region you're scanning.
>
> Each region is defined by [startKey,endKey). Each region is made up of an
> in-memory map (Memcache) and 0 to N HDFS files (Mapfiles). Each of these
> is individually lexicographically sorted. Scanning the table involves
> scanning every file in the region. Major compactions combine all files
> into one.
>
> Is that clear?
>
> JG
>
> > -----Original Message-----
> > From: Slava Gorelik [mailto:[email protected]]
> > Sent: Thursday, February 05, 2009 11:33 AM
> > To: [email protected]
> > Subject: Re: Few questions
> >
> > Thank you for the quick response. So, you wrote:
> >
> > HBase is efficient at retrieving rows in a range because rows are
> > sorted in lexicographical order.
> >
> > My question is: is it still efficient when rows are within the range
> > but in different map files (as in the case of a row update)?
> > And another question: is a map file lexicographically sorted? There is
> > no sorting of data between map files in the same region, is that
> > correct?
> >
> > Best Regards,
> > Slava.
> >
> > On Thu, Feb 5, 2009 at 8:20 PM, Jonathan Gray <[email protected]>
> > wrote:
> >
> > > Answers inline.
> > >
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:[email protected]]
> > > > Sent: Thursday, February 05, 2009 9:21 AM
> > > > To: [email protected]
> > > > Subject: Few questions
> > > >
> > > > Hi to all.
> > > > I have a few questions to ask:
> > > >
> > > > 1) Is it possible to bring specific columns from the same row
> > > > within one round trip (some method that takes a list of column
> > > > names and returns a RowResult)?
> > >
> > > http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/client/HTable.html#getRow(byte[],%20byte[][])
> > >
> > > HTable.getRow(byte [] row, byte [][] columns)
> > >
> > > Ex: byte [][] columns = {"family:column1".getBytes(),
> > >     "family:column2".getBytes()};
> > >
> > > > 2) Does key size have any implications for HBase performance?
> > >
> > > There are some implications, but as far as I know nothing that
> > > significant. Most users have keys on the order of 10s or 100s of
> > > bytes, and I've never seen a large difference between them. Of
> > > course, the smaller the key, the smaller the payload to store and
> > > transfer.
> > >
> > > > 3) Somewhere, I don't remember where, I read that HBase is very
> > > > fast and efficient at retrieving rows in the range between two
> > > > given keys, is that correct?
> > > > If yes, how is it implemented? I assume that data in a mapfile is
> > > > sorted by key (as I inserted the rows), but what happens when I
> > > > update a specific row? Because in HBase everything is an insert, I
> > > > guess the updated row will (probably) be stored in a different map
> > > > file than the original row, is that correct? If yes, how can fast
> > > > and efficient retrieval of rows in the range between two keys be
> > > > promised, since in this case it could mean retrieving rows from
> > > > different map files?
> > >
> > > HBase is efficient at retrieving rows in a range because rows are
> > > sorted in lexicographical order.
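To see why lexicographic sorting makes range retrieval cheap, here is a small self-contained Java sketch. This is illustrative only, not HBase code: a sorted map (standing in for one sorted Mapfile) can answer a [startKey, endKey) query by seeking to the start key and reading forward, never touching rows outside the range.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RangeScanSketch {
    public static void main(String[] args) {
        // Rows kept sorted by key, as inside a single Mapfile.
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("row-001", "a");
        rows.put("row-005", "b");
        rows.put("row-010", "c");
        rows.put("row-999", "d");

        // [startKey, endKey) range query: start is inclusive, end exclusive,
        // matching the region boundary convention described above.
        SortedMap<String, String> range = rows.subMap("row-001", "row-010");
        for (String key : range.keySet()) {
            System.out.println(key);
        }
        // Prints row-001 and row-005; row-010 (exclusive end) and row-999
        // are never visited.
    }
}
```

The cost is one seek plus a forward read proportional to the number of rows returned, which is why sorted storage makes range queries fast regardless of table size.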
> > > Check out the HBase architecture wiki page section on HRegionServer
> > > (http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion).
> > >
> > > Writes in HBase are first stored in an in-memory structure called the
> > > Memcache. This is periodically flushed to an HDFS Mapfile. A single
> > > region in HBase is made up of one Memcache and 0 to N Mapfiles.
> > >
> > > So a scanner in HBase is really the merge of a number of scanners:
> > > one open on the Memcache (recent writes), and one open on each
> > > flushed-out Mapfile.
> > >
> > > Hope that helps.
> > >
> > > JG
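The "merge of a number of scanners" described in the thread can be sketched as a k-way merge over several individually sorted sources, using a priority queue that always pops the smallest current head. This is a minimal standalone illustration of the technique, not HBase internals; the source names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class ScannerMergeSketch {
    public static void main(String[] args) {
        // One in-memory store plus two flushed files; each list is
        // individually sorted, as each Memcache/Mapfile is.
        List<Iterator<String>> scanners = Arrays.asList(
            Arrays.asList("row-002", "row-007").iterator(),   // "Memcache"
            Arrays.asList("row-001", "row-005").iterator(),   // "Mapfile 1"
            Arrays.asList("row-003", "row-006").iterator());  // "Mapfile 2"

        // The heap holds the current head key of each scanner.
        PriorityQueue<Map.Entry<String, Iterator<String>>> heap =
            new PriorityQueue<>(Map.Entry.comparingByKey());
        for (Iterator<String> it : scanners) {
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }

        // Repeatedly pop the smallest head and refill from its scanner.
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<String, Iterator<String>> head = heap.poll();
            merged.add(head.getKey());
            Iterator<String> it = head.getValue();
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        System.out.println(merged);
        // All rows come out in global sorted order even though they live
        // in three separate, individually sorted sources.
    }
}
```

This also shows where the per-Mapfile cost JG mentions comes from: every source contributes a head to the merge, so more un-compacted files means more work per scan. A real merge would additionally prefer the newest value when the same row key appears in multiple sources (the updated-row case Slava asks about), which this sketch omits.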
