Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation

The comment on the change is:
Notes on mapfile numbers.

------------------------------------------------------------------------------
  I've also added numbers for sequential writes, and for random and next ('scan') reads, into and out of a single *open* HDFS mapfile for comparison: i.e., for random reads we are not opening the file each time, and the mapfile index is already loaded into memory. Going by current numbers, pure mapfile writes are slower than the numbers Google posted in the initial BigTable paper, and reads are just a bit faster (except when scanning). GFS must be fast.

  ||<rowbgcolor="#ececec">Experiment Run||HBase20070708||HBase20070916||0.15.0||20071219||mapfile||!BigTable||
- ||random reads ||68||272||264||167||1718||1212||
+ ||random reads ||68||272||264||167||685||1212||
  ||random reads (mem)||Not implemented||Not implemented||Not implemented||Not implemented||-||10811||
  ||random writes||847||1460||1277||1400||-||8850||
  ||sequential reads||301||267||305||138||-||4425||
- ||sequential writes||850||1278||1112||1691||5761||8547||
+ ||sequential writes||850||1278||1112||1691||5494||8547||
- ||scans||3063||3692||3758||3731||28886||15385||
+ ||scans||3063||3692||3758||3731||25641||15385||
+ Subsequently I profiled the mapfile PerformanceEvaluation. It turns out that generating the keys and values to insert was taking a bunch of CPU time. After making a fix, key and value generation was 15-25% cheaper (the alternative was precompiling keys and values, which would have taken loads of memory). Rerunning the tests, it looks like there can be a pretty broad range of fluctuation in mapfile numbers between runs. I also noticed that the 0.15.x random reads seem to be about 50% faster than TRUNK. Investigate.
+
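For context, here is a minimal sketch of the kind of per-row key/value generation the profiling above refers to. The class and method names are illustrative only, not the actual PerformanceEvaluation code; the idea shown is simply that a fixed-width formatted row key plus a reused value buffer keeps per-row allocation cost low instead of building fresh objects on every insert.

```java
import java.util.Random;

// Hypothetical sketch of benchmark row generation; names are illustrative,
// not taken from the real HBase/mapfile PerformanceEvaluation source.
public class RowGen {
    // 1000-byte values, as described in the BigTable paper's benchmarks.
    private static final int VALUE_LENGTH = 1000;

    private final Random rand = new Random();
    private final byte[] value = new byte[VALUE_LENGTH]; // reused buffer

    // Zero-padded, fixed-width row key, e.g. 42 -> "0000000042".
    static String formatRowKey(long row) {
        return String.format("%010d", row);
    }

    // Refill the reused buffer with random bytes rather than allocating
    // a new array for every row.
    byte[] nextValue() {
        rand.nextBytes(value);
        return value;
    }

    public static void main(String[] args) {
        RowGen g = new RowGen();
        System.out.println(formatRowKey(42));     // prints 0000000042
        System.out.println(g.nextValue().length); // prints 1000
    }
}
```

Reusing the byte buffer trades a little cleanliness for allocation savings, which matters when the generator is invoked once per row over millions of rows.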