Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation

The comment on the change is:
Numbers for mapfile

------------------------------------------------------------------------------
  
  Also includes numbers for hadoop mapfile.  The table includes the last test, 0.2.0java6 (on hadoop 0.17.2), from above for easy comparison.  Numbers are 1000-byte values (rows) read or written per second, as in the !BigTable paper.
  
- Start cluster fresh for each test then wait for all regions to be deployed 
before starting up tests.  Speedup is combo of hdfs improvements, hbase 
improvements including batching when writing and scanning (the bigtable PE 
description alludes to scans using prefetch), and use of two JBOD'd disks -- as 
in google paper -- where previous in tests above, all disks were RAID'd. 
Otherwise, hardware is same, similar to bigtable papers's dual dual-core 
opterons, 1G for hbase, etc.
+ Start the cluster fresh for each test, then wait for all regions to be deployed before starting the tests (this means no content in memcache, which means that for tests such as random read we are always going to the filesystem, never getting values from memcache).
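+ For reference, a single random read through the 0.19-era client looks roughly like the sketch below; with memcache empty after a fresh start, every such get is served from the store files on hdfs.  This is a minimal sketch -- the table name "TestTable" and column "info:data" are assumptions, not necessarily the PE's actual names:

{{{
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadSketch {
  public static void main(String[] args) throws Exception {
    // 0.19-era client API; table and column names are assumptions.
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    // With memcache empty after a fresh cluster start, this get goes
    // all the way to the store files on hdfs, never to memory.
    Cell cell = table.get(Bytes.toBytes("row0004999"),
        Bytes.toBytes("info:data"));
    System.out.println(cell == null ? "miss" : Bytes.toString(cell.getValue()));
  }
}
}}}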
  
  ||<rowbgcolor="#ececec">Experiment Run||0.2.0java6||mapfile0.17.1||0.19.0RC1!Java6||mapfile0.19.0||!BigTable||
- ||random reads ||428||568||540||-||1212||
+ ||random reads ||428||568||540||768||1212||
  ||random reads (mem)||-||-||-||-||10811||
  ||random writes||2167||2218||9986||-||8850||
  ||sequential reads||427||582||464||-||4425||
- ||sequential writes||2076||5684||9892||-||8547||
+ ||sequential writes||2076||5684||9892||7519||8547||
  ||scans||3737||55692||20971||-||15385||
  
  Some improvement writing and scanning (seemingly faster than the BigTable paper).  Random reads still lag.  Sequential reads lag badly; a bit of fetch-ahead, as we did for scanning, should help here.
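  One way to get that fetch-ahead would be a client-side read-ahead buffer that pulls the next rows in a background thread while the caller consumes the current one.  This is not hbase code, just a minimal sketch of the buffering pattern:

{{{
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: wraps any row source with a background thread so the
// next rows are already in flight while the caller works on the current one.
public class Prefetcher<T> {
  private static final Object POISON = new Object();  // end-of-stream marker
  private final BlockingQueue<Object> buffer;

  public interface Source<T> { T next() throws Exception; }  // null = done

  public Prefetcher(final Source<T> source, int depth) {
    this.buffer = new ArrayBlockingQueue<Object>(depth);
    Thread fetcher = new Thread(new Runnable() {
      public void run() {
        try {
          T row;
          while ((row = source.next()) != null) buffer.put(row);
          buffer.put(POISON);
        } catch (Exception e) {
          // real code would propagate the failure to the consumer
        }
      }
    });
    fetcher.setDaemon(true);
    fetcher.start();
  }

  // Blocks until the next prefetched row is ready; null at end-of-stream.
  @SuppressWarnings("unchecked")
  public T next() throws InterruptedException {
    Object o = buffer.take();
    return o == POISON ? null : (T) o;
  }
}
}}}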
  
+ The speedup is a combo of hdfs improvements, hbase improvements including batching when writing and scanning (the bigtable PE description alludes to scans using prefetch), and use of two JBOD'd disks -- as in the google paper -- whereas in the previous tests above, all disks were RAID'd.  Otherwise, the hardware is the same, similar to the bigtable paper's dual dual-core opterons, 1G for hbase, etc.
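+ The write batching referred to is the 0.19 client-side write buffer.  A minimal sketch of how a client uses it, assuming the 0.19 BatchUpdate API and HTable's setAutoFlush/flushCommits (table and column names are made up):

{{{
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class BatchedWriteSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    // Buffer commits client-side instead of doing one RPC per row.
    table.setAutoFlush(false);
    byte[] value = new byte[1000];  // 1000-byte values, as in the PE
    for (int i = 0; i < 10000; i++) {
      BatchUpdate bu = new BatchUpdate(String.format("row%07d", i));
      bu.put("info:data", value);
      table.commit(bu);  // queued in the write buffer, not sent yet
    }
    table.flushCommits();  // flush whatever is still buffered
  }
}
}}}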
+ 
+ Of note, the mapfile numbers are less than those of hbase when writing because the mapfile tests write a single file whereas hbase, after the first split, is writing to multiple files concurrently.  On the other hand, hbase random read is very like mapfile random read, at least in the single-client case; we're effectively asking the filesystem for a random value from the midst of a file in both cases.  The mapfile numbers are useful as a gauge of how much hdfs has come on since the last time we ran the PE.
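+ For comparison, the mapfile tests exercise plain hadoop mapfile directly, roughly as below (stock org.apache.hadoop.io.MapFile API; the output directory is made up).  Note the single writer appending to one file, versus the random read seeking into the midst of it:

{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    String dir = "pe-mapfile";  // made-up output directory

    // Sequential write: one writer, one file -- unlike hbase which, after
    // the first split, is writing several files concurrently.  MapFile
    // requires keys to arrive in sorted order.
    MapFile.Writer writer = new MapFile.Writer(conf, fs, dir,
        Text.class, BytesWritable.class);
    byte[] value = new byte[1000];  // 1000-byte values, as in the PE
    for (int i = 0; i < 10000; i++) {
      writer.append(new Text(String.format("row%07d", i)),
          new BytesWritable(value));
    }
    writer.close();

    // Random read: ask the filesystem for a value from the midst of the
    // file -- effectively what an hbase random read does, memcache empty.
    MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
    BytesWritable val = new BytesWritable();
    reader.get(new Text("row0004999"), val);
    reader.close();
  }
}
}}}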
+ 
  Will post a new set of numbers, 8 concurrent clients, in a while so we can start tracking how we are doing with contending clients.
  
