Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation

The comment on the change is:
Notes on mapfile numbers.

------------------------------------------------------------------------------
  I've also added numbers for sequential writes, and for random and next ('scan') reads, into and out of a single *open* HDFS mapfile for comparison: i.e., for random reads we are not opening the file each time, and the mapfile index is already loaded into memory. Going by current numbers, pure mapfile writes are slower than the numbers Google posted in the initial BigTable paper, and reads are just a bit faster (except when scanning). GFS must be fast.

  ||<rowbgcolor="#ececec">Experiment Run||HBase20070708||HBase20070916||0.15.0||20071219||mapfile||!BigTable||
- ||random reads ||68||272||264||167||1718||1212||
+ ||random reads ||68||272||264||167||685||1212||
  ||random reads (mem)||Not implemented||Not implemented||Not implemented||Not implemented||-||10811||
  ||random writes||847||1460||1277||1400||-||8850||
  ||sequential reads||301||267||305||138||-||4425||
- ||sequential writes||850||1278||1112||1691||5761||8547||
+ ||sequential writes||850||1278||1112||1691||5494||8547||
- ||scans||3063||3692||3758||3731||28886||15385||
+ ||scans||3063||3692||3758||3731||25641||15385||
+ Subsequently I profiled the mapfile PerformanceEvaluation. It turns out that generating the keys and values to insert was taking a bunch of CPU time. After making a fix, key and value generation was 15-25% cheaper (the alternative was precompiling keys and values, which would have taken loads of memory). Rerunning the tests, it looks like there can be a pretty broad range of fluctuation in mapfile numbers between runs. I also noticed that the 0.15.x random reads seem to be about 50% faster than TRUNK. Investigate.
+
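For context, here is a minimal sketch of the kind of per-row key/value generation the profiling above refers to. The class and method names are illustrative only, not the actual PerformanceEvaluation code; the idea shown is simply that a fixed-width formatted row key plus a reused value buffer keeps per-row allocation cost low instead of building fresh objects on every insert.

```java
import java.util.Random;

// Hypothetical sketch of benchmark row generation; names are illustrative,
// not taken from the real HBase/mapfile PerformanceEvaluation source.
public class RowGen {
    // 1000-byte values, as described in the BigTable paper's benchmarks.
    private static final int VALUE_LENGTH = 1000;

    private final Random rand = new Random();
    private final byte[] value = new byte[VALUE_LENGTH]; // reused buffer

    // Zero-padded, fixed-width row key, e.g. 42 -> "0000000042".
    static String formatRowKey(long row) {
        return String.format("%010d", row);
    }

    // Refill the reused buffer with random bytes rather than allocating
    // a new array for every row.
    byte[] nextValue() {
        rand.nextBytes(value);
        return value;
    }

    public static void main(String[] args) {
        RowGen g = new RowGen();
        System.out.println(formatRowKey(42));     // prints 0000000042
        System.out.println(g.nextValue().length); // prints 1000
    }
}
```

Reusing the byte buffer trades a little cleanliness for allocation savings, which matters when the generator is invoked once per row over millions of rows.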