You want: http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#scannerCaching
The default is low because if a job takes too long processing, a scanner can time out, which causes unhappy jobs/people/emails. BTW I can read small rows out of a 19 node cluster at 7 million rows/sec using a map-reduce program. Any individual process is doing 40k+ rows/sec or so -ryan On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo <[email protected]> wrote: > Hey all, > > I have been doing some performance evaluation with mysql vs hbase. > > I have a table webtable > {NAME => 'webdata', FAMILIES => [{NAME => 'anchor', COMPRESSION => > 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', > IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'image', > COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE > => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => > 'raw_data', COMPRESSION => 'NONE', VERSIONS => '3', TTL => > '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE > => 'true'}]} > > I have a normalized version in mysql. I currently have loaded > > nyhadoopdev6:60030 1260289750689 requests=4, regions=3, usedHeap=99, > maxHeap=997 > nyhadoopdev7:60030 1260289862481 requests=0, regions=2, usedHeap=181, > maxHeap=997 > nyhadoopdev8:60030 1260289909059 requests=0, regions=2, usedHeap=395, > maxHeap=997 > > This is a snippet here. > > if (mysql) { > try { > PreparedStatement ps = conn.prepareStatement("SELECT * FROM > page WHERE page LIKE (?)"); > ps.setString(1,"http://www.s%"); > ResultSet rs = ps.executeQuery(); > while (rs.next() ){ > sPageCount++; > } > rs.close(); > ps.close(); > } catch (SQLException ex) {System.out.println(ex); System.exit(1); } > } > > if (hbase) { > Scan s = new Scan(); > //s.setCacheBlocks(true); > s.setFilter( new PrefixFilter(Bytes.toBytes("http://www.s") ) ); > ResultScanner scanner = table.getScanner(s); > try { > for (Result rr:scanner){ > sPageCount++; > } > } finally { > scanner.close(); > } > > } > > I am seeing about .3 MS from mysql and 20. second performance from > Hbase. I have read some tuning docs but most seem geared for insertion > speed, not search speed. I would think this would be a > Bread-and-butter search for hbase since the row keys are naturally > sorted lexicographically. I am not running a giant setup here, 3 > nodes, 2x replication, but I would think that it is almost a non > factor here since these data is fairly small. Hints ? >
