PrefixFilter performance question.

Edward Capriolo Tue, 08 Dec 2009 12:25:44 -0800

Hey all,

I have been doing some performance evaluation of MySQL vs. HBase.


I have a table webtable
{NAME => 'webdata', FAMILIES => [{NAME => 'anchor', COMPRESSION =>
'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'image',
COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
'raw_data', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
=> 'true'}]}

I have a normalized version in mysql. I currently have loaded

nyhadoopdev6:60030      1260289750689   requests=4, regions=3, usedHeap=99, maxHeap=997
nyhadoopdev7:60030      1260289862481   requests=0, regions=2, usedHeap=181, maxHeap=997
nyhadoopdev8:60030      1260289909059   requests=0, regions=2, usedHeap=395, maxHeap=997

Here is a snippet of the test code.

      if (mysql) {
        try {
          // Count pages whose URL starts with the prefix.
          PreparedStatement ps = conn.prepareStatement(
              "SELECT * FROM page WHERE page LIKE (?)");
          ps.setString(1, "http://www.s%");
          ResultSet rs = ps.executeQuery();
          while (rs.next()) {
            sPageCount++;
          }
          rs.close();
          ps.close();
        } catch (SQLException ex) {
          System.out.println(ex);
          System.exit(1);
        }
      }

      if (hbase) {
        Scan s = new Scan();
        //s.setCacheBlocks(true);
        // Count rows whose key starts with the prefix.
        s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
        ResultScanner scanner = table.getScanner(s);
        try {
          for (Result rr : scanner) {
            sPageCount++;
          }
        } finally {
          scanner.close();
        }
      }

I am seeing about 0.3 ms from MySQL and about 20 seconds from HBase.
I have read some tuning docs, but most seem geared toward insertion
speed, not search speed. I would think this would be a
bread-and-butter search for HBase, since the row keys are naturally
sorted lexicographically. I am not running a giant setup here (3
nodes, 2x replication), but I would think that is almost a
non-factor since this data set is fairly small. Any hints?
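
The sorted-key reasoning above can be sketched as a bounded row range: start at the prefix and stop at the first key that can no longer match it. This is only a minimal sketch under assumptions, not a confirmed explanation of the 20 second figure. PrefixRange and stopRowForPrefix are hypothetical names introduced here, not HBase API; the code uses plain Java and only computes the stop key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of HBase): derives the exclusive upper bound
// of the key range that contains every row key starting with a given prefix.
class PrefixRange {

    // Returns the smallest key that sorts after all keys beginning with
    // `prefix` under unsigned byte-wise lexicographic order. An empty array
    // means "no upper bound" (the prefix is empty or all 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        // Copy the remaining bytes and increment the last one.
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "http://www.s".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        // prints http://www.t
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

The resulting start and stop keys could be passed to the Scan(byte[] startRow, byte[] stopRow) constructor instead of the no-argument Scan in the snippet, so that the scan does not begin at the first row of the table. Whether that changes the timing on this cluster is untested.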
