Re: PrefixFilter performance question.

Andrew Purtell Tue, 08 Dec 2009 15:01:18 -0800

I added an entry to the troubleshooting page up on the wiki:

    http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A16


  - Andy





________________________________
From: Ryan Rawson <[email protected]>
To: [email protected]
Sent: Tue, December 8, 2009 5:21:25 PM
Subject: Re: PrefixFilter performance question.

You want:

http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#scannerCaching

The default is low because if a job takes too long processing, a
scanner can time out, which causes unhappy jobs/people/emails.
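
A rough sketch of why this setting matters for a counting loop like the one quoted below: with a small caching value the client makes roughly one round trip per row, so raising it cuts the number of fetches proportionally. The class and method names here are hypothetical and nothing in it calls HBase; it only models the arithmetic and ignores the final call that detects the end of the scan. In the 0.20 client the value is, as far as I know, set through HTable.setScannerCaching(int), which is the setting linked above.

```java
public class ScannerCachingSketch {

    /**
     * Number of fetches needed to stream `rows` rows when each
     * fetch returns up to `caching` rows (ceiling division).
     */
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 100_000; // hypothetical row count, not taken from the thread
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

The trade-off Ryan describes still applies: a larger batch means more client-side processing time between fetches, which is what can let a scanner lease expire.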

BTW I can read small rows out of a 19 node cluster at 7 million
rows/sec using a map-reduce program.  Any individual process is doing
40k+ rows/sec or so.

-ryan

On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo <[email protected]> wrote:
> Hey all,
>
> I have been doing some performance evaluation with mysql vs hbase.
>
> I have a table webtable
> {NAME => 'webdata', FAMILIES => [{NAME => 'anchor', COMPRESSION =>
> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'image',
> COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE
> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
> 'raw_data', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
> => 'true'}]}
>
> I have a normalized version in mysql. I currently have loaded
>
> nyhadoopdev6:60030      1260289750689   requests=4, regions=3, usedHeap=99, 
> maxHeap=997
> nyhadoopdev7:60030      1260289862481   requests=0, regions=2, usedHeap=181,
> maxHeap=997
> nyhadoopdev8:60030      1260289909059   requests=0, regions=2, usedHeap=395,
> maxHeap=997
>
> This is a snippet here.
>
> if (mysql) {
>       try {
>        PreparedStatement ps = conn.prepareStatement(
>            "SELECT * FROM page WHERE page LIKE (?)");
>        ps.setString(1, "http://www.s%");
>        ResultSet rs = ps.executeQuery();
>        while (rs.next() ){
>          sPageCount++;
>        }
>        rs.close();
>        ps.close();
>       } catch (SQLException ex) {System.out.println(ex); System.exit(1); }
>      }
>
>      if (hbase) {
>        Scan s = new Scan();
>        //s.setCacheBlocks(true);
>        s.setFilter( new PrefixFilter(Bytes.toBytes("http://www.s") ) );
>        ResultScanner scanner = table.getScanner(s);
>        try {
>          for (Result rr:scanner){
>            sPageCount++;
>          }
>       } finally {
>         scanner.close();
>       }
>
>      }
>
> I am seeing about .3 ms from mysql and 20. second performance from
> HBase. I have read some tuning docs but most seem geared for insertion
> speed, not search speed. I would think this would be a
> bread-and-butter search for HBase since the row keys are naturally
> sorted lexicographically. I am not running a giant setup here, 3
> nodes, 2x replication, but I would think that it is almost a non-factor
> here since this data is fairly small. Hints?
>
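
The point about lexicographically sorted row keys can be sketched without HBase at all. The example below uses a java.util.TreeMap as a stand-in for a sorted table (an assumption for illustration only; the class name and sample keys are made up). It seeks straight to the prefix and stops at the first key that no longer matches, so it touches only the matching range. A filter on its own does not necessarily get this behaviour: to my knowledge a Scan with only a PrefixFilter still starts at the beginning of the table unless a start row is also given.

```java
import java.util.TreeMap;

public class PrefixScanSketch {

    /**
     * Counts keys that start with `prefix` by seeking to the prefix in the
     * sorted map and stopping at the first key that does not match.
     */
    static int countWithPrefix(TreeMap<String, String> table, String prefix) {
        int count = 0;
        for (String key : table.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // sorted order: no later key can match
            }
            count++;
        }
        return count;
    }

    /** Small made-up table; keys are ASCII so String order matches byte order. */
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("http://example.com/", "a");
        table.put("http://www.apple.com/", "b");
        table.put("http://www.sample.org/", "c");
        table.put("http://www.sun.com/", "d");
        table.put("http://www.tux.org/", "e");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(countWithPrefix(sampleTable(), "http://www.s"));
    }
}
```

In the HBase snippet above, the analogous change would be to give the Scan a start row equal to the prefix while keeping the PrefixFilter to end the scan once the prefix has been passed.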
