I added an entry to the troubleshooting page up on the wiki:
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A16
- Andy
________________________________
From: Ryan Rawson <[email protected]>
To: [email protected]
Sent: Tue, December 8, 2009 5:21:25 PM
Subject: Re: PrefixFilter performance question.
You want to raise the scanner caching:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#scannerCaching
The default is low because if a job takes too long to process each batch
of rows, the scanner can time out, which causes unhappy jobs/people/emails.
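Here is a back-of-envelope sketch of that tradeoff, in plain Java rather than the HBase client API. It assumes each scanner round trip returns up to `caching` rows, and that the region server's scanner lease (assumed here to be 60 s; in 0.20 it is governed by `hbase.regionserver.lease.period`, which you should check in your hbase-default.xml) expires if the client spends longer than that processing one batch. The class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical model of the scanner-caching tradeoff (not HBase API).
 * Assumptions: each next() RPC returns up to `caching` rows, and the scanner
 * lease expires if one batch takes longer than `leaseMillis` to process.
 */
public class ScannerCachingModel {

    /** Number of round trips needed to read `rows` rows, `caching` rows at a time. */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows >= 0 and caching >= 1 required");
        }
        // Ceiling division: a partial final batch still costs one RPC.
        return (rows + caching - 1) / caching;
    }

    /**
     * Largest caching value whose per-batch processing time stays within
     * `safetyFactor` of the lease (0.5 = use at most half the lease). Never below 1.
     */
    static int maxSafeCaching(double perRowMillis, long leaseMillis, double safetyFactor) {
        if (perRowMillis <= 0 || leaseMillis <= 0 || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        double budgetMillis = leaseMillis * safetyFactor;
        long rows = (long) Math.floor(budgetMillis / perRowMillis);
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical scan size
        for (int caching : new int[] {1, 100, 1000}) {
            // prints 1000000, 10000 and 1000 round trips respectively
            System.out.printf(Locale.ROOT, "caching=%d -> %d round trips%n",
                    caching, roundTrips(rows, caching));
        }
        // Hypothetical: 60 s lease, 5 ms of client-side work per row, half the lease as budget.
        System.out.println("max safe caching: " + maxSafeCaching(5.0, 60_000L, 0.5)); // 6000
    }
}
```

With caching at 1, a million-row scan costs a million round trips; the second method gives a rough upper bound that keeps one batch's processing time well inside the lease. In the 0.20 client the value itself is set with `HTable.setScannerCaching(int)`, the setting linked above.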
BTW, I can read small rows out of a 19-node cluster at 7 million
rows/sec using a map-reduce program. Each individual process is doing
40k+ rows/sec or so.
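Taking those figures at face value and assuming per-process rates simply add up, a rough consistency check:

```latex
\frac{7\times 10^{6}\ \text{rows/s}}{4\times 10^{4}\ \text{rows/s per process}} = 175\ \text{processes},
\qquad \frac{175}{19} \approx 9.2\ \text{processes per node},
\qquad \frac{7\times 10^{6}\ \text{rows/s}}{19} \approx 3.7\times 10^{5}\ \text{rows/s per node}.
```

Since each process does "40k+" rows/sec, 175 is an upper bound on the number of concurrent readers.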
-ryan
On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo <[email protected]> wrote:
> Hey all,
>
> I have been doing some performance evaluation of MySQL vs. HBase.
>
> I have a table, webtable:
> {NAME => 'webdata', FAMILIES => [
>   {NAME => 'anchor', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
>    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'image', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
>    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'raw_data', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
>    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> I have a normalized version in MySQL. I currently have loaded:
>
> nyhadoopdev6:60030 1260289750689 requests=4, regions=3, usedHeap=99, maxHeap=997
> nyhadoopdev7:60030 1260289862481 requests=0, regions=2, usedHeap=181, maxHeap=997
> nyhadoopdev8:60030 1260289909059 requests=0, regions=2, usedHeap=395, maxHeap=997
>
> Here is a snippet:
>
> if (mysql) {
>     try {
>         // Prefix match: LIKE with a trailing wildcard
>         PreparedStatement ps = conn.prepareStatement(
>             "SELECT * FROM page WHERE page LIKE (?)");
>         ps.setString(1, "http://www.s%");
>         ResultSet rs = ps.executeQuery();
>         while (rs.next()) {
>             sPageCount++;
>         }
>         rs.close();
>         ps.close();
>     } catch (SQLException ex) {
>         System.out.println(ex);
>         System.exit(1);
>     }
> }
>
> if (hbase) {
>     Scan s = new Scan();
>     //s.setCacheBlocks(true);
>     s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
>     ResultScanner scanner = table.getScanner(s);
>     try {
>         for (Result rr : scanner) {
>             sPageCount++;
>         }
>     } finally {
>         scanner.close();
>     }
> }
>
> I am seeing about 0.3 ms from MySQL and about 20 seconds from HBase.
> I have read some tuning docs, but most seem geared toward insertion
> speed, not search speed. I would think this would be a
> bread-and-butter search for HBase, since the row keys are naturally
> sorted lexicographically. I am not running a giant setup here (3
> nodes, 2x replication), but I would think that is almost a non-factor
> since this data is fairly small. Hints?
>
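The point about sorted row keys can be made concrete with a toy model in which a plain-JDK `TreeMap` stands in for the table. This is not the HBase API; the class, methods, and sample keys are hypothetical. It compares a scan that starts at the first row and checks every key against one that seeks straight to the prefix and stops at the first non-matching key, counting the rows each one examines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical toy model of prefix scans over sorted row keys (not HBase API). */
public class PrefixScanModel {

    /** Matching keys plus the number of rows the scan had to look at. */
    record ScanResult(List<String> matches, int rowsExamined) {}

    /** Start at the first row and test every key against the prefix. */
    static ScanResult filterOnlyScan(NavigableMap<String, String> table, String prefix) {
        List<String> matches = new ArrayList<>();
        int examined = 0;
        for (String key : table.keySet()) {
            examined++;
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return new ScanResult(matches, examined);
    }

    /** Seek to the first key >= prefix, then stop at the first key that no longer matches. */
    static ScanResult seekThenStopScan(NavigableMap<String, String> table, String prefix) {
        List<String> matches = new ArrayList<>();
        int examined = 0;
        for (String key : table.tailMap(prefix, true).keySet()) {
            examined++;
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so nothing after this can match
            }
            matches.add(key);
        }
        return new ScanResult(matches, examined);
    }

    /** Small sample table; String order equals byte order here because the keys are ASCII. */
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {
                "http://www.a.com/", "http://www.b.com/", "http://www.s1.com/",
                "http://www.s2.com/", "http://www.t.com/", "http://www.z.com/"}) {
            table.put(key, "page body"); // the value is irrelevant to the scan
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        String prefix = "http://www.s";
        ScanResult full = filterOnlyScan(table, prefix);
        ScanResult seek = seekThenStopScan(table, prefix);
        // prints: filter-only: matches=2, examined=6
        System.out.println("filter-only: matches=" + full.matches().size()
                + ", examined=" + full.rowsExamined());
        // prints: seek-then-stop: matches=2, examined=3
        System.out.println("seek-then-stop: matches=" + seek.matches().size()
                + ", examined=" + seek.rowsExamined());
    }
}
```

The HBase snippet above builds its `Scan` with no start row, so, assuming the PrefixFilter only tests keys rather than seeking, the region servers read every row that sorts before `http://www.s`. In the 0.20 client the seek would correspond to giving the scan a start row, for example `new Scan(Bytes.toBytes("http://www.s"))`, with the PrefixFilter (or a stop row) ending the match; treat that as a suggestion to verify against your version.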