The numbers themselves are irrelevant for this discussion since I'm comparing two almost identical setups to find out why there's a difference. But since you're asking nicely:
14 slave nodes, 2x E5520, 24GB of RAM (only 1GB given to HBase), 4 SATA 7200rpm disks.

These are the command lines I'm using:

To load:
hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 50

To scan:
hbase org.apache.hadoop.hbase.PerformanceEvaluation scan 50

To read:
hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 50

(The table itself is set to a 4KB block size with block caching disabled, as discussed further down; a shell sketch of those settings follows the quoted thread.)

I'm using 35 mappers per machine. For the random read test I assume the clients have to go over the network 13/14 of the time, since with 14 slaves a given region is local to any one client only about 1 time in 14. For the scan, locality should be good, but in any case we don't have a top-of-rack bottleneck.

After the initial loading I major compact. For all the tests the regions remain on the same machines, even across 0.90 and 0.92.

BTW, PerformanceEvaluation (which we call PE) uses 1KB values. The size of a KV is 1.5KB on average according to the HFile tool.

Hope this helps,

J-D

On Thu, Dec 15, 2011 at 12:17 PM, Matt Corgan <[email protected]> wrote:
> 260k random reads per second is a lot... is that on one node? how many
> client threads? and is the client going over the network, is it on the
> datanode, or are you using a specialized test where they're in the same
> process?
>
>
> On Thu, Dec 15, 2011 at 11:35 AM, Lars <[email protected]> wrote:
>
>> Do you see the same slowdown with the default 64k block size?
>>
>> Lars <[email protected]> wrote:
>>
>> >I'll be busy today... I'll double check my scanning related changes as soon as I can.
>> >
>> >Jean-Daniel Cryans <[email protected]> wrote:
>> >
>> >>Yes and yes.
>> >>
>> >>J-D
>> >>On Dec 14, 2011 5:52 PM, "Matt Corgan" <[email protected]> wrote:
>> >>
>> >>> Regions are major compacted and have empty memstores, so no merging of
>> >>> stores when reading?
>> >>>
>> >>>
>> >>> 2011/12/14 Jean-Daniel Cryans <[email protected]>
>> >>>
>> >>> > Yes sorry 1.1M
>> >>> >
>> >>> > This is PE, the table is set to a block size of 4KB and block caching
>> >>> > is disabled. Nothing else special in there.
>> >>> >
>> >>> > J-D
>> >>> >
>> >>> > 2011/12/14 <[email protected]>:
>> >>> > > Thanks for the info, J-D.
>> >>> > >
>> >>> > > I guess the 1.1 below is in millions.
>> >>> > >
>> >>> > > Can you tell us more about your tables - bloom filters, etc?
>> >>> > >
>> >>> > >
>> >>> > > On Dec 14, 2011, at 5:26 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> >>> > >
>> >>> > >> Hey guys,
>> >>> > >>
>> >>> > >> I was doing some comparisons between 0.90.5 and 0.92.0, mainly
>> >>> > >> regarding reads. The numbers are kinda irrelevant but the differences
>> >>> > >> are. BTW this is on CDH3u3 with random reads.
>> >>> > >>
>> >>> > >> In 0.90.0, scanning 50M rows that are in the OS cache I go up to about
>> >>> > >> 1.7M rows scanned per second.
>> >>> > >>
>> >>> > >> In 0.92.0, scanning those same rows (meaning that I didn't run
>> >>> > >> compactions after migrating so it's picking the same data from the OS
>> >>> > >> cache), I scan about 1.1 rows per second.
>> >>> > >>
>> >>> > >> 0.92 is 50% slower when scanning.
>> >>> > >>
>> >>> > >> In 0.90.0 random reading 50M rows that are OS cached I can do about
>> >>> > >> 200k reads per second.
>> >>> > >>
>> >>> > >> In 0.92.0, again with those same rows, I can go up to 260k per second.
>> >>> > >>
>> >>> > >> 0.92 is 30% faster when random reading.
>> >>> > >>
>> >>> > >> I've been playing with that data set for a while and the numbers in
>> >>> > >> 0.92.0 when using HFileV1 or V2 are pretty much the same meaning that
>> >>> > >> something else changed or the code that's generic to both did.
>> >>> > >>
>> >>> > >> I'd like to be able to associate those differences to code changes in
>> >>> > >> order to understand what's going on. I would really appreciate if
>> >>> > >> others also took some time to test it out or to think about what could
>> >>> > >> cause this.
>> >>> > >>
>> >>> > >> Thx,
>> >>> > >>
>> >>> > >> J-D
>> >>>
>>
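For anyone trying to reproduce the setup: the table configuration described in the thread (4KB block size, block caching disabled, major compaction after loading) could be applied from the HBase shell roughly as sketched below. This is only a sketch under assumptions: the table and family names ('TestTable' and 'info') are the usual PerformanceEvaluation defaults and are not spelled out in the thread, so confirm them with 'describe' before altering anything, and note that on 0.90/0.92 the table has to be disabled for the schema change.

  # inspect the current schema; family name assumed to be PE's default 'info'
  describe 'TestTable'

  # set a 4KB block size and turn off the block cache for that family
  disable 'TestTable'
  alter 'TestTable', {NAME => 'info', BLOCKSIZE => '4096', BLOCKCACHE => 'false'}
  enable 'TestTable'

  # after the sequentialWrite load, compact each region down to a single store file
  major_compact 'TestTable'

The load, scan and random read runs themselves are the exact PerformanceEvaluation invocations listed at the top of this message.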
