On Tue, May 21, 2013 at 6:08 PM, lars hofhansl <[email protected]> wrote:
> I just did a similar test using PE on a test cluster (16 DNs/RSs, 158 
> mappers).
> I set it up such that the data does not fit into the aggregate block cache, 
> but does fit into the aggregate OS buffer cache; in my case that turned out 
> to be 100 million 1k rows.
> Now I ran the SequentialRead and RandomRead tests.
>
> In both cases I see no disk activity (since the data fits into the OS cache). 
> The SequentialRead run finishes in about 7mins, whereas the RandomRead run 
> takes over 34mins.
> This is with CDH4.2.1 and HBase 0.94.7 compiled against it and with SCR 
> enabled.
>
> The only difference is that in the SequentialRead case it is likely that the 
> next Get can still use the previously cached block, whereas in the RandomRead 
> case almost every Get needs to fetch a block from the OS cache (as verified by 
> the cache miss rate, which is roughly the same as the request count per 
> RegionServer). Except for enabling SCR, all other settings are close to the 
> defaults.
>
> I see 2000-4000 req/s/regionserver and the same number of cache misses per 
> second and RegionServer in the RandomRead, meaning each RegionServer brought 
> in about 125-200 MB/s from the OS cache, which seems a tad low.
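
The throughput estimate quoted above can be reproduced with a rough sketch. It assumes each block-cache miss pulls one full 64 KB block from the OS cache (64 KB is HBase's default HFile block size; the message does not state the block size actually used). The function name is hypothetical and only for illustration.

```python
# Assumption: 64 KB blocks (HBase's default HFile block size); the message
# does not say which block size the test table used.
BLOCK_SIZE_BYTES = 64 * 1024
MIB = 1024 * 1024


def os_cache_throughput_mib_per_s(block_misses_per_s, block_size_bytes=BLOCK_SIZE_BYTES):
    """MiB/s read from the OS cache if every block-cache miss loads one full block."""
    return block_misses_per_s * block_size_bytes / MIB


# The quoted range of block-cache misses per second per RegionServer.
for misses in (2000, 4000):
    print(f"{misses} misses/s -> {os_cache_throughput_mib_per_s(misses):.0f} MiB/s")
```

Under that assumption, 2000 misses/s corresponds to 125 MiB/s, matching the lower figure quoted above, while 4000 misses/s would correspond to about 250 MiB/s.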

That's a lot of variance. In my test, the latencies I reported there were
stable around those numbers. Are we measuring differently?

>
>
> So this would imply that reading from the OS cache is almost 5x slower than 
> reading from the block cache. It would be interesting to explore the 
> discrepancy.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Jean-Daniel Cryans <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Wednesday, April 24, 2013 6:01 PM
> Subject: Unscientific comparison of fully-cached zipfian reading
>
>
> Hey guys,
>
> I did a little benchmarking to see what kind of numbers we get from the
> block cache and the OS cache. Please see:
>
> https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html
>
> Hopefully it gives you some ballpark numbers for further discussion.
>
> J-D
