You might also be interested in this benchmark I ran 3 months ago: https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html
J-D

On Sat, Jun 29, 2013 at 12:13 PM, Varun Sharma <va...@pinterest.com> wrote:
> Hi,
>
> I was doing some tests on how good HBase random reads are. The setup
> consists of a 1-node cluster with dfs replication set to 1. Short-circuit
> local reads and HBase checksums are enabled. The data set is small enough
> to be largely cached in the filesystem cache - 10G on a 60G machine.
>
> The client sends out multi-get operations in batches of 10 and I try to
> measure throughput.
>
> Test #1
>
> All data was cached in the block cache.
>
> Test Time = 120 seconds
> Num Read Ops = 12M
> Throughput = 100K per second
>
> Test #2
>
> I disable the block cache; now all the data is served from the file
> system cache. I verify this by making sure that IOPS on the disk drive
> are 0 during the test. I run the same test with batched ops.
>
> Test Time = 120 seconds
> Num Read Ops = 0.6M
> Throughput = 5K per second
>
> Test #3
>
> I saw that all the threads were stuck in idLock.lockEntry(), so I now run
> with the lock disabled and the block cache disabled.
>
> Test Time = 120 seconds
> Num Read Ops = 1.2M
> Throughput = 10K per second
>
> Test #4
>
> I re-enable the block cache and this time hack HBase to only cache index
> and bloom blocks, while data blocks come from the file system cache.
>
> Test Time = 120 seconds
> Num Read Ops = 1.6M
> Throughput = 13K per second
>
> So I wonder why there is such a massive drop in throughput. I know that
> the HDFS code adds tremendous overhead, but this seems pretty high to me.
> I use HBase 0.94.7 and CDH 4.2.0.
>
> Thanks
> Varun
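
For anyone who wants to reproduce something like Test #2, here is a rough sketch of the kind of client loop described above, written against the 0.94 client API. The table name, row-key scheme, and key count are placeholders, and the cache bypass here uses Get.setCacheBlocks(false) per request, which may differ from however Varun disabled the block cache on the server side:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetBench {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");        // placeholder table name
    Random rnd = new Random();

    long deadline = System.currentTimeMillis() + 120 * 1000L;  // 120-second test window
    long ops = 0;

    while (System.currentTimeMillis() < deadline) {
      // Build one multi-get of 10 random rows, mirroring the batch size in the tests.
      List<Get> batch = new ArrayList<Get>(10);
      for (int i = 0; i < 10; i++) {
        Get get = new Get(Bytes.toBytes("row-" + rnd.nextInt(10000000)));  // placeholder key space
        get.setCacheBlocks(false);  // skip the block cache so reads come from the FS cache (Test #2)
        batch.add(get);
      }
      Result[] results = table.get(batch);  // one RPC carrying all 10 gets
      ops += results.length;
    }

    table.close();
    System.out.println("Read ops: " + ops + ", throughput: " + (ops / 120) + " per second");
  }
}

Dividing the op count by the 120-second window gives per-second throughput, which is how the figures quoted in the tests are derived.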