Hi,

My setup is as follows:
24 regionservers (7GB RAM, 8-core CPU, 5GB heap space)
hbase 0.94.4
5-7 regions per regionserver

I am doing an avg of 4k-5k random gets per regionserver per second and the
performance is acceptable in the beginning. I have also done ~10K gets for
a single regionserver and got the results back in 600-800ms. After a while
the performance of the GETs starts degrading. The same ~10K random gets
start taking upwards of 9s-10s.

With regards to hbase settings that I have modified, I have disabled major
compaction, increase region size to 100G and bumped up the handler count to
100.

I monitored ganglia for metrics that vary when the performance shifts from
good to bad and found that the fsPreadLatency_avg_time is almost 25x in the
bad performing regionserver. fsReadLatency_avg_time is also slightly higher
but not that much (it's around 2x).

I took a thread dump of the regionserver process and also did CPU
utilization monitoring. The CPU cycles were being spent
on org.apache.hadoop.hdfs.BlockReaderLocal.read and stack trace for threads
running that function is below this email.

Any pointers on why positional reads degrade over time ? Or is this just an
issue of disk I/O and I should start looking into that ?

Thanks,
Viral

====stacktrace for one of the handler doing blockread====
"IPC Server handler 98 on 60020" - Thread t@147
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:220)
at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:324)
- locked <3215ed96> (a org.apache.hadoop.hdfs.BlockReaderLocal)
at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1763)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2333)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2400)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1363)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1799)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1643)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
at
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
- locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
- locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3643)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3578)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3561)
- locked <74d81ea7> (a
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3599)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4407)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

Reply via email to