Schubert,
I can't think of any reason your random reads would get slower after
inserting more data, besides GC issues.
Do you have GC logging and JVM metrics logging turned on? I would
inspect those to see if you have any long-running GC pauses, or just
lots and lots of GC going on.
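If GC logging isn't already on, something like the following in
conf/hbase-env.sh will get you the logs (standard Sun JVM flags; the log
path here is just an example, adjust for your install):

  # a sketch -- adjust the log path for your setup
  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc-hbase.log"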
If I recall, you are running on 4GB nodes with a 2GB RS heap, and co-hosted
DataNodes and TaskTrackers. We ran for a long time on a similar setup,
but once we moved to 0.20 (and to the CMS garbage collector), we really
needed to add more memory to the nodes and increase the RS heap to 4 or 5GB.
CMS is less memory-efficient, but given sufficient resources it is much
better for overall performance and throughput.
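For reference, a sketch of the kind of hbase-env.sh settings I mean (the
4GB heap and the occupancy fraction are illustrative; tune to your
hardware):

  # enable CMS and a bigger heap -- numbers are illustrative
  export HBASE_HEAPSIZE=4000
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70"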
Also, do you have Ganglia set up? Are you seeing swapping on your RS
nodes? Is there high IO-wait CPU usage?
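If Ganglia isn't up yet, you can spot both quickly from a shell on the RS
nodes, e.g.:

  # swapping shows up in the si/so columns, IO-wait in the wa column
  vmstat 5
  # per-device utilization and wait times
  iostat -x 5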
JG
Schubert Zhang wrote:
One addition: only random-reads become very slow; scans and
sequential-reads are OK.
On Tue, Aug 18, 2009 at 6:02 PM, Schubert Zhang <[email protected]> wrote:
stack and J-G, thank you very much for your helpful comments.
But now we have found a critical issue with random reads.
I used sequential-writes to insert 5GB of data into our HBase table from
empty, and ~30 regions were generated. The random-reads then took about 30
minutes to complete. Then I ran the sequential-writes again, so another
version of each cell was inserted and ~60 regions were generated. But when
I ran the random-reads against this table again, they always took a long
time (more than 2 hours).
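(For reproduction, the same write-then-read pattern can be driven with the
stock PerformanceEvaluation tool, roughly like this; the client count of 3
is illustrative:

  bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 3
  bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 3
)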
I checked the heap usage and other metrics, but could not find the reason.
Below is the status of one region server:
request=0.0, regions=13, stores=13, storefiles=14, storefileIndexSize=2,
memstoreSize=0, usedHeap=1126, maxHeap=1991, blockCacheSize=338001080,
blockCacheFree=79686056, blockCacheCount=5014, blockCacheHitRatio=55
Schubert
On Tue, Aug 18, 2009 at 5:02 AM, Schubert Zhang <[email protected]> wrote:
We have just done a Performance Evaluation on HBase-0.20.0.
See:
http://docloud.blogspot.com/2009/08/hbase-0200-performance-evaluation.html