I did similar work a couple of months ago. Start by sharing more details about your program, your HBase setup, and how you measure network and disk bottlenecks.
Also, have you tested network and disk in isolation, on every node and between every pair of nodes? Test them separately and give us those numbers. Next, do a copyFromLocal to HDFS from the master, using a file that is at least the size of your machine's memory (to make sure you write to disk and not just into the Linux page cache). Tell us the copy throughput.

On 11 Jan 2013, at 06:31, Bryan Keller <[email protected]> wrote:

> I am attempting to configure HBase to maximize throughput, and have noticed
> some bottlenecks. In particular, with my configuration, write performance is
> well below theoretical throughput. I have a test program that inserts many
> rows into a test table. Network I/O is less than 20% of max, and disk I/O is
> even lower, maybe around 5% max on all boxes in the cluster. CPU is well
> below 50% max on all boxes. I do not see any I/O waits or anything in
> particular that raises concerns. I am using iostat and iftop to test
> throughput. To determine theoretical max, I used dd and iperf. I have spent
> quite a bit of time optimizing the HBase config parameters, optimizing GC,
> etc., and am familiar with the HBase book online and such.
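
In case it helps, here is a rough Python sketch of the two measurements I'm asking for: listing every pair of nodes to test against each other, and timing a copyFromLocal to get a throughput number. The function names and the host names and paths in the usage note are made up for illustration. It assumes the `hadoop` client is on the PATH of the machine you run it from, and the timing includes client startup, which should be negligible for a file of several GB.

```python
import itertools
import os
import subprocess
import time


def node_pairs(nodes):
    """Return every unordered pair of nodes, for pairwise network tests."""
    return list(itertools.combinations(nodes, 2))


def throughput_mb_per_s(num_bytes, seconds):
    """Convert a byte count and elapsed time to MB/s (1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / seconds / 1e6


def timed_copy_to_hdfs(local_path, hdfs_path):
    """Run `hadoop fs -copyFromLocal` and return the observed MB/s.

    Assumes the `hadoop` command is available; the local file should be
    at least as large as the machine's RAM so the page cache cannot hide
    the real disk and network cost.
    """
    size = os.path.getsize(local_path)
    start = time.monotonic()
    subprocess.run(
        ["hadoop", "fs", "-copyFromLocal", local_path, hdfs_path],
        check=True,  # raise if the copy fails, so a bad run is not timed
    )
    elapsed = time.monotonic() - start
    return throughput_mb_per_s(size, elapsed)
```

For example, `node_pairs(["master", "rs1", "rs2"])` gives the three pairs to run iperf between, and `timed_copy_to_hdfs("/data/bigfile", "/tmp/bigfile")` returns the number I'd like to see (both names are placeholders, not anything from your setup).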
