I cranked up the handlers to 300 just in case and ran 40 mappers that loaded data via thrift. Each node runs its own thrift server. I saw an average of 18 rows/sec/mapper, with no node using more than 10% CPU and no IO wait. It seems that no matter how many mappers I throw at it, the total throughput doesn't go much above 700 rows/second, which seems very, very slow to me.
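(A minimal sketch of the jstack check Ryan suggests below, for anyone
following along. The "HRegionServer" name in jps and the "IPC Server
handler" thread naming are assumptions based on the 0.20-era servers:)

    # find the regionserver pid and dump its stacks
    RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
    jstack $RS_PID > /tmp/rs.jstack

    # tally handler threads by state; if most are WAITING rather than
    # RUNNABLE, the bottleneck is upstream of the handlers, not contention
    grep -A1 'IPC Server handler' /tmp/rs.jstack \
        | grep 'java.lang.Thread.State' | sort | uniq -c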
Here is the thread dump from a node: http://pastebin.com/U3eLRdMV

I do see quite a bit of waiting and some blocking in there; I'm not sure
exactly how to interpret it all, though. Thanks for any help!

-chris

On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:

> One thing to check is at the peak of your load, run jstack on one of
> the regionservers and look at the handler threads - if all of them
> are doing something, you might be running into handler contention.
>
> It is basically ultimately IO bound.
>
> -ryan
>
> On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas <c...@email.com> wrote:
>> They are all at 100, but none of the regionservers are loaded - most
>> are at less than 20% CPU. Is this all network latency?
>>
>> -chris
>>
>> On Apr 29, 2010, at 8:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>
>>> Every insert on an indexed table would require at the very least an
>>> RPC to a different regionserver. If the regionservers are busy, your
>>> request could wait in the queue for a moment.
>>>
>>> One param to tune would be the handler thread count. Set it to 100
>>> at least.
>>>
>>> On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas <c...@email.com> wrote:
>>>>
>>>> I just finished some testing with JDK 1.6 u17 - so far no performance
>>>> improvement from that change alone. Disabling LZO compression gained
>>>> a little (up to about 30/sec from 25/sec per thread). Turning off
>>>> indexes helped the most - that brought me up to 115/sec per process,
>>>> 2,875 total rows a second. A single perl/thrift process can load at
>>>> over 350 rows/sec, so it's not scaling as well as I would have
>>>> expected, even without the indexes.
>>>>
>>>> Are the transactional indexes that costly? What is the bottleneck
>>>> there? CPU utilization and network packets went up when I disabled
>>>> the indexes, so I don't think those are the bottleneck for the
>>>> indexes. I was even able to add another 15 insert processes (total
>>>> of 40) and only lost about 10% in per-process throughput. I could
>>>> probably go even higher; none of the nodes are above 60% CPU
>>>> utilization and IO wait was at most 3.5%.
>>>>
>>>> Each rowkey is unique, so there should not be any blocking on the
>>>> row locks. I'll do more indexed tests tomorrow.
>>>>
>>>> thanks,
>>>> -chris
>>>>
>>>> On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:
>>>>
>>>>> Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or
>>>>> 17 and you should be good to go. _18 is a botched release if I ever
>>>>> saw one.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>
>>>>>> Hi Stack,
>>>>>>
>>>>>> Thanks for looking. I checked the ganglia charts: no server was at
>>>>>> more than ~20% CPU utilization at any time during the load test,
>>>>>> and swap was never used. Network traffic was light - just running
>>>>>> a count through the hbase shell generates much higher usage. On
>>>>>> the server hosting meta specifically, CPU was at about 15-20%,
>>>>>> and IO wait never went above 3%, usually staying near 0.
>>>>>>
>>>>>> The load also died with a thrift timeout on every single node
>>>>>> (each node connects to localhost for its thrift server). It looks
>>>>>> like a datanode just died and caused every thrift connection to
>>>>>> time out - I'll have to raise that timeout to handle a node death.
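(A sketch of raising that timeout on the Perl side, assuming the stock
Thrift Perl runtime; the generated module names, Hbase::Hbase and
Hbase::HbaseClient, are assumptions that depend on the namespace used
when generating the bindings:)

    use Thrift::Socket;
    use Thrift::BufferedTransport;
    use Thrift::BinaryProtocol;
    use Hbase::Hbase;    # generated from Hbase.thrift; name is an assumption

    # each mapper talks to the thrift server on localhost (default port 9090)
    my $socket = Thrift::Socket->new('localhost', 9090);

    # timeouts are in milliseconds; generous values ride out a datanode
    # death instead of failing every in-flight insert
    $socket->setSendTimeout(60_000);
    $socket->setRecvTimeout(60_000);

    my $transport = Thrift::BufferedTransport->new($socket);
    my $client    = Hbase::HbaseClient->new(Thrift::BinaryProtocol->new($transport));
    $transport->open();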
>>>>>>
>>>>>> Checking the logs, this appears in the log of the regionserver
>>>>>> hosting meta; it looks like the dead datanode caused this error:
>>>>>>
>>>>>> 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient:
>>>>>> DFSOutputStream ResponseProcessor exception for block
>>>>>> blk_508630839844593817_11180java.io.IOException: Bad response 1 for
>>>>>> block blk_508630839844593817_11180 from datanode 10.195.150.255:50010
>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)
>>>>>>
>>>>>> The regionserver log on the dead node, 10.195.150.255, has some more
>>>>>> errors in it:
>>>>>>
>>>>>> http://pastebin.com/EFH9jz0w
>>>>>>
>>>>>> I found this in the .out file on the datanode:
>>>>>>
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
>>>>>> linux-amd64 )
>>>>>> # Problematic frame:
>>>>>> # V  [libjvm.so+0x62263c]
>>>>>> #
>>>>>> # An error report file with more information is saved as:
>>>>>> # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> # http://java.sun.com/webapps/bugreport/crash.jsp
>>>>>> #
>>>>>>
>>>>>> There is not a single error in the datanode's log, though. Also of
>>>>>> note: this happened well into the test, so the node dying caused the
>>>>>> load to abort but doesn't explain the earlier poor performance.
>>>>>> Looking through the mailing list, it seems java 1.6.0_18 has a bad
>>>>>> rep, so I'll update the AMI (although I'm using the same JVM on other
>>>>>> servers in the office without issue - decent single-node performance
>>>>>> and no crashes...).
>>>>>>
>>>>>> Thanks for any help!
>>>>>> -chris
>>>>>>
>>>>>> On Apr 28, 2010, at 10:10 PM, Stack wrote:
>>>>>>
>>>>>>> What is load on the server hosting meta like? Higher than others?
>>>>>>>
>>>>>>> On Apr 28, 2010, at 8:42 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>
>>>>>>>> Hi JG,
>>>>>>>>
>>>>>>>> Speed is now down to 18 rows/sec/table per process.
>>>>>>>>
>>>>>>>> Here is the log from a regionserver that is serving two of the
>>>>>>>> regions:
>>>>>>>>
>>>>>>>> http://pastebin.com/Hx5se0hz
>>>>>>>>
>>>>>>>> Here is the GC log from the same server:
>>>>>>>>
>>>>>>>> http://pastebin.com/ChrRvxCx
>>>>>>>>
>>>>>>>> Here is the master log:
>>>>>>>>
>>>>>>>> http://pastebin.com/L1Kn66qU
>>>>>>>>
>>>>>>>> The thrift server logs have nothing in them for the same time
>>>>>>>> period.
>>>>>>>>
>>>>>>>> Thanks in advance!
>>>>>>>>
>>>>>>>> -chris
>>>>>>>>
>>>>>>>> On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:
>>>>>>>>
>>>>>>>>> Hey Chris,
>>>>>>>>>
>>>>>>>>> That's a really significant slowdown. I can't think of anything
>>>>>>>>> obvious that would cause that in your setup.
>>>>>>>>>
>>>>>>>>> Any chance of some regionserver and master logs from the time it
>>>>>>>>> was going slow? Is there any activity in the logs of the
>>>>>>>>> regionservers hosting the regions of the table being written to?
>>>>>>>>>
>>>>>>>>> JG
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of
>>>>>>>>>> Chris Tarnas
>>>>>>>>>> Sent: Wednesday, April 28, 2010 6:27 PM
>>>>>>>>>> To: hbase-user@hadoop.apache.org
>>>>>>>>>> Subject: EC2 + Thrift inserts
>>>>>>>>>>
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> First, thanks to all the HBase developers for producing this;
>>>>>>>>>> it's a great project and I'm glad to be able to use it.
>>>>>>>>>>
>>>>>>>>>> I'm looking for some help and hints with insert performance. I'm
>>>>>>>>>> doing some benchmarking, testing how I can scale up using HBase,
>>>>>>>>>> not really looking at raw speed. The testing is happening on EC2,
>>>>>>>>>> using Andrew's scripts (thanks - those were very helpful) to set
>>>>>>>>>> everything up, with a slightly customized version of the default
>>>>>>>>>> AMIs (I added my application modules). I'm using HBase 0.20.3 and
>>>>>>>>>> Hadoop 0.20.1. I've looked at the tips in the Wiki and it looks
>>>>>>>>>> like Andrew's scripts are already set up that way.
>>>>>>>>>>
>>>>>>>>>> I'm inserting into HBase from a hadoop streaming job that runs
>>>>>>>>>> perl and uses the thrift gateway. I'm also using the transactional
>>>>>>>>>> tables, so that alone could be the cause, but from what I can tell
>>>>>>>>>> I don't think so. LZO compression is also enabled for the column
>>>>>>>>>> families (much of the data is highly compressible). My cluster has
>>>>>>>>>> 7 nodes: 5 regionservers, 1 master and 1 zookeeper. The
>>>>>>>>>> regionservers and master are c1.xlarges. Each regionserver node
>>>>>>>>>> runs the tasktracker that runs the hadoop streaming jobs, and each
>>>>>>>>>> regionserver also runs its own thrift server. Each mapper that
>>>>>>>>>> does the load talks to the localhost's thrift server.
>>>>>>>>>>
>>>>>>>>>> The row keys are a fixed string plus an incremental number, with
>>>>>>>>>> the byte order then reversed, so runA123 becomes 321Anur (see the
>>>>>>>>>> sketch below). I thought of using a murmur hash but was worried
>>>>>>>>>> about collisions.
>>>>>>>>>>
>>>>>>>>>> As I add more insert jobs, each job's throughput goes down. Way
>>>>>>>>>> down. I went from about 200 rows/sec/table per job with one job to
>>>>>>>>>> about 24 rows/sec/table per job with 25 running jobs. The servers
>>>>>>>>>> are mostly idle. I'm loading into two tables: one has several
>>>>>>>>>> indexes and I'm loading into three column families; the other has
>>>>>>>>>> no indexes and one column family. Both tables currently have only
>>>>>>>>>> two regions each.
>>>>>>>>>>
>>>>>>>>>> The regionserver that serves the indexed table's regions is using
>>>>>>>>>> the most CPU but is 87% idle. The other servers are all ~90% idle.
>>>>>>>>>> There is no IO wait, and the perl processes are barely ticking
>>>>>>>>>> over. Java on the most "loaded" server is using about 50-60% of
>>>>>>>>>> one CPU.
>>>>>>>>>>
>>>>>>>>>> Normally when I load into a pseudo-distributed hbase (my
>>>>>>>>>> development platform), perl's speed is the limiting factor and it
>>>>>>>>>> uses about 85% of a CPU. In this cluster the perl processes are
>>>>>>>>>> using only 5-10% of a CPU as they are all waiting on thrift
>>>>>>>>>> (hbase). When I run only 1 process on the cluster, perl uses much
>>>>>>>>>> more of a CPU, maybe 70%.
>>>>>>>>>>
>>>>>>>>>> Any tips or help in getting the speed/scalability up would be
>>>>>>>>>> great. Please let me know if you need any other info.
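(For concreteness, a minimal Perl version of that key reversal; the sub
name is made up:)

    # fixed prefix + incrementing counter, byte-reversed so sequential
    # inserts spread lexicographically instead of piling onto one region
    sub make_rowkey {
        my ($prefix, $n) = @_;
        return scalar reverse($prefix . $n);    # "runA" . 123 -> "321Anur"
    }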
>>>>>>>>>> As I send this, it looks like the main table has split again and
>>>>>>>>>> is being served by three regionservers. My performance is going up
>>>>>>>>>> a bit (now 35 rows/sec/table per process), but it still seems like
>>>>>>>>>> I'm not using the full potential of even this limited EC2 setup:
>>>>>>>>>> no IO wait and lots of idle CPU.
>>>>>>>>>>
>>>>>>>>>> many thanks
>>>>>>>>>> -chris
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
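(A closing note on the pattern in this thread: each mutateRow call
through the Thrift gateway is a full RPC round trip, so a single
loader process is bounded by per-call latency no matter how idle the
servers are - consistent with the low CPU and absent IO wait reported
above. The 0.20 Thrift IDL also exposes mutateRows, which amortizes
that round trip over many rows. A sketch, reusing $client from the
connection example earlier; the table name, column family, batch size,
and next_record() helper are all illustrative stand-ins:)

    use Hbase::Hbase;   # generated bindings, as above; name is an assumption

    # accumulate rows locally and ship them in one RPC
    my @batch;
    while (my ($rowkey, $value) = next_record()) {   # next_record() is a stand-in
        push @batch, Hbase::BatchMutation->new({
            row       => $rowkey,
            mutations => [ Hbase::Mutation->new({
                column => 'data:seq',                # illustrative family:qualifier
                value  => $value,
            }) ],
        });
        if (@batch >= 500) {                         # batch size is a guess to tune
            $client->mutateRows('reads', \@batch);   # table name is illustrative
            @batch = ();
        }
    }
    $client->mutateRows('reads', \@batch) if @batch; # flush the remainder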