I cranked up the handlers to 300 just in case and ran 40 mappers that loaded data via thrift. Each node runs its own thrift server. I saw an average of 18 rows/sec/mapper, with no node using more than 10% CPU and no IO wait. It seems that no matter how many mappers I throw at it, the total throughput doesn't go much above 700 rows/second, which seems very, very slow to me.
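(A minimal sketch of the jstack check Ryan suggests below, for anyone
following along. The "HRegionServer" name in jps and the "IPC Server
handler" thread naming are assumptions based on the 0.20-era servers:)

    # find the regionserver pid and dump its stacks
    RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
    jstack $RS_PID > /tmp/rs.jstack

    # tally handler threads by state; if most are WAITING rather than
    # RUNNABLE, the bottleneck is upstream of the handlers, not contention
    grep -A1 'IPC Server handler' /tmp/rs.jstack \
        | grep 'java.lang.Thread.State' | sort | uniq -c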
Here is the thread dump from a node: http://pastebin.com/U3eLRdMV

I do see quite a bit of waiting and some blocking in there; I'm not sure
exactly how to interpret it all, though. Thanks for any help!

-chris

On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:

> One thing to check is at the peak of your load, run jstack on one of
> the regionservers and look at the handler threads - if all of them
> are doing something, you might be running into handler contention.
>
> It is basically ultimately IO bound.
>
> -ryan
>
> On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas <c...@email.com> wrote:
>> They are all at 100, but none of the regionservers are loaded - most
>> are at less than 20% CPU. Is this all network latency?
>>
>> -chris
>>
>> On Apr 29, 2010, at 8:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>
>>> Every insert on an indexed table would require at the very least an
>>> RPC to a different regionserver. If the regionservers are busy, your
>>> request could wait in the queue for a moment.
>>>
>>> One param to tune would be the handler thread count. Set it to 100
>>> at least.
>>>
>>> On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas <c...@email.com> wrote:
>>>>
>>>> I just finished some testing with JDK 1.6 u17 - so far no performance
>>>> improvement from that change alone. Disabling LZO compression gained
>>>> a little (up to about 30/sec from 25/sec per thread). Turning off
>>>> indexes helped the most - that brought me up to 115/sec per process,
>>>> 2,875 total rows a second. A single perl/thrift process can load at
>>>> over 350 rows/sec, so it's not scaling as well as I would have
>>>> expected, even without the indexes.
>>>>
>>>> Are the transactional indexes that costly? What is the bottleneck
>>>> there? CPU utilization and network packets went up when I disabled
>>>> the indexes, so I don't think those are the bottleneck for the
>>>> indexes. I was even able to add another 15 insert processes (total
>>>> of 40) and only lost about 10% in per-process throughput. I could
>>>> probably go even higher; none of the nodes are above 60% CPU
>>>> utilization and IO wait was at most 3.5%.
>>>>
>>>> Each rowkey is unique, so there should not be any blocking on the
>>>> row locks. I'll do more indexed tests tomorrow.
>>>>
>>>> thanks,
>>>> -chris
>>>>
>>>> On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:
>>>>
>>>>> Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or
>>>>> 17 and you should be good to go. _18 is a botched release if I ever
>>>>> saw one.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>
>>>>>> Hi Stack,
>>>>>>
>>>>>> Thanks for looking. I checked the ganglia charts: no server was at
>>>>>> more than ~20% CPU utilization at any time during the load test,
>>>>>> and swap was never used. Network traffic was light - just running
>>>>>> a count through the hbase shell generates much higher usage. On
>>>>>> the server hosting meta specifically, CPU was at about 15-20%,
>>>>>> and IO wait never went above 3%, usually staying near 0.
>>>>>>
>>>>>> The load also died with a thrift timeout on every single node
>>>>>> (each node connects to localhost for its thrift server). It looks
>>>>>> like a datanode just died and caused every thrift connection to
>>>>>> time out - I'll have to raise that timeout to handle a node death.
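(A sketch of raising that timeout on the Perl side, assuming the stock
Thrift Perl runtime; the generated module names, Hbase::Hbase and
Hbase::HbaseClient, are assumptions that depend on the namespace used
when generating the bindings:)

    use Thrift::Socket;
    use Thrift::BufferedTransport;
    use Thrift::BinaryProtocol;
    use Hbase::Hbase;    # generated from Hbase.thrift; name is an assumption

    # each mapper talks to the thrift server on localhost (default port 9090)
    my $socket = Thrift::Socket->new('localhost', 9090);

    # timeouts are in milliseconds; generous values ride out a datanode
    # death instead of failing every in-flight insert
    $socket->setSendTimeout(60_000);
    $socket->setRecvTimeout(60_000);

    my $transport = Thrift::BufferedTransport->new($socket);
    my $client    = Hbase::HbaseClient->new(Thrift::BinaryProtocol->new($transport));
    $transport->open();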
>>>>>>
>>>>>> Checking the logs, this appears in the log of the regionserver
>>>>>> hosting meta; it looks like the dead datanode caused this error:
>>>>>>
>>>>>> 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient:
>>>>>> DFSOutputStream ResponseProcessor exception for block
>>>>>> blk_508630839844593817_11180java.io.IOException: Bad response 1 for
>>>>>> block blk_508630839844593817_11180 from datanode 10.195.150.255:50010
>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)
>>>>>>
>>>>>> The regionserver log on the dead node, 10.195.150.255, has some more
>>>>>> errors in it:
>>>>>>
>>>>>> http://pastebin.com/EFH9jz0w
>>>>>>
>>>>>> I found this in the .out file on the datanode:
>>>>>>
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
>>>>>> linux-amd64 )
>>>>>> # Problematic frame:
>>>>>> # V  [libjvm.so+0x62263c]
>>>>>> #
>>>>>> # An error report file with more information is saved as:
>>>>>> # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> # http://java.sun.com/webapps/bugreport/crash.jsp
>>>>>> #
>>>>>>
>>>>>> There is not a single error in the datanode's log, though. Also of
>>>>>> note: this happened well into the test, so the node dying caused the
>>>>>> load to abort but doesn't explain the earlier poor performance.
>>>>>> Looking through the mailing list, it seems java 1.6.0_18 has a bad
>>>>>> rep, so I'll update the AMI (although I'm using the same JVM on other
>>>>>> servers in the office without issue - decent single-node performance
>>>>>> and no crashes...).
>>>>>>
>>>>>> Thanks for any help!
>>>>>> -chris
>>>>>>
>>>>>> On Apr 28, 2010, at 10:10 PM, Stack wrote:
>>>>>>
>>>>>>> What is load on the server hosting meta like? Higher than others?
>>>>>>>
>>>>>>> On Apr 28, 2010, at 8:42 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>
>>>>>>>> Hi JG,
>>>>>>>>
>>>>>>>> Speed is now down to 18 rows/sec/table per process.
>>>>>>>>
>>>>>>>> Here is the log from a regionserver that is serving two of the
>>>>>>>> regions:
>>>>>>>>
>>>>>>>> http://pastebin.com/Hx5se0hz
>>>>>>>>
>>>>>>>> Here is the GC log from the same server:
>>>>>>>>
>>>>>>>> http://pastebin.com/ChrRvxCx
>>>>>>>>
>>>>>>>> Here is the master log:
>>>>>>>>
>>>>>>>> http://pastebin.com/L1Kn66qU
>>>>>>>>
>>>>>>>> The thrift server logs have nothing in them for the same time
>>>>>>>> period.
>>>>>>>>
>>>>>>>> Thanks in advance!
>>>>>>>>
>>>>>>>> -chris
>>>>>>>>
>>>>>>>> On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:
>>>>>>>>
>>>>>>>>> Hey Chris,
>>>>>>>>>
>>>>>>>>> That's a really significant slowdown. I can't think of anything
>>>>>>>>> obvious that would cause that in your setup.
>>>>>>>>>
>>>>>>>>> Any chance of some regionserver and master logs from the time it
>>>>>>>>> was going slow? Is there any activity in the logs of the
>>>>>>>>> regionservers hosting the regions of the table being written to?
>>>>>>>>>
>>>>>>>>> JG
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of
>>>>>>>>>> Chris Tarnas
>>>>>>>>>> Sent: Wednesday, April 28, 2010 6:27 PM
>>>>>>>>>> To: hbase-user@hadoop.apache.org
>>>>>>>>>> Subject: EC2 + Thrift inserts
>>>>>>>>>>
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> First, thanks to all the HBase developers for producing this;
>>>>>>>>>> it's a great project and I'm glad to be able to use it.
>>>>>>>>>>
>>>>>>>>>> I'm looking for some help and hints with insert performance. I'm
>>>>>>>>>> doing some benchmarking, testing how I can scale up using HBase,
>>>>>>>>>> not really looking at raw speed. The testing is happening on EC2,
>>>>>>>>>> using Andrew's scripts (thanks - those were very helpful) to set
>>>>>>>>>> everything up, with a slightly customized version of the default
>>>>>>>>>> AMIs (I added my application modules). I'm using HBase 0.20.3 and
>>>>>>>>>> Hadoop 0.20.1. I've looked at the tips in the Wiki and it looks
>>>>>>>>>> like Andrew's scripts are already set up that way.
>>>>>>>>>>
>>>>>>>>>> I'm inserting into HBase from a hadoop streaming job that runs
>>>>>>>>>> perl and uses the thrift gateway. I'm also using the transactional
>>>>>>>>>> tables, so that alone could be the cause, but from what I can tell
>>>>>>>>>> I don't think so. LZO compression is also enabled for the column
>>>>>>>>>> families (much of the data is highly compressible). My cluster has
>>>>>>>>>> 7 nodes: 5 regionservers, 1 master and 1 zookeeper. The
>>>>>>>>>> regionservers and master are c1.xlarges. Each regionserver node
>>>>>>>>>> runs the tasktracker that runs the hadoop streaming jobs, and each
>>>>>>>>>> regionserver also runs its own thrift server. Each mapper that
>>>>>>>>>> does the load talks to the localhost's thrift server.
>>>>>>>>>>
>>>>>>>>>> The row keys are a fixed string plus an incremental number, with
>>>>>>>>>> the byte order then reversed, so runA123 becomes 321Anur (see the
>>>>>>>>>> sketch below). I thought of using a murmur hash but was worried
>>>>>>>>>> about collisions.
>>>>>>>>>>
>>>>>>>>>> As I add more insert jobs, each job's throughput goes down. Way
>>>>>>>>>> down. I went from about 200 rows/sec/table per job with one job to
>>>>>>>>>> about 24 rows/sec/table per job with 25 running jobs. The servers
>>>>>>>>>> are mostly idle. I'm loading into two tables: one has several
>>>>>>>>>> indexes and I'm loading into three column families; the other has
>>>>>>>>>> no indexes and one column family. Both tables currently have only
>>>>>>>>>> two regions each.
>>>>>>>>>>
>>>>>>>>>> The regionserver that serves the indexed table's regions is using
>>>>>>>>>> the most CPU but is 87% idle. The other servers are all ~90% idle.
>>>>>>>>>> There is no IO wait, and the perl processes are barely ticking
>>>>>>>>>> over. Java on the most "loaded" server is using about 50-60% of
>>>>>>>>>> one CPU.
>>>>>>>>>>
>>>>>>>>>> Normally when I load into a pseudo-distributed hbase (my
>>>>>>>>>> development platform), perl's speed is the limiting factor and it
>>>>>>>>>> uses about 85% of a CPU. In this cluster the perl processes are
>>>>>>>>>> using only 5-10% of a CPU as they are all waiting on thrift
>>>>>>>>>> (hbase). When I run only 1 process on the cluster, perl uses much
>>>>>>>>>> more of a CPU, maybe 70%.
>>>>>>>>>>
>>>>>>>>>> Any tips or help in getting the speed/scalability up would be
>>>>>>>>>> great. Please let me know if you need any other info.
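(For concreteness, a minimal Perl version of that key reversal; the sub
name is made up:)

    # fixed prefix + incrementing counter, byte-reversed so sequential
    # inserts spread lexicographically instead of piling onto one region
    sub make_rowkey {
        my ($prefix, $n) = @_;
        return scalar reverse($prefix . $n);    # "runA" . 123 -> "321Anur"
    }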
>>>>>>>>>> As I send this, it looks like the main table has split again and
>>>>>>>>>> is being served by three regionservers. My performance is going up
>>>>>>>>>> a bit (now 35 rows/sec/table per process), but it still seems like
>>>>>>>>>> I'm not using the full potential of even this limited EC2 setup:
>>>>>>>>>> no IO wait and lots of idle CPU.
>>>>>>>>>>
>>>>>>>>>> many thanks
>>>>>>>>>> -chris
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
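(A closing note on the pattern in this thread: each mutateRow call
through the Thrift gateway is a full RPC round trip, so a single
loader process is bounded by per-call latency no matter how idle the
servers are - consistent with the low CPU and absent IO wait reported
above. The 0.20 Thrift IDL also exposes mutateRows, which amortizes
that round trip over many rows. A sketch, reusing $client from the
connection example earlier; the table name, column family, batch size,
and next_record() helper are all illustrative stand-ins:)

    use Hbase::Hbase;   # generated bindings, as above; name is an assumption

    # accumulate rows locally and ship them in one RPC
    my @batch;
    while (my ($rowkey, $value) = next_record()) {   # next_record() is a stand-in
        push @batch, Hbase::BatchMutation->new({
            row       => $rowkey,
            mutations => [ Hbase::Mutation->new({
                column => 'data:seq',                # illustrative family:qualifier
                value  => $value,
            }) ],
        });
        if (@batch >= 500) {                         # batch size is a guess to tune
            $client->mutateRows('reads', \@batch);   # table name is illustrative
            @batch = ();
        }
    }
    $client->mutateRows('reads', \@batch) if @batch; # flush the remainder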