I missed the fact that there is already a patch that addresses the single HTable used for index updates by switching to a pool instead. See https://issues.apache.org/jira/browse/HBASE-2493. Unfortunately it did not make it into 0.20.4.
St.Ack
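For readers following along, here is a rough sketch of what the pool-based approach discussed in HBASE-2493 amounts to, assuming the 0.20-era HTablePool API (getTable/putTable); this is an illustration of the idea, not the actual patch:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;

public class IndexWriterSketch {
  // Pool of HTables for the index table, so concurrent handler threads
  // are not serialized on one shared, synchronized HTable instance.
  private final HTablePool pool = new HTablePool(new HBaseConfiguration(), 50);

  public void writeIndexEntry(byte[] indexTableName, Put indexPut)
      throws java.io.IOException {
    HTable indexTable = pool.getTable(indexTableName); // borrow a table
    try {
      indexTable.put(indexPut);                        // parallel-friendly write
    } finally {
      pool.putTable(indexTable);                       // return it to the pool
    }
  }
}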
On Fri, Apr 30, 2010 at 4:44 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas <c...@email.com> wrote:
>> Thank you, it is nice to get this help.
>>
>> I definitely understand the overhead of writing the index, although it seems
>> much worse than just that overhead would indicate. If I understand you
>> correctly, that is because all inserts into an IndexedTable are synchronized
>> on one table? If that was switched to using an HTablePool it would no longer
>> be as severe a bottleneck (performance is about an order of magnitude better
>> without the indexing)?
>
> They are synchronized per region server, yes, and it _should_ be better
> with a pool since then you can do parallel inserts. Patching it
> doesn't seem hard, but maybe I'm missing some finer details since I
> usually don't work around that code.
>
>>
>> I'm also using thrift to connect and am wondering if that itself puts an
>> overall limit on scaling? It does seem that no matter how many more mappers
>> and servers I add, even without indexing, I am capped at about 5k rows/sec
>> total. I'm waiting a bit as the table grows so that it is split across more
>> regionservers, hopefully that will help, but as far as I can tell I am not
>> hitting any CPU or IO constraint during my tests.
>
> I don't understand the "I'm also using thrift" and "how many more
> mappers" part, you are using Thrift inside a map? Anyway, more
> clients won't help since there's a single mega-serialization of all
> the inserts to the index table per region server. It's normal not to
> see any CPU/mem/IO contention since, in this case, it's all about the
> speed at which you can process a single row insertion. The rest of the
> threads just wait...
>
>>
>> -chris
>>
>> On Apr 30, 2010, at 3:00 PM, Jean-Daniel Cryans wrote:
>>
>>> The contrib packages don't get as much love as core HBase, so they
>>> tend to be less performant and/or reliable and/or maintained and/or
>>> etc. In this case the issue doesn't seem that bad since it could just
>>> use an HTablePool, but using IndexedTables will definitely be slower
>>> than a straight insert since it writes to 2 tables (the main table and
>>> the index).
>>>
>>> J-D
>>>
>>> On Fri, Apr 30, 2010 at 2:53 PM, Chris Tarnas <c...@email.com> wrote:
>>>> It appears that for multiple simultaneous loads, using the IndexedTables
>>>> is probably not the best choice?
>>>>
>>>> -chris
>>>>
>>>> On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> Yeah, more handlers won't do it here since there are tons of calls
>>>>> waiting on a single synchronized method. I guess the IndexedRegion
>>>>> should use a pool of HTables instead of a single one in order to
>>>>> improve indexing throughput.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>> Here is the thread dump:
>>>>>>
>>>>>> I cranked up the handlers to 300 just in case and ran 40 mappers that
>>>>>> loaded data via thrift. Each node runs its own thrift server. I saw an
>>>>>> average of 18 rows/sec/mapper with no node using more than 10% CPU and
>>>>>> no IO wait. It seems no matter how many mappers I throw at it, the
>>>>>> total number of rows/sec doesn't go much above 700 rows/second, which
>>>>>> seems very, very slow to me.
>>>>>>
>>>>>> Here is the thread dump from a node:
>>>>>>
>>>>>> http://pastebin.com/U3eLRdMV
>>>>>>
>>>>>> I do see quite a bit of waiting and some blocking in there, not sure
>>>>>> exactly how to interpret it all though.
>>>>>>
>>>>>> thanks for any help!
>>>>>> -chris
>>>>>>
>>>>>> On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:
>>>>>>
>>>>>>> One thing to check: at the peak of your load, run jstack on one of
>>>>>>> the regionservers and look at the handler threads - if all of them
>>>>>>> are doing something you might be running into handler contention.
>>>>>>>
>>>>>>> It is basically IO bound, ultimately.
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>> They are all at 100, but none of the regionservers are loaded - most
>>>>>>>> are less than 20% CPU. Is this all network latency?
>>>>>>>>
>>>>>>>> -chris
>>>>>>>>
>>>>>>>> On Apr 29, 2010, at 8:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Every insert on an indexed table would require at the very least an
>>>>>>>>> RPC to a different regionserver. If the regionservers are busy, your
>>>>>>>>> request could wait in the queue for a moment.
>>>>>>>>>
>>>>>>>>> One param to tune would be the handler thread count. Set it to 100
>>>>>>>>> at least.
>>>>>>>>>
>>>>>>>>> On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I just finished some testing with JDK 1.6 u17 - so far no performance
>>>>>>>>>> improvements from just changing that. Disabling LZO compression did
>>>>>>>>>> gain a little bit (up to about 30/sec from 25/sec per thread).
>>>>>>>>>> Turning off indexes helped the most - that brought me up to 115/sec
>>>>>>>>>> @ 2875 total rows a second. A single perl/thrift process can load at
>>>>>>>>>> over 350 rows/sec, so it's not scaling as well as I would have
>>>>>>>>>> expected, even without the indexes.
>>>>>>>>>>
>>>>>>>>>> Are the transactional indexes that costly? What is the bottleneck
>>>>>>>>>> there? CPU utilization and network packets went up when I disabled
>>>>>>>>>> the indexes, so I don't think those are the bottlenecks for the
>>>>>>>>>> indexes. I was even able to add another 15 insert processes (total
>>>>>>>>>> of 40) and only lost about 10% of per-process throughput. I probably
>>>>>>>>>> could go even higher; none of the nodes are above 60% CPU utilization
>>>>>>>>>> and IO wait was at most 3.5%.
>>>>>>>>>>
>>>>>>>>>> Each rowkey is unique, so there should not be any blocking on the row
>>>>>>>>>> locks. I'll do more indexed tests tomorrow.
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>> -chris
>>>>>>>>>>
>>>>>>>>>> On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:
>>>>>>>>>>
>>>>>>>>>>> Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17
>>>>>>>>>>> and you should be good to go. _18 is a botched release if I ever saw
>>>>>>>>>>> one.
>>>>>>>>>>>
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Stack,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for looking. I checked the ganglia charts: no server was at
>>>>>>>>>>>> more than ~20% CPU utilization at any time during the load test, and
>>>>>>>>>>>> swap was never used.
>>>>>>>>>>>> Network traffic was light - just running a count through hbase
>>>>>>>>>>>> shell generates much higher use. On the server hosting meta
>>>>>>>>>>>> specifically, it was at about 15-20% CPU, and IO wait never went
>>>>>>>>>>>> above 3%, usually staying down near 0.
>>>>>>>>>>>>
>>>>>>>>>>>> The load also died with a thrift timeout on every single node (each
>>>>>>>>>>>> node connecting to localhost for its thrift server). It looks like a
>>>>>>>>>>>> datanode just died and caused every thrift connection to time out -
>>>>>>>>>>>> I'll have to up that limit to handle a node death.
>>>>>>>>>>>>
>>>>>>>>>>>> Checking logs, this appears in the log of the regionserver hosting
>>>>>>>>>>>> meta; it looks like the dead datanode caused this error:
>>>>>>>>>>>>
>>>>>>>>>>>> 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient:
>>>>>>>>>>>> DFSOutputStream ResponseProcessor exception for block
>>>>>>>>>>>> blk_508630839844593817_11180 java.io.IOException: Bad response 1 for
>>>>>>>>>>>> block blk_508630839844593817_11180 from datanode 10.195.150.255:50010
>>>>>>>>>>>> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)
>>>>>>>>>>>>
>>>>>>>>>>>> The regionserver log on the dead node, 10.195.150.255, has some more
>>>>>>>>>>>> errors in it:
>>>>>>>>>>>>
>>>>>>>>>>>> http://pastebin.com/EFH9jz0w
>>>>>>>>>>>>
>>>>>>>>>>>> I found this in the .out file on the datanode:
>>>>>>>>>>>>
>>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
>>>>>>>>>>>> linux-amd64 )
>>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>>> # V  [libjvm.so+0x62263c]
>>>>>>>>>>>> #
>>>>>>>>>>>> # An error report file with more information is saved as:
>>>>>>>>>>>> # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
>>>>>>>>>>>> #
>>>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>>>> # http://java.sun.com/webapps/bugreport/crash.jsp
>>>>>>>>>>>> #
>>>>>>>>>>>>
>>>>>>>>>>>> There is not a single error in the datanode's log, though. Also of
>>>>>>>>>>>> note - this happened well into the test, so the node dying caused
>>>>>>>>>>>> the load to abort but not the prior poor performance. Looking
>>>>>>>>>>>> through the mailing list, it looks like Java 1.6.0_18 has a bad rep,
>>>>>>>>>>>> so I'll update the AMI (although I'm using the same JVM on other
>>>>>>>>>>>> servers in the office w/o issue, with decent single-node performance
>>>>>>>>>>>> and never dying...).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for any help!
>>>>>>>>>>>> -chris
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 28, 2010, at 10:10 PM, Stack wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> What is load on the server hosting meta like? Higher than others?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 28, 2010, at 8:42 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi JG,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Speed is now down to 18 rows/sec/table per process.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is a log from a regionserver that is serving two of the regions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://pastebin.com/Hx5se0hz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the GC log from the same server:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://pastebin.com/ChrRvxCx
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the master log:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://pastebin.com/L1Kn66qU
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The thrift server logs have nothing in them in the same time
>>>>>>>>>>>>>> period.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -chris
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Chris,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's a really significant slowdown. I can't think of anything
>>>>>>>>>>>>>>> obvious that would cause that in your setup.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any chance of some regionserver and master logs from the time it
>>>>>>>>>>>>>>> was going slow? Is there any activity in the logs of the
>>>>>>>>>>>>>>> regionservers hosting the regions of the table being written to?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> JG
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris Tarnas
>>>>>>>>>>>>>>>> Sent: Wednesday, April 28, 2010 6:27 PM
>>>>>>>>>>>>>>>> To: hbase-user@hadoop.apache.org
>>>>>>>>>>>>>>>> Subject: EC2 + Thrift inserts
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> First, thanks to all the HBase developers for producing this,
>>>>>>>>>>>>>>>> it's a great project and I'm glad to be able to use it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm looking for some help and hints here with insert performance.
>>>>>>>>>>>>>>>> I'm doing some benchmarking, testing how I can scale up using
>>>>>>>>>>>>>>>> HBase, not really looking at raw speed. The testing is happening
>>>>>>>>>>>>>>>> on EC2, using Andrew's scripts (thanks - those were very helpful)
>>>>>>>>>>>>>>>> to set things up, and with a slightly customized version of the
>>>>>>>>>>>>>>>> default AMIs (added my application modules). I'm using HBase
>>>>>>>>>>>>>>>> 0.20.3 and Hadoop 0.20.1. I've looked at the tips in the Wiki and
>>>>>>>>>>>>>>>> it looks like Andrew's scripts are already set up that way.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm inserting into HBase from a hadoop streaming job that runs
>>>>>>>>>>>>>>>> perl and uses the thrift gateway. I'm also using the Transactional
>>>>>>>>>>>>>>>> tables, so that alone could be the cause, but from what I can tell
>>>>>>>>>>>>>>>> I don't think so. LZO compression is also enabled for the column
>>>>>>>>>>>>>>>> families (much of the data is highly compressible). My cluster has
>>>>>>>>>>>>>>>> 7 nodes: 5 regionservers, 1 master and 1 zookeeper. The
>>>>>>>>>>>>>>>> regionservers and master are c1.xlarges.
>>>>>>>>>>>>>>>> Each regionserver has the tasktrackers that run the hadoop
>>>>>>>>>>>>>>>> streaming jobs, and each regionserver also runs its own thrift
>>>>>>>>>>>>>>>> server. Each mapper that does the load talks to the localhost's
>>>>>>>>>>>>>>>> thrift server.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The row keys are a fixed string + an incremental number, then
>>>>>>>>>>>>>>>> the order of the bytes is reversed, so runA123 becomes 321Anur.
>>>>>>>>>>>>>>>> I thought of using a murmur hash but was worried about
>>>>>>>>>>>>>>>> collisions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As I add more insert jobs, each job's throughput goes down. Way
>>>>>>>>>>>>>>>> down. I went from about 200 rows/sec/table per job with one job
>>>>>>>>>>>>>>>> to about 24 rows/sec/table per job with 25 running jobs. The
>>>>>>>>>>>>>>>> servers are mostly idle. I'm loading into two tables: one has
>>>>>>>>>>>>>>>> several indexes and I'm loading into three column families, the
>>>>>>>>>>>>>>>> other has no indexes and one column family. Both tables currently
>>>>>>>>>>>>>>>> have only two regions each.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The regionserver that serves the indexed table's regions is
>>>>>>>>>>>>>>>> using the most CPU but is 87% idle. The other servers are all at
>>>>>>>>>>>>>>>> ~90% idle. There is no IO wait. The perl processes are barely
>>>>>>>>>>>>>>>> ticking over. Java on the most "loaded" server is using about
>>>>>>>>>>>>>>>> 50-60% of one CPU.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Normally when I do a load in a pseudo-distributed hbase (my
>>>>>>>>>>>>>>>> development platform), perl's speed is the limiting factor and it
>>>>>>>>>>>>>>>> uses about 85% of a CPU. In this cluster the perl processes are
>>>>>>>>>>>>>>>> using only 5-10% of a CPU, as they are all waiting on thrift
>>>>>>>>>>>>>>>> (hbase). When I run only 1 process on the cluster, perl uses much
>>>>>>>>>>>>>>>> more of a CPU, maybe 70%.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any tips or help in getting the speed/scalability up would be
>>>>>>>>>>>>>>>> great. Please let me know if you need any other info.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As I send this - it looks like the main table has split again
>>>>>>>>>>>>>>>> and is being served by three regionservers. My performance is
>>>>>>>>>>>>>>>> going up a bit (now 35 rows/sec/table per process), but it still
>>>>>>>>>>>>>>>> seems like I'm not using the full potential of even the limited
>>>>>>>>>>>>>>>> EC2 system: no IO wait and lots of idle CPU.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> many thanks
>>>>>>>>>>>>>>>> -chris
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Todd Lipcon
>>>>>>>>>>> Software Engineer, Cloudera
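For readers following the row-key discussion in the original post above, here is a minimal sketch of the reversal scheme Chris describes (a fixed prefix plus an incrementing number, with the character order reversed so consecutive keys no longer sort next to each other and writes spread across regions). The helper name is hypothetical and not from the thread:

public class RowKeySketch {
    // Hypothetical helper: build a key from a fixed prefix and a sequence
    // number, then reverse the characters so sequential inserts do not all
    // land on a single "hot" region at the end of the key space.
    static String reversedKey(String prefix, long seq) {
        return new StringBuilder(prefix + seq).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reversedKey("runA", 123)); // prints "321Anur"
    }
}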