The disks are 160GB-250GB SATA disks. I didn't see the "too many open files" exception in the logs, so ulimit doesn't seem to be the problem.
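For what it's worth, the limit actually in effect for the running RegionServer can be double-checked with something like the following (plain Linux commands; the jps lookup assumes the JDK tools are on the path and a kernel recent enough to expose /proc/<pid>/limits):

  # file-descriptor limit of the current shell
  ulimit -n

  # limits applied to the running HRegionServer process
  RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
  grep 'open files' /proc/$RS_PID/limits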
My question is: for small values (1k), the performance was much better. So I wonder whether there is a parameter related to the value size that I should tune.

2010/4/3 Jonathan Gray <jg...@facebook.com>

> Chen,
>
> In general, you're going to get significantly different performance on
> clusters of the size you are testing with. What is the disk setup?
>
> Also, 2GB of RAM is simply not enough to do any real testing. I recommend
> a minimum of 2GB of heap for each RegionServer alone, though I strongly
> encourage 4GB of heap to get good performance. You'll need at least 2GB
> additional for the DataNode and OS.
>
> JG
>
> > -----Original Message-----
> > From: Juhani Connolly [mailto:juh...@ninja.co.jp]
> > Sent: Friday, April 02, 2010 3:17 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: hbase performance
> >
> > On 04/02/2010 06:09 PM, Chen Bangzhong wrote:
> > > my switch is Dell 2724.
> >
> > I'm not a network admin, and I don't have the ability to know how
> > congested your network is from that (nor do I think it is possible,
> > since there are going to be a lot of other factors).
> >
> > Try running the test on a single machine using the miniCluster flag;
> > this should eliminate network transfer as an issue. If, despite the fact
> > that you're running everything on a single machine, you get a high
> > throughput, your network is likely the issue. If, on the other hand,
> > throughput goes down significantly, the problem lies elsewhere.
> >
> > On 2 April 2010 at 17:04, Chen Bangzhong <bangzh...@gmail.com> wrote:
> > >
> > >> On 2 April 2010 at 16:58, Juhani Connolly <juh...@ninja.co.jp> wrote:
> > >>
> > >>> Your results seem very low, but your system specs are also quite
> > >>> moderate.
> > >>>
> > >>> On 04/02/2010 04:46 PM, Chen Bangzhong wrote:
> > >>>
> > >>>> Hi, All
> > >>>>
> > >>>> I am benchmarking HBase. My HDFS cluster includes 4 servers (Dell
> > >>>> 860, with 2 GB RAM): one NameNode, one JobTracker, 2 DataNodes.
> > >>>>
> > >>>> My HBase cluster also comprises 4 servers: one Master, 2 RegionServers
> > >>>> and one ZooKeeper (Dell 860, with 2 GB RAM).
> > >>>
> > >>> While I'm far from being an authority on the matter, running
> > >>> datanodes + regionservers together should help performance.
> > >>> Try making your 2 datanodes + 2 regionservers into 4 servers running
> > >>> both data/region.
> > >>
> > >> I will try to run the datanode and region server on the same server.
> > >>
> > >>>> I ran org.apache.hadoop.hbase.PerformanceEvaluation on the ZooKeeper
> > >>>> server. ROW_LENGTH was changed from 1000 to ROW_LENGTH = 100*1024,
> > >>>> so each value will be 100k in size.
> > >>>>
> > >>>> The Hadoop version is 0.20.2, the HBase version is 0.20.3.
> > >>>> dfs.replication is set to 1.
> > >>>
> > >>> Setting replication to 1 isn't going to give results that are very
> > >>> indicative of a "real" application, making it questionable as a
> > >>> benchmark. If you intend to run on a single replica at release, you'll
> > >>> be at high risk of data loss.
> > >>
> > >> Since I have only 2 data nodes, I set replication to 1. In production,
> > >> it will be set to 3.
> > >>
> > >>>> The following is the command line:
> > >>>>
> > >>>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred
> > >>>> --rows=10000 randomWrite 20
> > >>>>
> > >>>> It took about one hour to complete the test (3468628 ms), about 60 writes
> > >>>> per second. It seems the performance is disappointing.
> > >>>>
> > >>>> Is there anything I can do to make HBase perform better with 100k values?
> > >>>> I didn't try the methods mentioned in the performance wiki yet, because I
> > >>>> thought 60 writes/sec is too low.
> > >>>
> > >>> Do you mean *over* 100k size?
> > >>> 2GB of RAM is pretty low and you'd likely get significantly better
> > >>> performance with more, though on this scale it probably isn't a
> > >>> significant problem.
> > >>
> > >> The data size is exactly 100k.
> > >>
> > >>>> If the value size is 1k, HBase performs much better. 200000 sequentialWrite
> > >>>> operations took about 16 seconds, about 12500 writes per second.
> > >>>
> > >>> Comparing sequentialWrite performance with randomWrite isn't a helpful
> > >>> indicator. Do you have randomWrite results for 1k values? The way your
> > >>> performance degrades with the size of the records seems like you may
> > >>> have a bottleneck at network transfer. What's rack locality like and how
> > >>> much bandwidth do you have between the servers?
> > >>>
> > >>>> Now I am trying to benchmark using two clients on 2 servers, no results
> > >>>> yet.
> > >>
> > >> For 1k values, the sequentialWrite performance and randomWrite performance
> > >> are about the same. All my servers are under one switch; I don't know the
> > >> switch bandwidth yet.
> > >>
> > >>> You're already running 20 clients on your first server with the
> > >>> PerformanceEvaluation. Do you mean you intend to run 20 on each?
> > >>
> > >> In fact, it is 20 threads on one machine.
> > >>
> > >>> Hopefully someone with better knowledge can give a better answer, but my
> > >>> guess is that you have a network transfer bottleneck. Try doing further
> > >>> tests with randomWrite and decreasing value sizes and see if the time
> > >>> correlates to the total amount of data written.
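For reference, a back-of-the-envelope read of the numbers above: 20 clients x 10,000 rows = 200,000 puts in 3,468.6 s, i.e. roughly 58 puts/s, or about 5.8 MB/s of 100k values. A sketch of the follow-up runs suggested in the thread is below; the --miniCluster flag is taken from PerformanceEvaluation's usage in the 0.20 line, and changing the value size still means editing ROW_LENGTH and rebuilding, as was already done for the 100k test.

  # single-machine run (mini cluster) to take the switch out of the picture
  bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --miniCluster --nomapred --rows=10000 randomWrite 20

  # same run against the real cluster, for comparison
  bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=10000 randomWrite 20

If the single-machine run is just as slow as the cluster run, the network can likely be ruled out as the bottleneck.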