The disks are 160GB-250GB SATA disks. I didn't see the "too many open files" exception in the logs, so ulimit doesn't seem to be the problem.
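For what it's worth, the limit actually in effect for the running RegionServer can be double-checked with something like the following (plain Linux commands; the jps lookup assumes the JDK tools are on the path and a kernel recent enough to expose /proc/<pid>/limits):

  # file-descriptor limit of the current shell
  ulimit -n

  # limits applied to the running HRegionServer process
  RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
  grep 'open files' /proc/$RS_PID/limits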
My question is: for small values (1k), the performance was much better. So I wonder whether there is a parameter related to the value size that I should tune.

2010/4/3 Jonathan Gray <jg...@facebook.com>

> Chen,
>
> In general, you're going to get significantly different performance on
> clusters of the size you are testing with. What is the disk setup?
>
> Also, 2GB of RAM is simply not enough to do any real testing. I recommend
> a minimum of 2GB of heap for each RegionServer alone, though I strongly
> encourage 4GB of heap to get good performance. You'll need at least 2GB
> additional for the DataNode and OS.
>
> JG
>
> > -----Original Message-----
> > From: Juhani Connolly [mailto:juh...@ninja.co.jp]
> > Sent: Friday, April 02, 2010 3:17 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: hbase performance
> >
> > On 04/02/2010 06:09 PM, Chen Bangzhong wrote:
> > > my switch is Dell 2724.
> >
> > I'm not a network admin, and I don't have the ability to know how
> > congested your network is from that (nor do I think it is possible,
> > since there are going to be a lot of other factors).
> >
> > Try running the test on a single machine using the miniCluster flag;
> > this should eliminate network transfer as an issue. If, despite the fact
> > that you're running everything on a single machine, you get a high
> > throughput, your network is likely the issue. If, on the other hand,
> > throughput goes down significantly, the problem lies elsewhere.
> >
> > On 2 April 2010 at 17:04, Chen Bangzhong <bangzh...@gmail.com> wrote:
> > >
> > >> On 2 April 2010 at 16:58, Juhani Connolly <juh...@ninja.co.jp> wrote:
> > >>
> > >>> Your results seem very low, but your system specs are also quite
> > >>> moderate.
> > >>>
> > >>> On 04/02/2010 04:46 PM, Chen Bangzhong wrote:
> > >>>
> > >>>> Hi, All
> > >>>>
> > >>>> I am benchmarking HBase. My HDFS cluster includes 4 servers (Dell
> > >>>> 860, with 2 GB RAM): one NameNode, one JobTracker, 2 DataNodes.
> > >>>>
> > >>>> My HBase cluster also comprises 4 servers: one Master, 2 RegionServers
> > >>>> and one ZooKeeper (Dell 860, with 2 GB RAM).
> > >>>
> > >>> While I'm far from being an authority on the matter, running
> > >>> datanodes + regionservers together should help performance.
> > >>> Try making your 2 datanodes + 2 regionservers into 4 servers running
> > >>> both data/region.
> > >>
> > >> I will try to run the datanode and region server on the same server.
> > >>
> > >>>> I ran org.apache.hadoop.hbase.PerformanceEvaluation on the ZooKeeper
> > >>>> server. ROW_LENGTH was changed from 1000 to ROW_LENGTH = 100*1024,
> > >>>> so each value will be 100k in size.
> > >>>>
> > >>>> The Hadoop version is 0.20.2, the HBase version is 0.20.3.
> > >>>> dfs.replication is set to 1.
> > >>>
> > >>> Setting replication to 1 isn't going to give results that are very
> > >>> indicative of a "real" application, making it questionable as a
> > >>> benchmark. If you intend to run on a single replica at release, you'll
> > >>> be at high risk of data loss.
> > >>
> > >> Since I have only 2 data nodes, I set replication to 1. In production,
> > >> it will be set to 3.
> > >>
> > >>>> The following is the command line:
> > >>>>
> > >>>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred
> > >>>> --rows=10000 randomWrite 20
> > >>>>
> > >>>> It took about one hour to complete the test (3468628 ms), about 60 writes
> > >>>> per second. It seems the performance is disappointing.
> > >>>>
> > >>>> Is there anything I can do to make HBase perform better with 100k values?
> > >>>> I didn't try the methods mentioned in the performance wiki yet, because I
> > >>>> thought 60 writes/sec is too low.
> > >>>
> > >>> Do you mean *over* 100k size?
> > >>> 2GB of RAM is pretty low and you'd likely get significantly better
> > >>> performance with more, though on this scale it probably isn't a
> > >>> significant problem.
> > >>
> > >> The data size is exactly 100k.
> > >>
> > >>>> If the value size is 1k, HBase performs much better. 200000 sequentialWrite
> > >>>> operations took about 16 seconds, about 12500 writes per second.
> > >>>
> > >>> Comparing sequentialWrite performance with randomWrite isn't a helpful
> > >>> indicator. Do you have randomWrite results for 1k values? The way your
> > >>> performance degrades with the size of the records seems like you may
> > >>> have a bottleneck at network transfer. What's rack locality like and how
> > >>> much bandwidth do you have between the servers?
> > >>>
> > >>>> Now I am trying to benchmark using two clients on 2 servers, no results
> > >>>> yet.
> > >>
> > >> For 1k values, the sequentialWrite performance and randomWrite performance
> > >> are about the same. All my servers are under one switch; I don't know the
> > >> switch bandwidth yet.
> > >>
> > >>> You're already running 20 clients on your first server with the
> > >>> PerformanceEvaluation. Do you mean you intend to run 20 on each?
> > >>
> > >> In fact, it is 20 threads on one machine.
> > >>
> > >>> Hopefully someone with better knowledge can give a better answer, but my
> > >>> guess is that you have a network transfer bottleneck. Try doing further
> > >>> tests with randomWrite and decreasing value sizes and see if the time
> > >>> correlates to the total amount of data written.
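For reference, a back-of-the-envelope read of the numbers above: 20 clients x 10,000 rows = 200,000 puts in 3,468.6 s, i.e. roughly 58 puts/s, or about 5.8 MB/s of 100k values. A sketch of the follow-up runs suggested in the thread is below; the --miniCluster flag is taken from PerformanceEvaluation's usage in the 0.20 line, and changing the value size still means editing ROW_LENGTH and rebuilding, as was already done for the 100k test.

  # single-machine run (mini cluster) to take the switch out of the picture
  bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --miniCluster --nomapred --rows=10000 randomWrite 20

  # same run against the real cluster, for comparison
  bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=10000 randomWrite 20

If the single-machine run is just as slow as the cluster run, the network can likely be ruled out as the bottleneck.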