Hi Rita,

I get a bit grumpy when I see IOPS used as the primary metric for HDFS.

Why? While IOPS are a relevant part of the system, many HDFS use cases are *throughput-oriented* workflows. In the traditional M/R use cases for HDFS, you will likely barely scratch the IOPS the system provides. In fact, HDFS in 0.20 creates a separate TCP connection for each I/O operation - that should tell you how low random-access workflows ranked in HDFS's design priorities.

As a disclaimer, there are use cases (particularly HBase, and how I currently use our HDFS install!) where IOPS are quite relevant. Just remember that they are not the be-all and end-all of HDFS performance measurement, and not the primary number I would look for. Each install will have its own requirements.

Brian

On Oct 23, 2012, at 6:01 PM, Rita <rmorgan...@gmail.com> wrote:

> I was curious because when a vendor (a big storage company) presented, they
> were offering a Hadoop solution. They posted IOPS and I wasn't sure how
> they were determining this number....
>
> On Tue, Oct 23, 2012 at 9:19 AM, Michael Segel
> <michael_se...@hotmail.com> wrote:
>
>> You have two issues.
>>
>> 1) You need to know the throughput in terms of data transfer between disks
>> and controller cards on the node.
>>
>> 2) The actual network throughput of having all of the nodes talking to one
>> another as fast as they can. This will let you see your real limitations in
>> the ToR switch's fabric.
>>
>> Not sure why you really want to do this except to test the disk, disk
>> controller, and then the networking infrastructure of your ToR and then your
>> backplane connecting multiple racks....
>>
>> HTH
>>
>> -Mike
>>
>> On Oct 23, 2012, at 7:47 AM, Ravi Prakash <ravi...@ymail.com> wrote:
>>
>>> Do you mean in a cluster being used by users, or as a benchmark to
>>> measure the maximum?
>>>
>>> The JMX page <nn:port>/jmx provides some interesting stats, but I'm not
>>> sure they have what you want. And I'm unaware of other tools which could.
>>>
>>> ________________________________
>>> From: Rita <rmorgan...@gmail.com>
>>> To: common-user@hadoop.apache.org; Ravi Prakash <ravi...@ymail.com>
>>> Sent: Monday, October 22, 2012 6:46 PM
>>> Subject: Re: measuring iops
>>>
>>> Is it possible to know how many reads and writes are occurring thru the
>>> entire cluster in a consolidated manner -- this does not include
>>> replication factors.
>>>
>>> On Mon, Oct 22, 2012 at 10:28 AM, Ravi Prakash <ravi...@ymail.com> wrote:
>>>
>>>> Hi Rita,
>>>>
>>>> SliveTest can help you measure the number of reads / writes / deletes /
>>>> ls / appends per second your NameNode can handle.
>>>>
>>>> DFSIO can be used to help you measure the amount of throughput.
>>>>
>>>> Both these tests are actually very flexible and have a plethora of
>>>> options to help you test different facets of performance. In my
>>>> experience, you actually have to be very careful and understand what
>>>> the tests are doing for the results to be sensible.
>>>>
>>>> HTH
>>>> Ravi
>>>>
>>>> ________________________________
>>>> From: Rita <rmorgan...@gmail.com>
>>>> To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
>>>> Sent: Monday, October 22, 2012 7:23 AM
>>>> Subject: Re: measuring iops
>>>>
>>>> Anyone?
>>>>
>>>> On Sun, Oct 21, 2012 at 8:30 AM, Rita <rmorgan...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Was curious if there was a method to measure the total number of IOPS
>>>>> (I/O operations per second) on a HDFS cluster.
>>>>>
>>>>> --
>>>>> --- Get your facts first, then you can distort them as you please.--
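Following up on Ravi's pointer to the <nn:port>/jmx page: the NameNode exposes cumulative operation counters, so sampling them twice and dividing the deltas by the interval gives the kind of consolidated cluster-wide reads/writes number Rita asked for (client-facing metadata ops, which naturally excludes replication traffic). Below is a minimal sketch of that delta arithmetic. The bean name and counter keys are assumptions modeled on the NameNodeActivity metrics of that era, and the two samples are hard-coded; a real script would fetch http://<nn:port>/jmx with urllib instead.

```python
import json

# Two /jmx samples taken `interval` seconds apart. Hard-coded illustrative
# payloads here; in practice they would come from urllib.request.urlopen().
SAMPLE_T0 = json.loads('{"beans": [{"name": "Hadoop:service=NameNode,name=NameNodeActivity",'
                       ' "GetBlockLocations": 1000, "FileInfoOps": 500,'
                       ' "CreateFileOps": 200, "DeleteFileOps": 10}]}')
SAMPLE_T1 = json.loads('{"beans": [{"name": "Hadoop:service=NameNode,name=NameNodeActivity",'
                       ' "GetBlockLocations": 1600, "FileInfoOps": 650,'
                       ' "CreateFileOps": 260, "DeleteFileOps": 25}]}')

READ_OPS = ("GetBlockLocations", "FileInfoOps")   # counters treated as reads (assumed names)
WRITE_OPS = ("CreateFileOps", "DeleteFileOps")    # counters treated as writes (assumed names)

def activity_bean(sample):
    """Pull the NameNodeActivity bean out of a /jmx response."""
    for bean in sample["beans"]:
        if bean["name"].endswith("name=NameNodeActivity"):
            return bean
    raise KeyError("NameNodeActivity bean not found")

def ops_per_sec(t0, t1, interval, counters):
    """Sum the counter deltas between two samples and divide by the interval."""
    b0, b1 = activity_bean(t0), activity_bean(t1)
    return sum(b1[c] - b0[c] for c in counters) / float(interval)

interval = 30  # seconds between the two samples
print("read ops/sec: ", ops_per_sec(SAMPLE_T0, SAMPLE_T1, interval, READ_OPS))
print("write ops/sec:", ops_per_sec(SAMPLE_T0, SAMPLE_T1, interval, WRITE_OPS))
```

Note this measures NameNode metadata operations, not datanode disk IOPS - which, per Brian's point above, is often the number that actually matters for HDFS.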