On 04/09/14 01:50, Sven Oehme wrote:
> Hello everybody,

Hi

> here I come again, this time to ask for some hints about how to monitor GPFS.
>
> I know about mmpmon, but the issue with its "fs_io_s" and "io_s" is
> that they return numbers based only on the requests done on the
> current host, so I would have to run them on all the clients (over 600
> nodes), which is quite impractical. Instead, I would like to know from
> the servers what's going on, and I came across the vio_s statistics,
> which are less documented, and I don't know exactly what they mean.
> There is also the script "/usr/lpp/mmfs/samples/vdisk/viostat" that
> runs VIO_S.
>
> My problem is with the output of this command:
>  echo "vio_s" | /usr/lpp/mmfs/bin/mmpmon -r 1
>
> mmpmon> mmpmon node 10.7.28.2 name gss01a vio_s OK VIOPS per second
> timestamp: 1409763206/477366
> recovery group: *
> declustered array: *
> vdisk: *
> client reads:                        2584229
> client short writes:                55299693
> client medium writes:                 190071
> client promoted full track writes:    465145
> client full track writes:               9249
> flushed update writes:               4187708
> flushed promoted full track writes:      123
> migrate operations:                      114
> scrub operations:                     450590
> log writes:                         28509602
>
> it says "VIOPS per second", but they seem to me to be just counters, as
> every time I re-run the command the numbers increase a bit.
> Can anyone confirm whether those numbers are counters or ops/sec?

The numbers are cumulative, so every time you run the command they show the value accumulated since startup (or since the last reset).
OK, you confirmed my thoughts, thanks.
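
Since these are cumulative counters, one rough way to turn them into rates is to take two samples a fixed interval apart and diff them. A minimal sketch, assuming the output format shown above (the 30-second interval and the temporary file names are arbitrary choices, not anything GPFS prescribes):

# Sample vio_s twice, 30 seconds apart, then print the per-second delta
# for every purely numeric counter line ("client reads: ...", etc.).
INTERVAL=30
echo "vio_s" | /usr/lpp/mmfs/bin/mmpmon -r 1 > /tmp/vio_s.before
sleep $INTERVAL
echo "vio_s" | /usr/lpp/mmfs/bin/mmpmon -r 1 > /tmp/vio_s.after

awk -F': *' -v i=$INTERVAL '
    NR == FNR { before[$1] = $2; next }        # first file: remember counters
    $2 ~ /^[0-9]+$/ && $1 in before {          # second file: numeric counters only
        printf "%-40s %10.1f ops/sec\n", $1, ($2 - before[$1]) / i
    }
' /tmp/vio_s.before /tmp/vio_s.after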


>
> On a closer look, I don't understand what most of those values
> mean. For example, what exactly is a "flushed promoted full track write"?
> I tried to find documentation about this output, but could not
> find any. Can anyone point me to a link where the output of vio_s is explained?
>
> Another thing I don't understand about those numbers is whether they are
> just operations, or the number of blocks that were read/written/etc.

It's just operations, and if I explained what the numbers mean I might confuse you even more, because this is not what you are really looking for. What you are looking for is what the client I/Os look like on the server side, while the VIO layer is the server's side towards the disks, so one level lower than what you are looking for, from what I could read out of the description above.
No... what I'm looking for is exactly how busy the disks are in keeping up with the requests. Obviously I'm not looking at just that, but I feel the need to monitor *also* those things. I'll explain why.

It happens, when our storage is quite busy (180 Gb/s of read/write), that the FS starts to be slow on normal cd or ls requests. This might be normal, but in those situations I want to know where the bottleneck is. Is it the server CPU? Memory? Network? Spindles? Knowing where the bottleneck is might help me understand whether we can tweak the system a bit more.

If it's the CPU on the servers, then there is not much to do besides replacing them or adding more servers. If it's not the CPU, maybe more memory would help? Maybe it's just the network that filled up, so I can add more links?

Or, if we have reached the point where the bottleneck is the spindles, then there is not much point in looking somewhere else; we have simply reached the hardware limit.
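
As a rough first pass at that CPU/memory/network/spindle question, plain Linux tools on the NSD servers already give a coarse picture before digging into GPFS-specific counters; this is only a generic suggestion, nothing GSS-specific:

# run these on the NSD/GSS servers while the slowness is happening
iostat -x 5        # per-disk utilization and service times (spindles)
vmstat 5           # CPU, run queue, memory pressure, swap
sar -n DEV 5       # per-interface network throughput (links filling up?)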

Sometimes it also happens that there is very low I/O (10 Gb/s) and almost no CPU usage on the servers, but huge slowness (ls can take 10 seconds). Why does that happen? There are not many data ops, but we think there is a huge amount of metadata ops. So what I want to know is whether the metadata vdisks are busy or not. If this is our problem, could some SSD disks dedicated to metadata help?


In particular, I'm a bit puzzled by the design of our GSS storage.
Each recovery group has 3 declustered arrays, and each declustered array has 1 data and 1 metadata vdisk, but in the end both metadata and data vdisks use the same spindles. The problem is that I don't understand whether we have a metadata bottleneck there. Maybe some SSD disks in a dedicated declustered array would perform much better, but this is just theory. I really would like to be able to monitor I/O activity on the metadata vdisks.





So the layer you care about is the NSD server layer, which sits on top of the VIO layer (which is essentially the SW RAID layer in GNR).

> I'm asking because if they are just ops, I don't know how useful
> they could be. For example, one write operation could mean
> writing 1 block or writing a 100 GB file. If those are operations,
> is there a way to have the output in bytes or blocks?

There are multiple ways to get information at the NSD layer; one would be to use the dstat plugin (see /usr/lpp/mmfs/samples/util), but those are counters again.

Counters are not a problem. I can collect them and create some graphs in a monitoring tool. I will check that.


The alternative option is to use mmdiag --iohist. This shows you a history of the last X I/O operations on either the client or the server side, like this on a client:

# mmdiag --iohist

=== mmdiag: iohist ===

I/O history:

I/O start time   RW    Buf type   disk:sectorNum     nSec  time ms  qTime ms    RpcTimes ms    Type  Device/NSD ID      NSD server
---------------  --  -----------  -----------------  ----  -------  --------  ---------------  ----  -----------------  ---------------
14:25:22.169617  R   LLIndBlock   1:1075622848         64   13.073     0.000   12.959   0.063  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:22.182723  R   inode        1:1071252480          8    6.970     0.000    6.908   0.038  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:53.659918  R   LLIndBlock   1:1081202176         64    8.309     0.000    8.210   0.046  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:53.668262  R   inode        2:1081373696          8   14.117     0.000   14.032   0.058  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:53.682750  R   LLIndBlock   1:1065508736         64    9.254     0.000    9.180   0.038  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:53.692019  R   inode        2:1064356608          8   14.899     0.000   14.847   0.029  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:53.707100  R   inode        2:1077830152          8   16.499     0.000   16.449   0.025  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:53.723788  R   LLIndBlock   1:1081202432         64    4.280     0.000    4.203   0.040  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:53.728082  R   inode        2:1081918976          8    7.760     0.000    7.710   0.027  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:57.877416  R   metadata     2:678978560          16   13.343     0.000   13.254   0.053  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:57.891048  R   LLIndBlock   1:1065508608         64   15.491     0.000   15.401   0.058  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:57.906556  R   inode        2:1083476520          8   11.723     0.000   11.676   0.029  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:57.918516  R   LLIndBlock   1:1075622720         64    8.062     0.000    8.001   0.032  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:57.926592  R   inode        1:1076503480          8    8.087     0.000    8.043   0.026  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:57.934856  R   LLIndBlock   1:1071088512         64    6.572     0.000    6.510   0.033  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:57.941441  R   inode        2:1069885984          8   11.686     0.000   11.641   0.024  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:57.953294  R   inode        2:1083476936          8    8.951     0.000    8.912   0.021  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:57.965475  R   inode        1:1076503504          8    0.477     0.000    0.053   0.000  cli   C0A70401:53BEEA7F  192.167.4.1
14:25:57.965755  R   inode        2:1083476488          8    0.410     0.000    0.061   0.321  cli   C0A70402:53BEEA5E  192.167.4.2
14:25:57.965787  R   inode        2:1083476512          8    0.439     0.000    0.053   0.342  cli   C0A70402:53BEEA5E  192.167.4.2

You basically see whether it's an inode or a data block, what size it has (in sectors), which NSD server the request was sent to, etc.
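
To make that history easier to digest, the per-line fields can be aggregated. Here is a minimal awk sketch (not an official tool; the column positions are taken from the sample above and may need adjusting for other releases) that summarizes the client-side history by buffer type:

/usr/lpp/mmfs/bin/mmdiag --iohist | awk '
    # data lines start with a timestamp like 14:25:22.169617
    $1 ~ /^[0-9]+:[0-9]+:[0-9]+\./ {
        n[$3]++           # I/O count per buffer type (column 3)
        ms[$3]  += $6     # accumulated service time in ms (column 6)
        sec[$3] += $5     # accumulated size in sectors (column 5)
    }
    END {
        printf "%-12s %8s %10s %12s\n", "buf type", "IOs", "avg ms", "sectors"
        for (t in n)
            printf "%-12s %8d %10.3f %12d\n", t, n[t], ms[t] / n[t], sec[t]
    }'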

On the server side you see the type, which physical disk it goes to, and also what size of disk I/O it causes, like:

14:26:50.129995  R   inode   12:3211886376    64  14.261  0.000  0.000  0.000  pd  sdis
14:26:50.137102  R   inode   19:3003969520    64   9.004  0.000  0.000  0.000  pd  sdad
14:26:50.136116  R   inode   55:3591710992    64  11.057  0.000  0.000  0.000  pd  sdoh
14:26:50.141510  R   inode   21:3066810504    64   5.909  0.000  0.000  0.000  pd  sdaf
14:26:50.130529  R   inode   89:2962370072    64  17.437  0.000  0.000  0.000  pd  sddi
14:26:50.131063  R   inode   78:1889457000    64  17.062  0.000  0.000  0.000  pd  sdsj
14:26:50.143403  R   inode   36:3323035688    64   4.807  0.000  0.000  0.000  pd  sdmw
14:26:50.131044  R   inode   37:2513579736   128  17.181  0.000  0.000  0.000  pd  sddv
14:26:50.138181  R   inode   72:3868810400    64  10.951  0.000  0.000  0.000  pd  sdbz
14:26:50.138188  R   inode  131:2443484784   128  11.792  0.000  0.000  0.000  pd  sdug
14:26:50.138003  R   inode  102:3696843872    64  11.994  0.000  0.000  0.000  pd  sdgp
14:26:50.137099  R   inode  145:3370922504    64  13.225  0.000  0.000  0.000  pd  sdmi
14:26:50.141576  R   inode   62:2668579904    64   9.313  0.000  0.000  0.000  pd  sdou
14:26:50.134689  R   inode  159:2786164648    64  16.577  0.000  0.000  0.000  pd  sdpq
14:26:50.145034  R   inode   34:2097217320    64   7.409  0.000  0.000  0.000  pd  sdmt
14:26:50.138140  R   inode  139:2831038792    64  14.898  0.000  0.000  0.000  pd  sdlw
14:26:50.130954  R   inode  164:282120312     64  22.274  0.000  0.000  0.000  pd  sdzd
14:26:50.137038  R   inode   41:3421909608    64  16.314  0.000  0.000  0.000  pd  sdef
14:26:50.137606  R   inode  104:1870962416    64  16.644  0.000  0.000  0.000  pd  sdgx
14:26:50.141306  R   inode   65:2276184264    64  16.593  0.000  0.000  0.000  pd  sdrk
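
The same idea applied on the server side gives a quick view of which physical disks are slowest on average, which is one way to spot a spindle bottleneck. A hedged sketch along the lines of the previous one (again assuming the column layout shown above):

/usr/lpp/mmfs/bin/mmdiag --iohist | awk '
    # server-side lines end with "pd <device>"
    $(NF-1) == "pd" {
        dev = $NF
        n[dev]++
        ms[dev] += $6      # service time in ms (column 6)
    }
    END {
        for (dev in n)
            printf "%8.3f avg ms  %6d IOs  %s\n", ms[dev] / n[dev], n[dev], dev
    }' | sort -nr | head -20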



mmdiag --iohist is another thing I looked at, but I could not find a good explanation for all the "Buf type" values (third column):

           allocSeg
           data
           iallocSeg
           indBlock
           inode
           LLIndBlock
           logData
           logDesc
           logWrap
           metadata
           vdiskAULog
           vdiskBuf
           vdiskFWLog
           vdiskMDLog
           vdiskMeta
           vdiskRGDesc

If I want to monitor metadata operations, what should I look at? Just the metadata flag, or also inode? This command also takes long to run; especially if I run it a second time, it hangs for a while before running again, so I'm not sure that running it every 30 seconds or every minute is viable, but I will look into that as well. Is there any documentation that describes the whole output clearly? What I found is quite generic and doesn't go into details...
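
In the absence of an authoritative list, one rough (and admittedly guessed) grouping is to treat plain "data" buffers as data and everything else in that list (inode, indBlock, LLIndBlock, alloc segments, log records, vdisk internals) as metadata-side traffic; a sketch of that split, with the caveat that the server-side vdisk* types may deserve their own class:

/usr/lpp/mmfs/bin/mmdiag --iohist | awk '
    $1 ~ /^[0-9]+:[0-9]+:[0-9]+\./ {
        class = ($3 == "data") ? "data" : "metadata/other"   # guessed split
        n[class]++
        ms[class] += $6
    }
    END {
        for (c in n)
            printf "%-15s %8d IOs  avg %7.3f ms\n", c, n[c], ms[c] / n[c]
    }'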

>
> Last but not least, and this is what I really would like to
> accomplish: I would like to be able to monitor the latency of metadata operations.

You can't do this on the server side, as you don't know how much time you spend on the client, the network, or anything between the app and the physical disk, so you can only reliably look at this from the client. The iohist output only shows you the server disk I/O processing time, but that can be a fraction of the overall time (in other cases it can obviously also be the dominant part, depending on your workload).

The easiest way on the client is to run

mmfsadm vfsstats enable
From now on, VFS stats are collected until you restart GPFS.

then run :

vfs statistics currently enabled
started at: Fri Aug 29 13:15:05.380 2014
  duration: 448446.970 sec

 name                     calls  time per call      total time
 --------------------  --------  -------------  --------------
 statfs                       9       0.000002        0.000021
 startIO              246191176       0.005853  1441049.976740

to dump whatever you have collected so far on this node.
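
One way to turn those cumulative VFS statistics into per-interval numbers is to save two dumps of this output a known time apart and diff the per-operation lines. A minimal sketch, assuming the two snapshots have been captured into the hypothetical files vfsstats.before and vfsstats.after and keep the four-column "name / calls / time per call / total time" layout shown above:

awk '
    # first file: remember calls and total time per operation name
    NR == FNR && NF == 4 && $2 ~ /^[0-9]+$/ { calls[$1] = $2; total[$1] = $4; next }
    # second file: print delta calls and average latency per call over the interval
    NF == 4 && $2 ~ /^[0-9]+$/ && $1 in calls {
        dc = $2 - calls[$1]
        dt = $4 - total[$1]
        if (dc > 0)
            printf "%-20s %12d calls  %12.6f s/call\n", $1, dc, dt / dc
    }
' vfsstats.before vfsstats.after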


We already do that, but as I said, I want to check specifically how the GSS servers are keeping up with the requests, to identify or exclude server-side bottlenecks.


Thanks for your help, you definitely gave me a few things to look at.

Salvatore

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
