Hi Alex,

I should have mentioned that my GPFS network runs over InfiniBand/RDMA, so looking at the TCP traffic probably won't work. I will try to see whether the traffic is visible on ib0 (instead of eth0), but I have my doubts.
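Since RDMA traffic bypasses the TCP/IP stack, one possible way to still see per-node volume is to sample the port counters that the Linux InfiniBand drivers expose under sysfs. The following is only a minimal sketch under stated assumptions: the device name `mlx4_0` and port `1` in the usage note are hypothetical placeholders, and the function names are made up for illustration.

```python
from pathlib import Path

# port_xmit_data and port_rcv_data are reported in units of 4 bytes,
# so the raw values are multiplied by 4 to get bytes.
IB_WORD_BYTES = 4


def read_ib_counters(port_dir):
    """Return (tx_bytes, rx_bytes) from an InfiniBand port's sysfs counters."""
    counters = Path(port_dir) / "counters"
    tx = int((counters / "port_xmit_data").read_text().strip())
    rx = int((counters / "port_rcv_data").read_text().strip())
    return tx * IB_WORD_BYTES, rx * IB_WORD_BYTES


def throughput(before, after, seconds):
    """Bytes per second in each direction between two counter samples."""
    return tuple((a - b) / seconds for b, a in zip(before, after))
```

To use it, call `read_ib_counters("/sys/class/infiniband/mlx4_0/ports/1")` twice a few seconds apart (substituting whatever device `ls /sys/class/infiniband` shows) and pass both samples to `throughput`. Run on each suspect client, this gives a rough per-node figure comparable to what iftop would show on an Ethernet interface.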

As for the placement: the file system was 95% full when I added the new NSDs. From the waiters command, I can see that what is waiting now is I/O to those 2 NSDs:

    waiting 0.791707000 seconds, NSDThread: for I/O completion on disk d9

I have added more NSDs since then, but the waiting is still on those 2 disks and none of the others.

Richard

On 12/09/2013 02:52 PM, Alex Chekholko wrote:
> Hi Richard,
>
> I would just use something like 'iftop' to look at the traffic between
> the nodes. Or 'collectl'. Or 'dstat'.
>
> e.g. dstat -N eth0 --gpfs --gpfs-ops --top-cpu-adv --top-io 2 10
> http://dag.wiee.rs/home-made/dstat/
>
> For the NSD balance question, since GPFS stripes the blocks evenly
> across all the NSDs, they will end up balanced over time. Or you can
> rebalance manually with 'mmrestripefs -b' or similar.
>
> It is unlikely that particular files ended up on a single NSD, unless
> the other NSDs are totally full.
>
> Regards,
> Alex
>
> On 12/06/2013 04:31 PM, Richard Lefebvre wrote:
>> Hi,
>>
>> I'm looking for a way to see which node (or nodes) is having an impact
>> on the gpfs server nodes which is slowing the whole file system? What
>> happens, usually, is a user is doing some I/O that doesn't fit the
>> configuration of the gpfs file system and the way it was explain on how
>> to use it efficiently. It is usually by doing a lot of unbuffered byte
>> size, very random I/O on the file system that was made for large files
>> and large block size.
>>
>> My problem is finding out who is doing that. I haven't found a way to
>> pinpoint the node or nodes that could be the source of the problem, with
>> over 600 client nodes.
>>
>> I tried to use "mmlsnodes -N waiters -L" but there is too much waiting
>> that I cannot pinpoint on something.
>>
>> I must be missing something simple. Anyone got any help?
>>
>> Note: there is another thing I'm trying to pinpoint. A temporary
>> imbalance was created by adding a new NSD. It seems that a group of
>> files have been created on that same NSD and a user keeps hitting that
>> NSD causing a high load. I'm trying to pinpoint the origin of that too.
>> At least until everything is balance back. But will balancing spread
>> those files since they are already on the most empty NSD?
>>
>> Richard
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
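
When the waiters listing is too long to read by eye, tallying the waiters per disk can make a hot NSD stand out. Below is a minimal sketch in Python. It assumes only the line shape quoted in this thread ("waiting N seconds, NSDThread: for I/O completion on disk d9"); real waiters output carries more fields and other waiter types, which this pattern simply skips. The function name `waiters_by_disk` is made up for illustration.

```python
import re
from collections import Counter

# Matches the fragment of a waiter line shown in this thread, e.g.
# "waiting 0.791707000 seconds, NSDThread: for I/O completion on disk d9".
# Lines that do not mention "on disk <name>" are ignored.
WAITER_RE = re.compile(
    r"waiting (?P<secs>\d+\.\d+) seconds, (?P<thread>\w+): "
    r".*?on disk (?P<disk>\S+)"
)


def waiters_by_disk(lines):
    """Return (waiter count per disk, summed wait seconds per disk)."""
    counts = Counter()
    total_wait = Counter()
    for line in lines:
        m = WAITER_RE.search(line)
        if m:
            counts[m["disk"]] += 1
            total_wait[m["disk"]] += float(m["secs"])
    return counts, total_wait
```

Feeding it the saved waiters output from each NSD server and sorting by count (`counts.most_common()`) would show whether the waits really concentrate on the same 2 disks over time.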
