Hi Richard,
For IB traffic, you can use 'collectl -sx'
http://collectl.sourceforge.net/Infiniband.html
or else mmpmon (which is what 'dstat --gpfs' uses underneath anyway)
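A quick sketch of both (the interval and paths are just examples;
mmpmon needs to run as root on the node of interest):

  # sample InfiniBand interconnect counters every 2 seconds
  collectl -sx -i 2

  # one-shot GPFS I/O statistics in parseable form via mmpmon
  echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p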
If your other NSDs are full, then of course all new writes will go to
the empty NSDs, and when reading those new files your performance will
be limited to just those new NSDs.
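You can confirm which NSDs are full with mmdf (assuming 'gpfs0' is
your file system device name):

  # show capacity and free space for every disk in the file system
  mmdf gpfs0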
Regards,
Alex
On 12/09/2013 01:05 PM, Richard Lefebvre wrote:
Hi Alex,
I should have mentioned that my GPFS traffic runs over
InfiniBand/RDMA, so looking at TCP probably won't work. I will check
whether the traffic shows up on ib0 (instead of eth0), but I have my
doubts.
As for placement: the file system was 95% full when I added the new
NSDs. From the waiters command, I can see that what is waiting now is
I/O to the 2 new NSDs:
waiting 0.791707000 seconds, NSDThread: for I/O completion on disk d9
I have added more NSDs since then, but the waiting is still on those
2 disks, none of the others.
Richard
On 12/09/2013 02:52 PM, Alex Chekholko wrote:
Hi Richard,
I would just use something like 'iftop' to look at the traffic between
the nodes. Or 'collectl'. Or 'dstat'.
e.g. dstat -N eth0 --gpfs --gpfs-ops --top-cpu-adv --top-io 2 10
http://dag.wiee.rs/home-made/dstat/
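If you'd rather have a one-shot text report than the interactive
view, iftop can do that too (a sketch; substitute your interface):

  # print a single 10-second traffic summary on eth0, numeric hosts only
  iftop -i eth0 -n -t -s 10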
For the NSD balance question, since GPFS stripes the blocks evenly
across all the NSDs, they will end up balanced over time. Or you can
rebalance manually with 'mmrestripefs -b' or similar.
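For example (assuming the device name is 'gpfs0'; a rebalance is I/O
intensive, so run it off-peak, and you can restrict the work to
particular nodes with -N):

  # rebalance existing blocks evenly across all NSDs
  mmrestripefs gpfs0 -b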
It is unlikely that particular files ended up on a single NSD, unless
the other NSDs are totally full.
Regards,
Alex
On 12/06/2013 04:31 PM, Richard Lefebvre wrote:
Hi,
I'm looking for a way to see which node (or nodes) is putting load on
the GPFS server nodes and slowing down the whole file system. What
usually happens is that a user is doing I/O that doesn't fit the
configuration of the GPFS file system or the way they were told to use
it efficiently: typically lots of unbuffered, byte-sized, very random
I/O on a file system that was built for large files and a large block
size.
My problem is finding out who is doing it. With over 600 client
nodes, I haven't found a way to pinpoint the node or nodes that are
the source of the problem.
I tried "mmlsnode -N waiters -L", but there are so many waiters that I
can't pinpoint anything.
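For reference, the kind of tally I attempted, just to rank nodes by
waiter count (a rough sketch; the field layout may differ between
GPFS versions):

  # count waiter lines per node, busiest nodes first
  mmlsnode -N waiters -L | awk '{print $1}' | sort | uniq -c | sort -rn | head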
I must be missing something simple. Anyone got any help?
Note: there is another thing I'm trying to pinpoint. Adding a new NSD
created a temporary imbalance: it seems that a group of files was
created on that same NSD, and a user keeps hitting it, causing a high
load. I'm trying to pinpoint the origin of that too, at least until
everything is balanced again. But will rebalancing spread those files,
given that they are already on the emptiest NSD?
Richard
--
Alex Chekholko [email protected] 347-401-4860
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss