On Mon, Mar 01, 2010 at 02:35:18PM -0500, Oleg Drokin wrote:
> Hello!
> 
> On Feb 28, 2010, at 9:31 PM, huangql wrote:
> > We have a problem: the MDS load is high and system CPU usage goes up 
> > to 60% when running a chown command on a client. Strangely, the load 
> > and system CPU usage did not drop back to normal levels once they 
> > got high. We can't even do anything on the clients and OSS. You can see 
> > the information from the top command as follows:
> 
> How many files did that chown command affect (was it a chown -R on some 
> huge directory tree)?
> Essentially chown (setattr) works in two steps: first it changes the MDS 
> attributes, then it queues an async RPC for
> every file object to update the attributes on the OST. If many files 
> are being updated this way,
> there will be a lot of such messages queued, and all the messages are sent at 
> once with no rate limiting.
> This is consistent with what you are seeing here: ptlrpcd is busy 
> sending/receiving RPCs (ptlrpcd is the Lustre
> thread that handles sending and completion of async RPCs) and individual socklnd 
> threads are also busy processing

For small messages, the socklnd can't do zero-copy sends, so there's
an additional cost for copying the small messages into socket send
buffers, which adds to CPU usage.

A show_cpu/show_processes from SysRq should tell what the processes
are busy with.
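
As a rough sketch of how one might trigger those dumps from a script
rather than the keyboard: the helper below writes a SysRq command key to
the kernel's trigger file. The mapping of show_cpu to 'l' (backtrace of
all active CPUs) and show_processes to 't' (task state dump) is my
assumption about which keys are meant, and trigger_sysrq is a
hypothetical name, not an existing tool.

```python
# Assumed mapping (not confirmed by this thread):
#   'l' = backtrace of all active CPUs, 't' = dump of all task states.
SYSRQ_KEYS = {"show_cpu": "l", "show_processes": "t"}


def trigger_sysrq(name, trigger_path="/proc/sysrq-trigger"):
    """Write the SysRq key for `name` to the trigger file and return the key.

    The kernel prints the resulting dump to the kernel log, not to stdout.
    """
    key = SYSRQ_KEYS[name]
    with open(trigger_path, "w") as f:
        f.write(key)
    return key
```

Calling trigger_sysrq("show_cpu") and trigger_sysrq("show_processes") on
the MDS would need root and SysRq enabled in the kernel; the output can
then be read from the kernel log (for example with dmesg).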

> network transfers (also, I think the code in lnet is not tuned to process huge 
> amounts of outstanding RPCs,
> which leads to additional CPU overhead in that case).

Yes:
https://bugzilla.lustre.org/show_bug.cgi?id=21619
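
To illustrate the queueing behaviour Oleg describes (one async RPC per
file object, all sent at once), here is a toy model in Python. It is not
Lustre code: peak_in_flight, completions_per_tick and max_in_flight are
hypothetical names, and the cap is only there to show what rate limiting
would change.

```python
from collections import deque


def peak_in_flight(num_files, completions_per_tick, max_in_flight=None):
    """Return the peak number of outstanding RPCs while updating num_files objects.

    max_in_flight=None models sending every queued RPC at once (no rate
    limiting); an integer models a cap on concurrently outstanding RPCs.
    """
    if completions_per_tick <= 0:
        raise ValueError("completions_per_tick must be positive")
    pending = deque(range(num_files))  # one queued RPC per file object
    in_flight = 0
    peak = 0
    while pending or in_flight:
        # Send as many queued RPCs as the cap (if any) allows.
        while pending and (max_in_flight is None or in_flight < max_in_flight):
            pending.popleft()
            in_flight += 1
        peak = max(peak, in_flight)
        # The receiving side completes a fixed number of RPCs per tick.
        in_flight -= min(in_flight, completions_per_tick)
    return peak
```

In this model, peak_in_flight(100000, 500) peaks at 100000 outstanding
RPCs, while peak_in_flight(100000, 500, max_in_flight=256) never exceeds
256. The numbers are arbitrary; the point is only that without a cap the
number of outstanding messages grows with the size of the directory tree.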

Isaac
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss