On Mon, Mar 01, 2010 at 02:35:18PM -0500, Oleg Drokin wrote:
> Hello!
>
> On Feb 28, 2010, at 9:31 PM, huangql wrote:
> > We got a problem that the MDS has high load value and the system CPU is up
> > to 60% when running chown command on client. It's strange that the load
> > value and system CPU didn't decrease to the normal level as long as it
> > getted high. Even we can't do anything on clients and OSS. You can see the
> > information with top command as follows:
>
> How many files did that chown command affected (was it a chown -R for some
> huge directory tree?).
> Essentially chown (setattr) works in two steps, first it changes MDS
> attributes then it queues an async RPC for
> every file object to update the attributes on OST. If there are many files
> that are getting updated this way,
> there would be a lot of such messages queued and all the messages are sent at
> once with no rate limiting.
> This is consistent with what you are seeing here, ptlrpcd is busy
> sending/receiving RPCs (ptlrpcd is lustre
> thread that handles async RPCs sending/completion) and individual socklnd
> threads are also busy processing

For small messages, the socklnd can't do zero-copy sends, so there's an
additional cost for copying the small messages into socket send buffers,
which adds to CPU usage. A show_cpu/show_processes from SysRQ should tell
what the processes are busy with.

> network transfers (also I think the code in lnet is not tuned to process huge
> amounts of outstanding RPCs
> which leads to additional CPU overhead in that case).

Yes:
https://bugzilla.lustre.org/show_bug.cgi?id=21619

Isaac
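
To make the two-step setattr behaviour quoted above concrete, here is a rough toy model in Python. It is only a sketch and not Lustre code: `simulate_chown`, `objects_per_file`, `completions_per_file` and `max_in_flight` are hypothetical names made up for illustration, and `max_in_flight` does not correspond to any real Lustre tunable. It just counts how many async OST RPCs would be outstanding if one RPC is queued per file object and everything is sent at once, compared with an imagined cap on in-flight RPCs.

```python
def simulate_chown(num_files, objects_per_file=1,
                   completions_per_file=0, max_in_flight=None):
    """Toy model of a recursive chown; returns (total_rpcs, peak_in_flight).

    All parameters are illustrative assumptions, not Lustre settings.
    """
    queued = 0      # RPCs created but not yet sent
    in_flight = 0   # RPCs sent and not yet completed
    peak = 0        # highest number of RPCs in flight at any point
    total = 0       # total RPCs generated by the chown

    for _ in range(num_files):
        # Step 1 (not modelled): the MDS attributes are changed.
        # Step 2: one async RPC is queued for every object of the file.
        queued += objects_per_file
        total += objects_per_file

        # Send everything that is allowed to go out right now.
        if max_in_flight is None:
            sendable = queued  # no rate limiting: send it all at once
        else:
            sendable = min(queued, max(0, max_in_flight - in_flight))
        queued -= sendable
        in_flight += sendable
        peak = max(peak, in_flight)

        # Some RPCs complete before the next file is processed.
        in_flight -= min(in_flight, completions_per_file)

    return total, peak


if __name__ == "__main__":
    print(simulate_chown(100000, objects_per_file=2))
    print(simulate_chown(100000, objects_per_file=2, max_in_flight=8))
```

In this model, with no cap and completions slower than the chown itself, the peak number of outstanding RPCs grows with the size of the directory tree, which is the situation described above; with a cap it stays bounded and the remainder simply waits in the queue.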
