Hello!
I see. Sounds like a bug then.
We do not test more than one version back because that is all we guarantee to
work.
Still, could you please file a bug in our Bugzilla for this? It is the newer
MDS that exhibits a problem due to client input.
Thanks.
Bye,
Oleg
On Mar 2, 2010, at 12:12 AM, huangql wrote:
>
> Hi, Oleg
>
> Thank you for your timely reply. We waited through one night of this high
> CPU utilization before deciding to reboot, as the load and system CPU
> utilization did not decrease at all. As for the second question, we did use
> chown -R on a huge directory tree of 5.9 TB containing about ten thousand
> files. However, when we tried this from a 1.8.1.1 client, the system CPU on
> the MDS also rose to about 60%, but it dropped back to a normal level once
> the chown command finished, and the command finished in the expected time.
> Based on this, we think it is a conflict between client 1.6.6 and server
> 1.8.1.1. Have you ever tried this? We are currently running both 1.6.5 and
> 1.8.1.1 clients because we have servers of both versions. We will upgrade
> all Lustre clients to 1.8.1.1 once we have evacuated the 1.6.5 servers.
>
>
>
>
> Cheers
> Qiulan Huang
> --------------------------------------------------------------
> Computing Center IHEP Office: Computing Center,123
> 19B Yuquan Road Tel: (+86) 10 88236012-604
> P.O. Box 918-7 Fax: (+86) 10 8823 6839
> Beijing 100049,China Email: [email protected]
> --------------------------------------------------------------
>
> 2010-03-02
> huangql
> From: Oleg Drokin
> Sent: 2010-03-02 03:31:34
> To: huangql
> Cc: lustre-discuss discuss; Maxim Patlasov
> Subject: Re: [Lustre-discuss] High Load and high system CPU for mds
> Hello!
> On Feb 28, 2010, at 9:31 PM, huangql wrote:
> > We have a problem: the MDS shows a high load value and system CPU goes up
> > to 60% when running a chown command on a client. Strangely, the load value
> > and system CPU did not drop back to normal levels once they got high. We
> > could not even do anything on the clients and OSS. You can see the
> > information from the top command as follows:
> How many files did that chown command affect (was it a chown -R for some
> huge directory tree?).
> Essentially chown (setattr) works in two steps: first it changes the MDS
> attributes, then it queues an async RPC for every file object to update the
> attributes on the OST. If there are many files that are getting updated this
> way, there would be a lot of such messages queued, and all the messages are
> sent at once with no rate limiting.
> This is consistent with what you are seeing here: ptlrpcd is busy
> sending/receiving RPCs (ptlrpcd is the Lustre thread that handles async RPC
> sending/completion) and individual socklnd threads are also busy processing
> network transfers (also, I think the code in lnet is not tuned to process
> huge amounts of outstanding RPCs, which leads to additional CPU overhead in
> that case).
> So on the surface it looks like everything performs as expected, though
> certainly Lustre might have behaved better.
> How long did you wait with this high CPU utilization before deciding to
> reboot, and how many files were affected by the chown?
> Bye,
> Oleg
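
The two-step setattr flow described in the quoted message (change the MDS
attributes, then queue one async RPC per file object, all sent at once with no
rate limiting) can be illustrated with a toy model. This is only a sketch under
stated assumptions, not Lustre code: the function name `simulate_setattr`, the
`max_in_flight` cap, and the fixed `completions_per_tick` service rate are all
hypothetical and exist purely to show how the number of outstanding RPCs grows
with the number of files.

```python
from collections import deque


def simulate_setattr(num_files, objects_per_file=1, max_in_flight=None,
                     completions_per_tick=100):
    """Toy model of a recursive chown; returns (peak_in_flight, ticks).

    All names and parameters are hypothetical. max_in_flight=None models
    the "no rate limiting" behaviour described in the thread.
    """
    pending = deque()
    for file_id in range(num_files):
        # Step 1 (not modelled further): attributes are changed on the MDS.
        # Step 2: one async RPC is queued per file object for the OST update.
        for obj in range(objects_per_file):
            pending.append((file_id, obj))

    in_flight = 0
    peak = 0
    ticks = 0
    while pending or in_flight:
        # Without a cap, everything queued is sent immediately.
        if max_in_flight is None:
            budget = len(pending)
        else:
            budget = max(0, max_in_flight - in_flight)
        to_send = min(budget, len(pending))
        for _ in range(to_send):
            pending.popleft()
        in_flight += to_send
        peak = max(peak, in_flight)

        # A fixed number of RPCs complete per tick (assumed service rate).
        in_flight -= min(in_flight, completions_per_tick)
        ticks += 1
    return peak, ticks


if __name__ == "__main__":
    print("unbounded:", simulate_setattr(10_000))
    print("capped:   ", simulate_setattr(10_000, max_in_flight=256))
```

In this model the cap lowers the peak number of outstanding RPCs without
changing the total completion time, because the assumed service rate is the
bottleneck. It does not model the extra per-RPC CPU overhead mentioned in the
thread, so it says nothing about actual MDS load.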