Hello!

   I see. Sounds like a bug then.
   We do not test more than one version back, because that is all we
   guarantee to work.
   Still, could you please file a bug in our Bugzilla for this, since it is
   the newer MDS that exhibits a problem due to client input?

   Thanks.

Bye,
    Oleg
On Mar 2, 2010, at 12:12 AM, huangql wrote:

>  
> Hi, Oleg
>  
> Thank you for your timely reply. We waited with this high CPU utilization
> for one night before deciding to reboot, as the load and system CPU
> utilization did not decrease at all. As for the second question, we did use
> chown -R on a huge directory tree of 5.9 TB containing about ten thousand
> files. However, when we tried this on the 1.8.1.1 client, the system CPU on
> the MDS also rose to about 60%, then decreased to a normal level after the
> chown command finished, and the command finished in the expected time.
> Based on this, we think it is a conflict between client 1.6.6 and server
> 1.8.1.1. Have you ever tried this? We are currently using both client 1.6.5
> and client 1.8.1.1, because we have servers of both versions. We will
> upgrade all Lustre clients to 1.8.1.1 once we have evacuated the 1.6.5
> servers.
>  
>  
>  
>  
> Cheers
> Qiulan Huang
> --------------------------------------------------------------   
> Computing Center IHEP         Office: Computing Center,123 
> 19B Yuquan Road                 Tel: (+86) 10 88236012-604
> P.O. Box 918-7                    Fax: (+86) 10 8823 6839
> Beijing 100049,China             Email: [email protected]
> --------------------------------------------------------------   
>  
> 2010-03-02
> huangql
> From: Oleg Drokin
> Sent: 2010-03-02  03:31:34
> To: huangql
> Cc: lustre-discuss discuss; Maxim Patlasov
> Subject: Re: [Lustre-discuss] High Load and high system CPU for mds
> Hello!
> On Feb 28, 2010, at 9:31 PM, huangql wrote:
> > We have a problem: the MDS shows a high load value and system CPU rises
> > to 60% when running the chown command on a client. Strangely, the load
> > value and system CPU did not decrease to the normal level once they got
> > high. We could not even do anything on the clients and OSS. You can see
> > the information from the top command as follows:
> How many files did that chown command affect? (Was it a chown -R for some
> huge directory tree?)
> Essentially, chown (setattr) works in two steps: first it changes the MDS
> attributes, then it queues an async RPC for every file object to update the
> attributes on the OST. If many files are updated this way, a lot of such
> messages are queued, and all the messages are sent at once with no rate
> limiting.
> This is consistent with what you are seeing here: ptlrpcd is busy
> sending/receiving RPCs (ptlrpcd is the Lustre thread that handles sending
> and completion of async RPCs), and individual socklnd threads are also busy
> processing network transfers. (I also think the code in lnet is not tuned
> to process huge amounts of outstanding RPCs, which leads to additional CPU
> overhead in that case.)
> So on the surface it looks like everything performs as expected, though
> Lustre certainly could have behaved better.
> How long did you wait with this high CPU utilization before deciding to
> reboot, and how many files were affected by the chown?
> Bye,
>     Oleg
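
Since the quoted explanation says the per-file OST updates are all sent at
once with no rate limiting, one possible client-side workaround is to pace
the chown calls instead of running a single chown -R. The following is only a
minimal sketch under that assumption; nobody in this thread has confirmed
that it helps, and the function name paced_chown and its batch_size and
pause_s parameters are hypothetical.

```python
import os
import time


def paced_chown(root, uid, gid, batch_size=500, pause_s=1.0):
    """Recursively chown everything under `root`, pausing between batches.

    Assumption (not verified in this thread): spacing out the setattr calls
    gives the MDS time to drain the async RPCs it queues for the OSTs.
    Symlinks are not followed; symlinks that point to directories are
    skipped entirely because os.walk lists them as directories.
    Returns the number of paths that were changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory is visited once as dirpath, so chown it here
        # together with the non-directory entries it contains.
        paths = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in paths:
            os.chown(path, uid, gid, follow_symlinks=False)
            changed += 1
            if changed % batch_size == 0:
                time.sleep(pause_s)  # crude rate limit between batches
    return changed
```

It could be called as paced_chown("/mnt/lustre/some/dir", 1000, 1000), where
the path and the numeric IDs are placeholders. Smaller batches or longer
pauses trade total run time for a lower peak of queued RPCs.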

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
