Hi,

We have a problem: the MDS shows a high load average and system CPU usage
goes up to 60% when we run the chown command on a client. Strangely, once the
load and system CPU usage went up, they did not drop back to normal levels.
We could not even do anything on the clients or the OSS. The output of the
top command on the MDS is as follows:
[r...@mainmds ~]# top
top - 10:19:02 up  1:03,  3 users,  load average: 28.73, 27.10, 23.88
Tasks: 515 total,  44 running, 471 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us, 84.1%sy,  0.0%ni, 15.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us, 72.5%sy,  0.0%ni, 27.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us, 83.5%sy,  0.0%ni, 16.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us, 78.4%sy,  0.0%ni, 21.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us, 82.9%sy,  0.0%ni, 17.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us, 69.2%sy,  0.0%ni, 30.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us, 79.6%sy,  0.0%ni, 20.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us, 77.2%sy,  0.0%ni, 22.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.0%us, 58.9%sy,  0.0%ni, 41.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us, 84.4%sy,  0.0%ni, 15.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us, 97.6%sy,  0.0%ni,  2.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.0%us, 81.4%sy,  0.0%ni, 18.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us, 85.0%sy,  0.0%ni, 15.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us, 88.0%sy,  0.0%ni, 12.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us, 36.3%sy,  0.0%ni, 63.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24682716k total,  2985412k used, 21697304k free,   268360k buffers
Swap: 24579440k total,        0k used, 24579440k free,   368904k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5449 root      16   0     0    0    0 R 100.2  0.0  52:46.12 ptlrpcd
 5434 root      16   0     0    0    0 R 89.0  0.0  34:15.77 socknal_sd07
 5432 root      16   0     0    0    0 R 88.3  0.0  32:43.12 socknal_sd05
 5430 root      16   0     0    0    0 R 79.1  0.0  30:37.78 socknal_sd03
 5436 root      16   0     0    0    0 R 61.2  0.0  29:08.47 socknal_sd09
 5440 root      16   0     0    0    0 S 59.5  0.0  33:31.32 socknal_sd13
 5433 root      16   0     0    0    0 R 49.0  0.0  23:20.61 socknal_sd06
 5431 root      15   0     0    0    0 R 45.0  0.0  26:04.43 socknal_sd04
 5427 root      15   0     0    0    0 S 44.7  0.0  23:31.11 socknal_sd00
 5435 root      15   0     0    0    0 S 44.3  0.0  24:50.30 socknal_sd08
 5439 root      15   0     0    0    0 R 43.7  0.0  24:23.79 socknal_sd12
 5437 root      15   0     0    0    0 R 39.7  0.0  27:11.58 socknal_sd10
 5438 root      16   0     0    0    0 S 37.4  0.0  40:50.69 socknal_sd11
 5441 root      15   0     0    0    0 S 35.4  0.0  26:35.59 socknal_sd14

According to the top output, the ptlrpcd process is using 100% CPU, which is
not normal for this system; it looks as if ptlrpcd has locked up. For now, we
have to reboot the MDS to clear the problem. We do not understand this
behavior. Has anyone seen this problem, or does anyone have an idea what
causes it? I would appreciate any help.
In addition, we use Lustre 1.8.1.1 on the MDS and OSS, and Lustre 1.6.5 on
the clients.

Thanks in advance.

Cheers
Qiulan Huang
--------------------------------------------------------------   
Computing Center IHEP         Office: Computing Center,123 
19B Yuquan Road                 Tel: (+86) 10 88236012-607
P.O. Box 918-7                    Fax: (+86) 10 8823 6839
Beijing 100049,China             Email: [email protected] 
--------------------------------------------------------------    
2010-03-01 



huangql 
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
