Hi,

For the past few hours we have had a problem with one of our OSTs:

One (and only one) ll_ost_create_ thread on one of our OSSes
has gone haywire and is spinning at 100% CPU.

Rebooting the OSS + MDS didn't help, and there isn't much
going on on the filesystem itself:

 - /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
 - iostat shows almost no usage
 - ib traffic is < 100 kb/s
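In case anyone wants to reproduce the first check, here is a minimal sketch of what we did (the /proc path is the one on our OSS; `is_static` and its interval are just my naming, adjust to taste):

```python
import time

def read_stats(path="/proc/fs/lustre/ost/OSS/ost_create/stats"):
    """Return the raw contents of a Lustre /proc stats file."""
    with open(path) as f:
        return f.read()

def is_static(path, interval=60):
    """True if the stats file content is unchanged after `interval` seconds,
    i.e. the service is not processing any new requests in that window."""
    before = read_stats(path)
    time.sleep(interval)
    return read_stats(path) == before
```

Two snapshots a minute apart came back identical for us, which is what I mean by 'static' above.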


The MDS logs the following every ~3 minutes:
 Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while 
communicating with 10.201.62...@o2ib. The ost_connect operation failed with -16
...and later:
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery 
failed: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST 
idx 5/32: rc = -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed at 
mds_lov_clear_orphans: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync 
failed -110, deactivating
 Aug 13 19:17:54 mds1 kernel: Lustre: 
6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all 
connections, increasing latency to 51s
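For anyone decoding these return codes: as far as I can tell they are negative Linux errno values, which can be checked quickly:

```python
import errno
import os

# The Lustre messages above report kernel return codes as negative
# errno values; map them back to their symbolic names.
for rc in (-16, -110):
    code = -rc
    print(rc, errno.errorcode[code], os.strerror(code))
```

So -16 (EBUSY) lines up with the "still busy with 2 active RPCs" refusal below, and -110 (ETIMEDOUT) with the MDS exhausting its connection attempts.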

oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create 
process)

On the affected OSS we get:
 Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: 
refuse reconnection from [email protected]@o2ib to 
0xffff8102164d0200; still busy with 2 active RPCs


$ llog_reader lustre-log.1281718692.11833 shows:
Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32511 of 284875 not set
Bit 0 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Segmentation fault <-- *ouch*
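Purely a guess on my part: those negative "bit" indices look like garbage record fields being printed through a signed cast, which would fit a corrupted llog (and the segfault). Reinterpreting them as unsigned 16-bit values:

```python
# Speculative: treat the negative indices printed by llog_reader as
# unsigned 16-bit values, on the hunch that corrupted record data is
# being run through a signed cast before printing.
for bit in (-32510, -32511, -1):
    print(bit, "->", bit & 0xFFFF)
```

The -1 mapping to 65535 in particular smells like an all-ones (0xffff) field, i.e. uninitialized or trashed data rather than a real bit index.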


And we are getting tons of soft CPU lockups :-/

Any ideas?


Regards,
 Adrian


_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
