Can you please provide some details? Which OS and kernel version are you running? Did you patch the kernel yourself, or are you using the RPMs?
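If it helps, something like the following could gather those details in one go. This is only a rough sketch, not a supported tool: it assumes Python 3.7 or later is available on the node, and it assumes that any packaged Lustre install shows up in `rpm -qa` with "lustre" somewhere in the package name. The function name `collect_system_details` is made up for illustration.

```python
import platform
import shutil
import subprocess


def collect_system_details():
    """Return OS, kernel release, and any installed Lustre-related RPMs.

    Hypothetical helper: the "lustre" substring filter is an assumption
    about package naming, not something taken from the report.
    """
    details = {
        "os": platform.system(),
        "kernel_release": platform.release(),  # same value as `uname -r`
        "machine": platform.machine(),
        "lustre_rpms": [],
    }
    # `rpm -qa` lists installed packages; skip quietly where rpm is absent.
    if shutil.which("rpm"):
        result = subprocess.run(
            ["rpm", "-qa"], capture_output=True, text=True, check=False
        )
        details["lustre_rpms"] = sorted(
            line
            for line in result.stdout.splitlines()
            if "lustre" in line.lower()
        )
    return details


if __name__ == "__main__":
    for key, value in collect_system_details().items():
        print(f"{key}: {value}")
```

An empty `lustre_rpms` list would suggest either a hand-patched kernel build or a system without rpm, so it is worth saying which applies.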
On Fri, Feb 6, 2009 at 9:12 PM, Mahmoud Hanafi <[email protected]> wrote:
> We had an MDS crash, and a subsequent reboot results in a panic. Any help
> would be greatly appreciated.
>
> This error appears to be the key event.
>
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6976:0:(llog_obd.c:211:llog_add()) No ctxt
>
> Thanks,
> Mahmoud Hanafi
>
> Feb 6 13:39:14 service100 kernel: Lustre: m45_nb1-MDT0000: recovery
> complete: rc 0
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6597:0:(llog_obd.c:211:llog_add()) No ctxt
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6597:0:(llog_obd.c:211:llog_add()) Skipped 909 previous similar messages
> Feb 6 13:39:15 service100 kernel: Lustre: MDS m45_nb1-MDT0000:
> m45_nb1-OST0000_UUID now active, resetting orphans
> Feb 6 13:39:15 service100 kernel: Lustre: MDS m45_nb1-MDT0000:
> m45_nb1-OST0001_UUID now active, resetting orphans
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6496:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile
> 0x11b80054:0x10703925: rc -2
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6496:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id
> 0x11b80054:10703925: rc -2
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6496:0:(llog_cat.c:330:llog_cat_cancel_records()) Cannot find log 0x11b80054
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6497:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile
> 0x11b8004f:0x10703922: rc -2
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6497:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id
> 0x11b8004f:10703922: rc -2
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6497:0:(llog_cat.c:330:llog_cat_cancel_records()) Cannot find log 0x11b8004f
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6497:0:(llog_server.c:447:llog_origin_handle_cancel()) cancel 124
> llog-records failed: -22
> Feb 6 13:39:15 service100 kernel: LustreError:
> 6496:0:(llog_server.c:447:llog_origin_handle_cancel()) cancel 124
> llog-records failed: -22
> Feb 6 13:39:15 service100 kernel: Lustre: MDS m45_nb1-MDT0000:
> m45_nb1-OST0007_UUID now active, resetting orphans
> Feb 6 13:39:15 service100 kernel: Lustre: Skipped 5 previous similar
> messages
> Feb 6 13:39:16 service100 kernel: LustreError:
> 6497:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile
> 0x11b80052:0x1070392a: rc -2
> Feb 6 13:39:16 service100 kernel: LustreError:
> 6497:0:(llog_lvfs.c:612:llog_lvfs_create()) Skipped 6 previous similar
> messages
> Feb 6 13:39:16 service100 kernel: LustreError:
> 6497:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id
> 0x11b80052:1070392a: rc -2
> Feb 6 13:39:16 service100 kernel: LustreError:
> 6497:0:(llog_cat.c:176:llog_cat_id2handle()) Skipped 6 previous similar
> messages
> Feb 6 13:39:16 service100 kernel: LustreError:
> 6497:0:(llog_cat.c:330:llog_cat_cancel_records()) Cannot find log 0x11b80052
> Feb 6 13:39:16 service100 kernel: LustreError:
> 6497:0:(llog_cat.c:330:llog_cat_cancel_records()) Skipped 6 previous similar
> messages
> Feb 6 13:39:16 service100 kernel: LustreError:
> 6499:0:(llog_server.c:447:llog_origin_handle_cancel()) cancel 124
> llog-records failed: -22
>
>
> Feb 6 13:51:51 service100 kernel: LDISKFS-fs warning: maximal mount count
> reached, running e2fsck is recommended
> Feb 6 13:51:51 service100 kernel: LDISKFS FS on sde1, internal journal
> Feb 6 13:51:51 service100 kernel: LDISKFS-fs: recovery complete.
> Feb 6 13:51:51 service100 kernel: LDISKFS-fs: mounted filesystem with
> ordered data mode.
> Feb 6 13:51:51 service100 kernel: kjournald starting. Commit interval 5
> seconds
> Feb 6 13:51:51 service100 kernel: LDISKFS-fs warning: maximal mount count
> reached, running e2fsck is recommended
> Feb 6 13:51:51 service100 kernel: LDISKFS FS on sde1, internal journal
> Feb 6 13:51:51 service100 kernel: LDISKFS-fs: mounted filesystem with
> ordered data mode.
> Feb 6 13:51:51 service100 kernel: Lustre: Added LNI 10.151.25....@o2ib
> [8/64]
> Feb 6 13:51:51 service100 kernel: LustreError: 137-5: UUID 'MGS' is not
> available for connect (not set up)
> Feb 6 13:51:51 service100 kernel: LustreError:
> 6798:0:(mgs_handler.c:647:mgs_handle()) MGS handle cmd=250 rc=-19
> Feb 6 13:51:51 service100 kernel: LustreError:
> 6798:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19)
> r...@ffff8107fb3c3050 x4876961/t0 o250-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl
> 1233957211 ref 1 fl Interpret:/0/0 rc -19/0
> Feb 6 13:51:51 service100 kernel: Lustre: MGS MGS started
> Feb 6 13:51:51 service100 kernel: Lustre: Server MGS on device /dev/sde1
> has started
> Feb 6 13:51:56 service100 kernel: (fs/jbd/recovery.c, 255):
> journal_recover: JBD: recovery, exit status 0, recovered transactions
> 2765219 to 2765244
> Feb 6 13:51:56 service100 kernel: (fs/jbd/recovery.c, 257):
> journal_recover: JBD: Replayed 17611 and revoked 0/15 blocks
> Feb 6 13:51:56 service100 kernel: kjournald starting. Commit interval 5
> seconds
> Feb 6 13:51:57 service100 kernel: LDISKFS FS on sde2, internal journal
> Feb 6 13:51:57 service100 kernel: LDISKFS-fs: recovery complete.
> Feb 6 13:51:57 service100 kernel: LDISKFS-fs: mounted filesystem with
> ordered data mode.
> Feb 6 13:51:57 service100 kernel: kjournald starting. Commit interval 5
> seconds
> Feb 6 13:51:57 service100 kernel: LDISKFS FS on sde2, internal journal
> Feb 6 13:51:57 service100 kernel: LDISKFS-fs: mounted filesystem with
> ordered data mode.
> Feb 6 13:51:57 service100 kernel: LustreError: 137-5: UUID
> 'm45_nb1-MDT0000_UUID' is not available for connect (no target)
> Feb 6 13:51:57 service100 kernel: LustreError:
> 6854:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19)
> r...@ffff8107db2d5400 x6027981/t0 o38-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl
> 1233957217 ref 1 fl Interpret:/0/0 rc -19/0
> Feb 6 13:51:58 service100 kernel: Lustre: Enabling user_xattr
> Feb 6 13:51:58 service100 kernel: Lustre: Enabling ACL
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6923:0:(mds_fs.c:493:mds_init_server_data()) RECOVERY: service
> m45_nb1-MDT0000, 5893 recoverable clients, last_transno 5429096891
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6923:0:(mds_lov.c:1070:mds_notify()) MDS m45_nb1-MDT0000: in recovery, not
> resetting orphans on m45_nb1-OST0000_UUID
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6923:0:(obd_class.h:339:obd_get_info()) obd_get_info: NULL export
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6923:0:(lov_obd.c:455:lov_connect()) m45_nb1-mdtlov error sending notify -19
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6923:0:(mds_lov.c:1070:mds_notify()) MDS m45_nb1-MDT0000: in recovery, not
> resetting orphans on m45_nb1-OST0003_UUID
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6923:0:(mds_lov.c:1070:mds_notify()) Skipped 2 previous similar messages
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6923:0:(obd_class.h:339:obd_get_info()) obd_get_info: NULL export
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6923:0:(obd_class.h:339:obd_get_info()) Skipped 2 previous similar messages
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6923:0:(lov_obd.c:455:lov_connect()) m45_nb1-mdtlov error sending notify -19
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6923:0:(lov_obd.c:455:lov_connect()) Skipped 2 previous similar messages
> Feb 6 13:51:58 service100 kernel: Lustre: MDT m45_nb1-MDT0000 now serving
> dev (m45_nb1-MDT0000/c528a9db-4b84-a59c-41b6-ad3a6ec11fbf), but will be in
> recovery for at least 5:00, or until 5893 clients reconnect. During this
> time new clients will not be allowed to connect. Recovery progress can be
> monitored by watching /proc/fs/lustre/mds/m45_nb1-MDT0000/recovery_status.
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6923:0:(lproc_mds.c:273:lprocfs_wr_group_upcall()) m45_nb1-MDT0000: group
> upcall set to NONE
> Feb 6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000.mdt: set
> parameter group_upcall=NONE
> Feb 6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000: temporarily
> refusing client connection from 10.151.9....@o2ib
> Feb 6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000: temporarily
> refusing client connection from 10.151.6....@o2ib
> Feb 6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000.mdt: set
> parameter quota_type=u2
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6861:0:(ldlm_lib.c:1226:check_and_start_recovery_timer()) m45_nb1-MDT0000:
> starting recovery timer
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6882:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 5892 recoverable clients remain
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6868:0:(mds_open.c:835:mds_open_by_fid()) Orphan 53f26ea:0f8c9a49 found and
> opened in PENDING directory
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6870:0:(mds_open.c:835:mds_open_by_fid()) Orphan 5482886:0fa65d97 found and
> opened in PENDING directory
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6869:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 5891 recoverable clients remain
> Feb 6 13:51:58 service100 kernel: Lustre:
> 7002:0:(mds_open.c:835:mds_open_by_fid()) Orphan 54820e2:0fa5e637 found and
> opened in PENDING directory
> Feb 6 13:51:58 service100 kernel: Lustre:
> 7002:0:(mds_open.c:835:mds_open_by_fid()) Skipped 137 previous similar
> messages
> Feb 6 13:51:58 service100 kernel: LustreError:
> 6976:0:(llog_obd.c:211:llog_add()) No ctxt
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6885:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 5861 recoverable clients remain
> Feb 6 13:51:58 service100 kernel: Lustre:
> 6885:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 29
> previous similar messages
> Feb 6 13:51:59 service100 kernel: Lustre:
> 6875:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 5565 recoverable clients remain
> Feb 6 13:51:59 service100 kernel: Lustre:
> 6875:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 295
> previous similar messages
> Feb 6 13:51:59 service100 kernel: Lustre:
> 6974:0:(mds_open.c:835:mds_open_by_fid()) Orphan 530890c:0fad2c72 found and
> opened in PENDING directory
> Feb 6 13:51:59 service100 kernel: Lustre:
> 6974:0:(mds_open.c:835:mds_open_by_fid()) Skipped 713 previous similar
> messages
> Feb 6 13:52:01 service100 kernel: Lustre:
> 6881:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 4755 recoverable clients remain
> Feb 6 13:52:01 service100 kernel: Lustre:
> 6881:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 809
> previous similar messages
> Feb 6 13:52:01 service100 kernel: Lustre:
> 6866:0:(mds_open.c:835:mds_open_by_fid()) Orphan 54c028f:0fad585d found and
> opened in PENDING directory
> Feb 6 13:52:01 service100 kernel: Lustre:
> 6866:0:(mds_open.c:835:mds_open_by_fid()) Skipped 1691 previous similar
> messages
> Feb 6 13:52:05 service100 kernel: Lustre:
> 6865:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 3930 recoverable clients remain
> Feb 6 13:52:05 service100 kernel: Lustre:
> 6865:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 824
> previous similar messages
> Feb 6 13:52:05 service100 kernel: Lustre:
> 6968:0:(mds_open.c:835:mds_open_by_fid()) Orphan 54cd214:0fabedc6 found and
> opened in PENDING directory
> Feb 6 13:52:05 service100 kernel: Lustre:
> 6968:0:(mds_open.c:835:mds_open_by_fid()) Skipped 2113 previous similar
> messages
> Feb 6 13:52:13 service100 kernel: Lustre:
> 6879:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 2153 recoverable clients remain
> Feb 6 13:52:13 service100 kernel: Lustre:
> 6879:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 1775
> previous similar messages
> Feb 6 13:52:13 service100 kernel: Lustre:
> 6872:0:(mds_open.c:835:mds_open_by_fid()) Orphan 52f9aea:0f799bf6 found and
> opened in PENDING directory
> Feb 6 13:52:13 service100 kernel: Lustre:
> 6872:0:(mds_open.c:835:mds_open_by_fid()) Skipped 3299 previous similar
> messages
> Feb 6 13:53:31 service100 kernel: Lustre:
> 7002:0:(mds_open.c:835:mds_open_by_fid()) Orphan 52f9af7:0f7b104d found and
> opened in PENDING directory
> Feb 6 13:53:31 service100 kernel: Lustre:
> 7002:0:(mds_open.c:835:mds_open_by_fid()) Skipped 1232 previous similar
> messages
> Feb 6 13:53:31 service100 kernel: Lustre:
> 6983:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000:
> 1295 recoverable clients remain
> Feb 6 13:53:31 service100 kernel: Lustre:
> 6983:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 856
> previous similar messages
> Feb 6 13:53:38 service100 kernel: Lustre:
> 7006:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
> Feb 6 13:53:38 service100 kernel: Lustre:
> 7006:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse
> reconnection from [email protected]@o2ib to
> 0xffff8107cd932000; still busy with 2 active RPCs
> Feb 6 13:53:38 service100 kernel: LustreError:
> 7006:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)
> r...@ffff810725625800 x5375830/t0
> o38->9236b2bf-92ee-fc8b-c7f2-e3563a377...@net_0x500000a9751d8_uuid:0/0 lens
> 304/200 e 0 to 0 dl 1233957318 ref 1 fl Interpret:/0/0 rc -16/0
> Feb 6 13:53:38 service100 kernel: LustreError:
> 7006:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 38 previous similar
> messages
> Feb 6 13:53:38 service100 kernel: Lustre:
> 6971:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> a432bfb8-6afb-cc67-d49e-8e1ba23de270 reconnecting
> Feb 6 13:53:38 service100 kernel: LustreError:
> 6982:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent
> queued req r...@ffff81072508f400 x5066410/t0
> o101->a432bfb8-6afb-cc67-d49e-8e1ba23de...@net_0x500000a97482f_uuid:0/0 lens
> 512/0 e 0 to 0 dl 1233957318 ref 1 fl Interpret:/6/0 rc 0/0
> Feb 6 13:53:39 service100 kernel: Lustre:
> 6986:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> bacd25ef-2f62-e88e-b080-d129171a0666 reconnecting
> Feb 6 13:53:39 service100 kernel: LustreError:
> 7008:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent
> queued req r...@ffff8107257f1000 x32603191/t0
> o36->bacd25ef-2f62-e88e-b080-d129171a0...@net_0x500000a97055a_uuid:0/0 lens
> 336/0 e 0 to 0 dl 1233957319 ref 1 fl Interpret:/6/0 rc 0/0
> Feb 6 13:53:41 service100 kernel: Lustre:
> 6861:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 9e70be4b-f534-5f36-39ca-cbd3f398981f reconnecting
> Feb 6 13:53:41 service100 kernel: LustreError:
> 6919:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent
> queued req r...@ffff81072562ca00 x5621783/t0
> o35->9e70be4b-f534-5f36-39ca-cbd3f3989...@net_0x500000a97131b_uuid:0/0 lens
> 296/0 e 0 to 0 dl 1233957321 ref 1 fl Interpret:/6/0 rc 0/0
> Feb 6 13:53:43 service100 kernel: Lustre:
> 6869:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 668fb888-f573-8a5d-656d-f0f6943b261d reconnecting
> Feb 6 13:53:43 service100 kernel: LustreError:
> 6994:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent
> queued req r...@ffff810725028600 x5087532/t0
> o36->668fb888-f573-8a5d-656d-f0f6943b2...@net_0x500000a970477_uuid:0/0 lens
> 360/0 e 0 to 0 dl 1233957323 ref 1 fl Interpret:/6/0 rc 0/0
> Feb 6 13:53:48 service100 kernel: Lustre:
> 6881:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 37479c6c-952d-1e5b-f28b-08a886b21994 reconnecting
> Feb 6 13:53:48 service100 kernel: LustreError:
> 6918:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent
> queued req r...@ffff81072504ca00 x2467781/t0
> o35->37479c6c-952d-1e5b-f28b-08a886b21...@net_0x500000a970bba_uuid:0/0 lens
> 296/0 e 0 to 0 dl 1233957328 ref 1 fl Interpret:/6/0 rc 0/0
> Feb 6 13:53:53 service100 kernel: LustreError:
> 6859:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent
> queued req r...@ffff81072504ca00 x5025647/t0
> o101->d5ff86c7-b54a-57cf-1948-928fac
> Feb 6 13:54:02 service100 kernel: LustreError:
> 6965:0:(llog_obd.c:211:llog_add()) No ctxt
> Feb 6 13:54:28 service100 kernel: Lustre:
> 6870:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
> Feb 6 13:54:28 service100 kernel: Lustre:
> 6870:0:(ldlm_lib.c:538:target_handle_reconnect()) Skipped 2 previous similar
> messages
> Feb 6 13:54:28 service100 kernel: Lustre:
> 6870:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse
> reconnection from [email protected]@o2ib to
> 0xffff8107cd932000; still busy with 2 active RPCs
> Feb 6 13:54:28 service100 kernel: LustreError:
> 6870:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)
> r...@ffff810724675a00 x5375927/t0
> o38->9236b2bf-92ee-fc8b-c7f2-e3563a377...@net_0x500000a9751d8_uuid:0/0 lens
> 304/200 e 0 to 0 dl 1233957368 ref 1 fl Interpret:/0/0 rc -16/0
> Feb 6 13:54:53 service100 kernel: Lustre:
> 6994:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
> Feb 6 13:54:53 service100 kernel: Lustre:
> 6994:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse
> reconnection from [email protected]@o2ib to
> 0xffff8107cd932000; still busy with 2 active RPCs
> Feb 6 13:54:53 service100 kernel: LustreError:
> 6994:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)
> r...@ffff81072458fe00 x5376006/t0
> o38->9236b2bf-92ee-fc8b-c7f2-e3563a377...@net_0x500000a9751d8_uuid:0/0 lens
> 304/200 e 0 to 0 dl 1233957393 ref 1 fl Interpret:/0/0 rc -16/0
> Feb 6 13:55:18 service100 kernel: Lustre:
> 6968:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse
> reconnection from [email protected]@o2ib to
> 0xffff8107cd932000; still busy with 2 active RPCs
> Feb 6 13:55:18 service100 kernel: LustreError:
> 6968:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)
> r...@ffff8107247fee00 x5376085/t0
> o38->9236b2bf-92ee-fc8b-c7f2-e3563a377...@net_0x500000a9751d8_uuid:0/0 lens
> 304/200 e 0 to 0 dl 1233957418 ref 1 fl Interpret:/0/0 rc -16/0
> Feb 6 13:55:18 service100 kernel: Lustre: 0:0:(watchdog.c:148:lcw_cb())
> Watchdog triggered for pid 6965: it was inactive for 200s
> Feb 6 13:55:18 service100 kernel: Lustre:
> 0:0:(linux-debug.c:185:libcfs_debug_dumpstack()) showing stack for process
> 6965
> Feb 6 13:55:18 service100 kernel: ll_mdt_33 S ffffffffffffffff 0
> 6965 1 6966 6964 (L-TLB)
> Feb 6 13:55:18 service100 kernel: ffff8107cabf3b28 0000000000000046
> 0000000000001705 000000000000000a
> Feb 6 13:55:18 service100 kernel: ffff8108134f8a48 ffff8108134f87f0
> ffff810009059800 0000005b41b8f6c4
> Feb 6 13:55:18 service100 kernel: 0000000000001735 0000000300000000
> Feb 6 13:55:18 service100 kernel: Call Trace:
> <ffffffff885fe428>{:ptlrpc:target_queue_recovery_request+2792}
> Feb 6 13:55:18 service100 kernel:
> <ffffffff8012c8a9>{default_wake_function+0}
> <ffffffff8873ad91>{:mds:mds_handle+2273}
> Feb 6 13:55:18 service100 kernel:
> <ffffffff8833aa71>{:lnet:lnet_match_blocked_msg+961}
> Feb 6 13:55:18 service100 kernel:
> <ffffffff80305642>{thread_return+0}
> <ffffffff88393995>{:obdclass:class_handle2object+213}
> Feb 6 13:55:18 service100 kernel:
> <ffffffff8862e765>{:ptlrpc:lustre_msg_get_conn_cnt+53}
> Feb 6 13:55:18 service100 kernel:
> <ffffffff8012bac9>{find_busiest_group+360}
> <ffffffff8863860a>{:ptlrpc:ptlrpc_check_req+26}
> Feb 6 13:55:18 service100 kernel:
> <ffffffff8863a867>{:ptlrpc:ptlrpc_server_handle_request+2503}
> Feb 6 13:55:18 service100 kernel:
> <ffffffff8010f239>{do_gettimeofday+92}
> <ffffffff882fa3d6>{:libcfs:lcw_update_time+38}
> Feb 6 13:55:19 service100 kernel:
> <ffffffff8013d49d>{__mod_timer+173}
> <ffffffff8863d9d1>{:ptlrpc:ptlrpc_main+3745}
> Feb 6 13:55:19 service100 kernel:
> <ffffffff8012c8a9>{default_wake_function+0} <ffffffff8010bfc2>{child_rip+8}
> Feb 6 13:55:19 service100 kernel:
> <ffffffff8863cb30>{:ptlrpc:ptlrpc_main+0} <ffffffff8010bfba>{child_rip+0}
> Feb 6 13:55:19 service100 kernel: LustreError: dumping log to
> /tmp/lustre-log.1233957318.6965
> Feb 6 13:55:21 service100 kernel: LustreError:
> 6919:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent
> queued req r...@ffff810724035200 x5621783/t0
> o35->9e70be4b-f534-5f36-39ca-cbd3f3989...@net_0x500000a97131b_uuid:0/0 lens
> 296/0 e 0 to 0 dl 1233957421 ref 1 fl Interpret:/6/0 rc 0/0
> Feb 6 13:55:21 service100 kernel: LustreError:
> 6919:0:(ldlm_lib.c:1434:target_queue_recovery_request()) Skipped 1 previous
> similar message
> Feb 6 13:55:43 service100 kernel: Lustre:
> 6861:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
> Feb 6 13:55:43 service100 kernel: Lustre:
> 6861:0:(ldlm_lib.c:538:target_handle_reconnect()) Skipped 2 previous similar
> messages
> Feb 6 13:55:43 service100 kernel: Lustre:
> 6861:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse
> reconnection from [email protected]@o2ib to
> 0xffff8107cd932000; still busy with 2 active RPCs
> Feb 6 13:55:43 service100 kernel: LustreError:
> 6861:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)
> r...@ffff8107246d9600 x5376164/t0
> o38->9236b2bf-92ee-fc8b-c7f2-e3563a377...@net_0x500000a9751d8_uuid:0/0 lens
> 304/200 e 0 to 0 dl 1233957443 ref 1 fl Interpret:/0/0 rc -16/0
> Feb 6 13:56:08 service100 kernel: Lustre:
> 7009:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse
> reconnection from [email protected]@o2ib to
> 0xffff8107cd932000; still busy with 2 active RPCs
> Feb 6 13:56:08 service100 kernel: LustreError:
> 7009:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)
> r...@ffff8107247a0400 x5376243/t0
> o38->9236b2bf-92ee-fc8b-c7f2-e3563a377...@net_0x500000a9751d8_uuid:0/0 lens
> 304/200 e 0 to 0 dl 1233957468 ref 1 fl Interpret:/0/0 rc -16/0
> Feb 6 13:56:33 service100 kernel: Lustre:
> 6855:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse
> reconnection from [email protected]@o2ib to
> 0xffff8107cd932000; still busy with 2 active RPCs
> Feb 6 13:56:58 service100 kernel: Lustre:
> 7008:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000:
> 6c3e80bd-92fb-8a7c-5bd9-72bc744956fc reconnecting
> Feb 6 13:56:58 service100 kernel: Lustre:
> 7008:0:(ldlm_lib.c:538:target_handle_reconnect()) Skipped 2 previous similar
> messages
> Feb 6 13:56:58 service100 kernel: Lustre:
> 6973:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 3
> recoverable clients remain
> Feb 6 13:56:58 service100 kernel: Lustre:
> 6973:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 1292
> previous similar messages
> Feb 6 13:56:58 service100 kernel: Lustre: Parent 87005581/3805388507 lookup
> error -2. Evicting client 7a197206-3055-fbec-480a-93bdd6753834 with export
> 10.151.77....@o2ib.
> Feb 6 13:56:58 service100 kernel: LustreError:
> 6983:0:(handler.c:1590:mds_handle()) operation 101 on unconnected MDS from
> 12345-10.151.77....@o2ib
> Feb 6 13:56:58 service100 kernel: LustreError:
> 6983:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error
> (-107) r...@ffff8107243ece00 x5257230/t0 o101-><?>@<?>:0/0 lens 232/0 e 0 to
> 0 dl 1233957518 ref 1 fl Interpret:/0/0 rc -107/0
> Feb 6 13:56:58 service100 kernel: LustreError:
> 6983:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 1 previous similar
> message
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
