Don't ask me how, but it resolved itself out of the blue. I have no idea what went wrong...
On Dec 13, 2007, at 3:12 PM, Aaron Knister wrote:

> Thanks for your help! I have some more information from the lctl dk --
>
> 10000000:01000000:3:1197576228.177725:0:8816:0:(mgc_request.c:1130:mgc_process_log()) Can't get cfg lock: -108
> 10000000:01000000:1:1197576228.177727:0:8511:0:(mgc_request.c:558:mgc_blocking_ast()) Lock res 0x61746164 (data)
> 00000100:00020000:3:1197576228.177728:0:8816:0:(client.c:710:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x390/t0 o501->[EMAIL PROTECTED]@o2ib_0:26/25 lens 200/304 e 0 to 11 dl 0 ref 1 fl Rpc:/8/0 rc 0/0
> 10000000:01000000:1:1197576228.177729:0:8511:0:(mgc_request.c:583:mgc_blocking_ast()) log data-OST0000: original grant failed, will requeue later
> 10000000:01000000:3:1197576228.177731:0:8816:0:(mgc_request.c:1182:mgc_process_log()) [EMAIL PROTECTED]: configuration from log 'data-OST0000' failed (-108).
> 00000100:00080000:1:1197576236.900462:0:8444:0:(pinger.c:143:ptlrpc_pinger_main()) not pinging MGS (in recovery: FULL or recovery disabled: 0/1)
>
> This is on the OSS.
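Each entry above is a fixed header followed by a free-text message. A minimal Python sketch for splitting one entry into fields, assuming the entry has first been rejoined onto a single line and assuming the header layout inferred from these samples (subsys:mask:cpu:sec.usec:?:pid:?:(file:line:function()) message). The field names and the helpers `parse_line` and `DebugLine` are hypothetical labels of our own, not part of lctl or the Lustre sources.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Header layout inferred from the samples in this thread (an assumption, not
# taken from the Lustre sources); the field names are our own labels:
#   subsys:mask:cpu:sec.usec:?:pid:?:(file:line:function()) message
LINE_RE = re.compile(
    r"^(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>\d+):"
    r"(?P<time>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)


@dataclass
class DebugLine:
    subsys: int   # first hex field, e.g. 0x10000000
    mask: int     # second hex field, e.g. 0x01000000
    cpu: int
    time: float   # sec.usec; appears to be a Unix timestamp
    pid: int
    file: str
    line: int
    func: str
    msg: str


def parse_line(text: str) -> Optional[DebugLine]:
    """Split one rejoined debug-log entry into fields; None if it doesn't match."""
    m = LINE_RE.match(text.strip())
    if m is None:
        return None
    return DebugLine(
        subsys=int(m["subsys"], 16),
        mask=int(m["mask"], 16),
        cpu=int(m["cpu"]),
        time=float(m["time"]),
        pid=int(m["pid"]),
        file=m["file"],
        line=int(m["line"]),
        func=m["func"],
        msg=m["msg"],
    )


if __name__ == "__main__":
    entry = ("10000000:01000000:3:1197576228.177725:0:8816:0:"
             "(mgc_request.c:1130:mgc_process_log()) Can't get cfg lock: -108")
    rec = parse_line(entry)
    print(rec.func, rec.pid, rec.msg)  # mgc_process_log 8816 Can't get cfg lock: -108
```

Splitting entries this way makes a long dump easy to filter by `func` or `pid`; for instance, the pid field is what ties both mgc_process_log failures above to the same thread, 8816.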
>
> Also on the OSS --
>
> 00010000:00000400:2:1197576684.886679:0:8597:0:(ldlm_lib.c:515:target_handle_reconnect()) data-OST0005: 532a7ed7-8e93-e086-885a-b064e46adb12 reconnecting
> 00010000:00000400:2:1197576684.886683:0:8597:0:(ldlm_lib.c:744:target_handle_connect()) data-OST0005: refuse reconnection from [EMAIL PROTECTED]@o2ib to 0xffff8103cc9e3000; still busy with 9 active RPCs
> 00000100:00100000:1:1197576684.886683:0:8599:0:(service.c:1032:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_55:532a7ed7-8e93-e086-885a-b064e46adb12+6:3962:x868:12345-192[EMAIL PROTECTED]:400
> 00000010:00000002:1:1197576684.886687:0:8599:0:(ost_handler.c:1598:ost_handle()) @@@ ping [EMAIL PROTECTED] x868/t0 o400->532a7ed7-8e93-e086-885a-[EMAIL PROTECTED]:0/0 lens 128/0 e 0 to 0 dl 1197576784 ref 1 fl Interpret:/0/0 rc 0/0
> 00010000:00020000:2:1197576684.886688:0:8597:0:(ldlm_lib.c:1458:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x871/t0 o8->532a7ed7-8e93-e086-885a-[EMAIL PROTECTED] ID:0/0 lens 304/200 e 0 to 0 dl 1197576784 ref 1 fl Interpret:/0/0 rc -16/0
>
> On the client it shows --
>
> 00000100:00080000:0:1197576416.143577:0:3964:0:(recover.c:54:ptlrpc_initiate_recovery()) data-OST0004_UUID: starting recovery
> 00000100:00080000:0:1197576416.143585:0:3964:0:(import.c:381:ptlrpc_connect_import()) ffff81082f49a000 data-OST0004_UUID: changing import state from DISCONN to CONNECTING
> 00000100:00080000:0:1197576416.143590:0:3964:0:(import.c:275:import_select_connection()) data-OST0004-osc-ffff81082ae12400: connect to NID [EMAIL PROTECTED] last attempt 4296998987
> 00000100:00080000:0:1197576416.143597:0:3964:0:(import.c:339:import_select_connection()) data-OST0004-osc-ffff81082ae12400: import ffff81082f49a000 using connection [EMAIL PROTECTED]/[EMAIL PROTECTED]
> 00000100:02020000:0:1197576416.143864:0:3963:0:(client.c:581:ptlrpc_check_status()) 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
> 00000100:00080000:0:1197576416.144314:0:3963:0:(import.c:759:ptlrpc_connect_interpret()) ffff81082f49a000 data-OST0004_UUID: changing import state from CONNECTING to DISCONN
> 00000100:00080000:0:1197576416.144316:0:3963:0:(import.c:801:ptlrpc_connect_interpret()) recovery of data-OST0004_UUID on [EMAIL PROTECTED] failed (-16)
>
> I'm at a loss.
>
> On Dec 13, 2007, at 11:59 AM, Oleg Drokin wrote:
>
>> Hello!
>>
>> On Dec 13, 2007, at 11:48 AM, Aaron Knister wrote:
>>
>>> On the client I see this --
>>
>> This shows no activity aside from the fact that the client is
>> disconnected from OST5.
>>
>>> and on the server --
>>
>> This one shows that the server does not allow the client to reconnect
>> because it is still busy processing other requests from this client.
>> That's the reason for the "mount hang".
>>
>> This is all I can tell from the logs you provided. If the logs
>> actually span further into the past, there might be more useful info.
>> Since there was a disconnection, perhaps dmesg on the client and
>> server contains more info about the reasons for it. Also, if you do
>> sysrq-t on the server, you will see what is going on with the server
>> threads that are supposedly still processing client requests.
>>
>> Bye,
>> Oleg
>
> Aaron Knister
> Associate Systems Administrator/Web Designer
> Center for Research on Environment and Water
>
> (301) 595-7001
> [EMAIL PROTECTED]
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
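The return codes in these logs are negative Linux errno values. -16 is EBUSY, which fits Oleg's reading that the OST refused the reconnect while it was still busy with the client's earlier RPCs. -108 is ESHUTDOWN. A minimal Python sketch for decoding them, assuming it runs on a Linux host (Python's errno table follows the host OS). The regex heuristic and the helper names `describe_rc` and `annotate` are hypothetical, not part of any Lustre tool.

```python
import errno
import os
import re
from typing import List

# A negative integer standing on its own (e.g. "(-108)", "with -16", "rc -16/0"),
# not a piece of a UUID such as "e086-885a"; this heuristic is our own.
NEG_RC = re.compile(r"(?<![\w-])-(\d+)\b")


def describe_rc(rc: int) -> str:
    """Map a return code like -16 to its errno name and message, e.g. 'EBUSY (...)'."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


def annotate(message: str) -> List[str]:
    """Describe every stand-alone negative return code in one log message."""
    return [describe_rc(-int(m.group(1))) for m in NEG_RC.finditer(message)]


if __name__ == "__main__":
    for msg in ("Can't get cfg lock: -108",
                "The ost_connect operation failed with -16"):
        print(msg, "->", annotate(msg))
```

On a Linux host, -16 decodes as EBUSY ("Device or resource busy") and -108 as ESHUTDOWN ("Cannot send after transport endpoint shutdown").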
