Just kidding... I spoke WAY too soon. It's acting up again.

On Dec 13, 2007, at 6:51 PM, Aaron Knister wrote:
> Don't ask me how, but out of the blue it resolved itself. I have 0 idea
> what went wrong...
>
> On Dec 13, 2007, at 3:12 PM, Aaron Knister wrote:
>
>> Thanks for your help! I have some more information from the lctl dk --
>>
>> 10000000:01000000:3:1197576228.177725:0:8816:0:(mgc_request.c:1130:mgc_process_log()) Can't get cfg lock: -108
>> 10000000:01000000:1:1197576228.177727:0:8511:0:(mgc_request.c:558:mgc_blocking_ast()) Lock res 0x61746164 (data)
>> 00000100:00020000:3:1197576228.177728:0:8816:0:(client.c:710:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x390/t0 o501->[EMAIL PROTECTED]@o2ib_0:26/25 lens 200/304 e 0 to 11 dl 0 ref 1 fl Rpc:/8/0 rc 0/0
>> 10000000:01000000:1:1197576228.177729:0:8511:0:(mgc_request.c:583:mgc_blocking_ast()) log data-OST0000: original grant failed, will requeue later
>> 10000000:01000000:3:1197576228.177731:0:8816:0:(mgc_request.c:1182:mgc_process_log()) [EMAIL PROTECTED]: configuration from log 'data-OST0000' failed (-108).
>> 00000100:00080000:1:1197576236.900462:0:8444:0:(pinger.c:143:ptlrpc_pinger_main()) not pinging MGS (in recovery: FULL or recovery disabled: 0/1)
>>
>> This is on the OSS.
>>
>> Also on the OSS --
>>
>> 00010000:00000400:2:1197576684.886679:0:8597:0:(ldlm_lib.c:515:target_handle_reconnect()) data-OST0005: 532a7ed7-8e93-e086-885a-b064e46adb12 reconnecting
>> 00010000:00000400:2:1197576684.886683:0:8597:0:(ldlm_lib.c:744:target_handle_connect()) data-OST0005: refuse reconnection from [EMAIL PROTECTED]@o2ib to 0xffff8103cc9e3000; still busy with 9 active RPCs
>> 00000100:00100000:1:1197576684.886683:0:8599:0:(service.c:1032:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_55:532a7ed7-8e93-e086-885a-b064e46adb12+6:3962:x868:12345-192 [EMAIL PROTECTED]:400
>> 00000010:00000002:1:1197576684.886687:0:8599:0:(ost_handler.c:1598:ost_handle()) @@@ ping [EMAIL PROTECTED] x868/t0 o400->532a7ed7-8e93-e086-885a-[EMAIL PROTECTED]:0/0 lens 128/0 e 0 to 0 dl 1197576784 ref 1 fl Interpret:/0/0 rc 0/0
>> 00010000:00020000:2:1197576684.886688:0:8597:0:(ldlm_lib.c:1458:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x871/t0 o8->532a7ed7-8e93-e086-885a-[EMAIL PROTECTED] ID:0/0 lens 304/200 e 0 to 0 dl 1197576784 ref 1 fl Interpret:/0/0 rc -16/0
>>
>> On the client it shows --
>>
>> 00000100:00080000:0:1197576416.143577:0:3964:0:(recover.c:54:ptlrpc_initiate_recovery()) data-OST0004_UUID: starting recovery
>> 00000100:00080000:0:1197576416.143585:0:3964:0:(import.c:381:ptlrpc_connect_import()) ffff81082f49a000 data-OST0004_UUID: changing import state from DISCONN to CONNECTING
>> 00000100:00080000:0:1197576416.143590:0:3964:0:(import.c:275:import_select_connection()) data-OST0004-osc-ffff81082ae12400: connect to NID [EMAIL PROTECTED] last attempt 4296998987
>> 00000100:00080000:0:1197576416.143597:0:3964:0:(import.c:339:import_select_connection()) data-OST0004-osc-ffff81082ae12400: import ffff81082f49a000 using connection [EMAIL PROTECTED]/[EMAIL PROTECTED]
>> 00000100:02020000:0:1197576416.143864:0:3963:0:(client.c:581:ptlrpc_check_status()) 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
>> 00000100:00080000:0:1197576416.144314:0:3963:0:(import.c:759:ptlrpc_connect_interpret()) ffff81082f49a000 data-OST0004_UUID: changing import state from CONNECTING to DISCONN
>> 00000100:00080000:0:1197576416.144316:0:3963:0:(import.c:801:ptlrpc_connect_interpret()) recovery of data-OST0004_UUID on [EMAIL PROTECTED] failed (-16)
>>
>> I'm at a loss.
>>
>> On Dec 13, 2007, at 11:59 AM, Oleg Drokin wrote:
>>
>>> Hello!
>>>
>>> On Dec 13, 2007, at 11:48 AM, Aaron Knister wrote:
>>>
>>>> On the client I see this --
>>>
>>> This shows no activity aside from the fact that the client is
>>> disconnected from OST5.
>>>
>>>> and on the server --
>>>
>>> This one shows that the server does not allow client reconnection
>>> because it is still busy processing other requests from this client.
>>> That's the reason for the "mount hang".
>>>
>>> This is all I can tell from the logs you provided. If the logs
>>> actually span further into the past, there might be more useful info.
>>> Since there was a disconnection, perhaps dmesg on the client and server
>>> contains more info about the reasons for it. Also, if you do sysrq-t
>>> on the server, you will see what is going on with those server threads
>>> that are supposedly still processing client requests.
>>>
>>> Bye,
>>> Oleg

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water

(301) 595-7001
[EMAIL PROTECTED]

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
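
For anyone reading these logs later, here is a rough sketch of how the lctl dk lines quoted above could be split into fields and the negative return codes turned into errno names. The field layout is only inferred from the lines pasted in this thread, and the labels (subsystem, mask, cpu, pid and so on) plus the helper name parse_line are my own assumptions, not anything taken from the Lustre sources. The two errno names are the standard Linux values: 16 is EBUSY and 108 is ESHUTDOWN.

```python
import re

# Layout assumed from the log lines quoted in this thread. The group names
# are illustrative labels only; field5 and field7 are left unnamed because
# the thread does not say what they mean.
LINE_RE = re.compile(
    r"^(?P<subsystem>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>\d+):"
    r"(?P<timestamp>\d+\.\d+):(?P<field5>\d+):(?P<pid>\d+):(?P<field7>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<message>.*)$"
)

# Standard Linux errno values for the two codes that show up in the logs.
ERRNO_NAMES = {16: "EBUSY", 108: "ESHUTDOWN"}

# A negative number that is not glued to a preceding word character or slash,
# e.g. "(-16)", "failed with -16", "lock: -108". UUID fragments such as
# "e086-885a" are skipped because the dash follows a word character.
CODE_RE = re.compile(r"(?<![\w/])-(\d+)\b")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    codes = [int(c) for c in CODE_RE.findall(rec["message"])]
    rec["errors"] = [(-c, ERRNO_NAMES.get(c, "unknown")) for c in codes]
    return rec


if __name__ == "__main__":
    sample = (
        "10000000:01000000:3:1197576228.177725:0:8816:0:"
        "(mgc_request.c:1130:mgc_process_log()) Can't get cfg lock: -108"
    )
    rec = parse_line(sample)
    print(rec["func"], rec["errors"])
```

Read this way, the -16 on the OSS lines up with the "still busy with 9 active RPCs" message, and the -108 comes from the MGC side failing to get the config lock.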
