On Wed, Aug 6, 2008 at 11:26 AM, Brian J. Murrell <[EMAIL PROTECTED]> wrote: > > +rpctrace debug is probably the way to go to see what the client > are(n't) doing in terms of keeping the MDS aware of it's existence. >
The log from the client perspective starts with a lot of packet mismatches: 00000100:00020000:3:1217629906.902955:0:5592:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:3:1217633409.730495:0:5590:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000400:00000100:3:1217633608.884380:0:5588:0:(lib-move.c:1647:lnet_parse_put()) Dropping PUT from [EMAIL PROTECTED] portal 25 match 157556 offset 0 length 128 : 2 00000100:00020000:3:1217633710.886759:0:5587:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:3:1217633852.887489:0:5586:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:3:1217634099.512727:0:5591:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 ...repeat many times. Then it repeats this many times: 00000100:00080000:6:1217651769.943608:0:5596:0:(recover.c:54:ptlrpc_initiate_recovery()) MGS: starting recovery 00000100:00080000:6:1217651769.943619:0:5596:0:(import.c:465:ptlrpc_connect_import()) ffff810275e06188 MGS: changing import state from DISCONN to CONNECTING 00000100:00080000:6:1217651769.943623:0:5596:0:(import.c:342:import_select_connection()) [EMAIL PROTECTED]: connect to NID [EMAIL PROTECTED] last attempt 4342590785 00000100:00080000:6:1217651769.943628:0:5596:0:(import.c:423:import_select_connection()) [EMAIL PROTECTED]: import ffff810275e06188 using connection MGC36.102.29.1 @o2ib_0/[EMAIL PROTECTED] 00000100:00080000:6:1217651769.943630:0:5596:0:(import.c:504:ptlrpc_connect_import()) Last reconnection attempt (743) for MGS 00000100:00080000:6:1217651820.942855:0:5596:0:(recover.c:54:ptlrpc_initiate_recovery()) MGS: starting recovery 00000100:00080000:6:1217651820.942858:0:5596:0:(import.c:465:ptlrpc_connect_import()) ffff810275e06188 MGS: changing import state from DISCONN to CONNECTING 00000100:00080000:6:1217651820.942862:0:5596:0:(import.c:342:import_select_connection()) [EMAIL PROTECTED]: connect to NID [EMAIL PROTECTED] last attempt 4342641786 00000100:00080000:6:1217651820.942866:0:5596:0:(import.c:423:import_select_connection()) [EMAIL PROTECTED]: import ffff810275e06188 using connection MGC36.102.29.1 @o2ib_0/[EMAIL PROTECTED] 00000100:00080000:6:1217651820.942868:0:5596:0:(import.c:504:ptlrpc_connect_import()) Last reconnection attempt (744) for MGS ... repeat... with one of these embedded: 00000100:00080000:6:1217651860.941624:0:5596:0:(pinger.c:144:ptlrpc_pinger_main()) not pinging MGS (in recovery: EVICTED or recovery disabled: 0/0) ...ending with this: 00000100:00080000:6:1217652002.937993:0:5596:0:(import.c:504:ptlrpc_connect_import()) Last reconnection attempt (748) for MGS 00000100:00080000:3:1217663095.721928:0:9878:0:(import.c:1016:ptlrpc_invalidate_import_thread()) thread invalidate import [EMAIL PROTECTED] to [EMAIL PROTECTED]@o2ib 10000000:01000000:3:1217663095.721931:0:9878:0:(mgc_request.c:880:mgc_import_event()) import event 0x808003 00000100:00080000:3:1217663095.722708:0:9878:0:(import.c:1025:ptlrpc_invalidate_import_thread()) ffff810275e06188 MGS: changing import state from EVICTED to RECOVER 00000100:00080000:3:1217663095.722710:0:9878:0:(import.c:1103:ptlrpc_import_recovery_state_machine()) reconnected to [EMAIL PROTECTED]@o2ib 00000100:00080000:3:1217663095.722712:0:9878:0:(import.c:1108:ptlrpc_import_recovery_state_machine()) ffff810275e06188 MGS: changing import state from RECOVER to FUL L 10000000:01000000:3:1217663095.722714:0:9878:0:(mgc_request.c:880:mgc_import_event()) import event 0x808004 10000000:02000400:3:1217663095.722715:0:9878:0:(mgc_request.c:894:mgc_import_event()) [EMAIL PROTECTED]: Reactivating import 00000100:02000000:3:1217663095.722719:0:9878:0:(import.c:1116:ptlrpc_import_recovery_state_machine()) [EMAIL PROTECTED]: Connection restored to service MGS using n id [EMAIL PROTECTED] ...then followed by length mismatches interspersed with "recovery": 00000100:00020000:3:1217663393.547988:0:5586:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00020000:3:1217663677.841799:0:5591:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00020000:3:1217663714.726333:0:5586:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:3:1217663819.561596:0:5592:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00020000:3:1217664140.732610:0:5592:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:3:1217664244.038963:0:5590:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00020000:3:1217664556.735280:0:5586:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00080000:3:1217665408.738550:0:9927:0:(import.c:1016:ptlrpc_invalidate_import_thread()) thread invalidate import [EMAIL PROTECTED] to [EMAIL PROTECTED]@o2ib 10000000:01000000:3:1217665408.738554:0:9927:0:(mgc_request.c:880:mgc_import_event()) import event 0x808003 00000100:00080000:3:1217665408.739328:0:9927:0:(import.c:1025:ptlrpc_invalidate_import_thread()) ffff810275e06188 MGS: changing import state from EVICTED to RECOVER 00000100:00080000:3:1217665408.739330:0:9927:0:(import.c:1103:ptlrpc_import_recovery_state_machine()) reconnected to [EMAIL PROTECTED]@o2ib 00000100:00080000:3:1217665408.739333:0:9927:0:(import.c:1108:ptlrpc_import_recovery_state_machine()) ffff810275e06188 MGS: changing import state from RECOVER to FUL L 10000000:01000000:3:1217665408.739335:0:9927:0:(mgc_request.c:880:mgc_import_event()) import event 0x808004 10000000:02000400:3:1217665408.739336:0:9927:0:(mgc_request.c:894:mgc_import_event()) [EMAIL PROTECTED]: Reactivating import 00000100:02000000:3:1217665408.739340:0:9927:0:(import.c:1116:ptlrpc_import_recovery_state_machine()) [EMAIL PROTECTED]: Connection restored to service MGS using n id [EMAIL PROTECTED] 00000100:00080000:3:1217665408.739344:0:9927:0:(recover.c:175:ptlrpc_wake_delayed()) @@@ waking (set 0000000000000000): [EMAIL PROTECTED] x328561/t0 o501->[EMAIL PROTECTED] [EMAIL PROTECTED]:26/25 lens 200/304 e 0 to 40 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 00000100:00020000:3:1217665653.806267:0:5589:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00020000:3:1217665692.740540:0:5589:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:6:1217666544.737120:0:5592:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:6:1217666647.982634:0:5590:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00020000:6:1217667112.732267:0:5588:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00080000:3:1217667254.732738:0:9966:0:(import.c:1016:ptlrpc_invalidate_import_thread()) thread invalidate import [EMAIL PROTECTED] to [EMAIL PROTECTED]@o2ib 10000000:01000000:3:1217667254.732742:0:9966:0:(mgc_request.c:880:mgc_import_event()) import event 0x808003 00000100:00080000:3:1217667254.733543:0:9966:0:(import.c:1025:ptlrpc_invalidate_import_thread()) ffff810275e06188 MGS: changing import state from EVICTED to RECOVER 00000100:00080000:3:1217667254.733546:0:9966:0:(import.c:1103:ptlrpc_import_recovery_state_machine()) reconnected to [EMAIL PROTECTED]@o2ib 00000100:00080000:3:1217667254.733549:0:9966:0:(import.c:1108:ptlrpc_import_recovery_state_machine()) ffff810275e06188 MGS: changing import state from RECOVER to FUL L 10000000:01000000:3:1217667254.733552:0:9966:0:(mgc_request.c:880:mgc_import_event()) import event 0x808004 10000000:02000400:3:1217667254.733553:0:9966:0:(mgc_request.c:894:mgc_import_event()) [EMAIL PROTECTED]: Reactivating import 00000100:02000000:3:1217667254.733556:0:9966:0:(import.c:1116:ptlrpc_import_recovery_state_machine()) [EMAIL PROTECTED]: Connection restored to service MGS using n id [EMAIL PROTECTED] 00000100:00080000:3:1217667254.733559:0:9966:0:(recover.c:175:ptlrpc_wake_delayed()) @@@ waking (set 0000000000000000): [EMAIL PROTECTED] x338536/t0 o501->[EMAIL PROTECTED] [EMAIL PROTECTED]:26/25 lens 200/304 e 0 to 40 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 00000100:00020000:6:1217667254.733672:0:5586:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 00000100:00020000:6:1217667502.281243:0:5589:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00020000:3:1217668248.722633:0:5587:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128 ... lots of these mismatches, followed by what look like more recoveries: 00000100:02000000:0:1217678411.535026:0:12692:0:(import.c:1116:ptlrpc_import_recovery_state_machine()) [EMAIL PROTECTED]: Connection restored to service MGS using nid [EMAIL PROTECTED] 00000100:00020000:0:1217678425.863040:0:5590:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 00000100:00080000:0:1217678462.534057:0:12693:0:(import.c:1016:ptlrpc_invalidate_import_thread()) thread invalidate import [EMAIL PROTECTED] to [EMAIL PROTECTED]@o2i b 10000000:01000000:0:1217678462.534061:0:12693:0:(mgc_request.c:880:mgc_import_event()) import event 0x808003 00000100:00080000:0:1217678462.534192:0:12693:0:(import.c:1025:ptlrpc_invalidate_import_thread()) ffff810275e06188 MGS: changing import state from EVICTED to RECOVER 00000100:00080000:0:1217678462.534194:0:12693:0:(import.c:1103:ptlrpc_import_recovery_state_machine()) reconnected to [EMAIL PROTECTED]@o2ib 00000100:00080000:0:1217678462.534195:0:12693:0:(import.c:1108:ptlrpc_import_recovery_state_machine()) ffff810275e06188 MGS: changing import state from RECOVER to FU LL 10000000:01000000:0:1217678462.534196:0:12693:0:(mgc_request.c:880:mgc_import_event()) import event 0x808004 10000000:02000400:0:1217678462.534197:0:12693:0:(mgc_request.c:894:mgc_import_event()) [EMAIL PROTECTED]: Reactivating import ...repeated many, many, many times... then: 00000100:00080000:4:1217780793.292959:0:16703:0:(import.c:1108:ptlrpc_import_recovery_state_machine()) ffff810275e06188 MGS: changing import state from RECOVER to FU LL 10000000:01000000:4:1217780793.292960:0:16703:0:(mgc_request.c:880:mgc_import_event()) import event 0x808004 10000000:02000400:4:1217780793.292961:0:16703:0:(mgc_request.c:894:mgc_import_event()) [EMAIL PROTECTED]: Reactivating import 00000100:02000000:4:1217780793.292962:0:16703:0:(import.c:1116:ptlrpc_import_recovery_state_machine()) [EMAIL PROTECTED]: Connection restored to service MGS using nid [EMAIL PROTECTED] 10000000:01000000:4:1217780806.980814:0:6127:0:(mgc_request.c:289:mgc_requeue_thread()) updating log lfs-client 10000000:01000000:4:1217780806.980817:0:6127:0:(mgc_request.c:1092:mgc_process_log()) Process log lfs-client:ffff81027cca8d20 from 530 10000000:01000000:4:1217780806.980819:0:6127:0:(mgc_request.c:627:mgc_enqueue()) Enqueue for lfs-client (res 0x73666c) 00000100:00020000:0:1217780806.980984:0:5588:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128 10000000:01000000:4:1217780844.291599:0:6127:0:(mgc_request.c:1111:mgc_process_log()) Can't get cfg lock: -5 00000100:00080000:0:1217780844.291621:0:5596:0:(recover.c:54:ptlrpc_initiate_recovery()) MGS: starting recovery 00000100:00080000:4:1217780844.291623:0:6127:0:(client.c:1801:ptlrpc_queue_wait()) @@@ "ll_cfg_requeue" waiting for recovery: (FULL != DISCONN) [EMAIL PROTECTED] x935044/t0 o501->[EMAIL PROTECTED]@o2ib:26/25 lens 200/304 e 0 to 40 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 00000100:00080000:0:1217780844.291625:0:5596:0:(import.c:465:ptlrpc_connect_import()) ffff810275e06188 MGS: changing import state from DISCONN to CONNECTING 00000100:00080000:0:1217780844.291630:0:5596:0:(import.c:342:import_select_connection()) [EMAIL PROTECTED]: connect to NID [EMAIL PROTECTED] last attempt 4471669871 10000000:01000000:4:1217780844.291636:0:5660:0:(mgc_request.c:565:mgc_blocking_ast()) Lock res 0x73666c (lfs) 00000100:00080000:0:1217780844.291637:0:5596:0:(import.c:423:import_select_connection()) [EMAIL PROTECTED]: import ffff810275e06188 using connection MGC36.102.29.1 @o2ib_0/[EMAIL PROTECTED] 10000000:01000000:4:1217780844.291638:0:5660:0:(mgc_request.c:590:mgc_blocking_ast()) log lfs-client: original grant failed, will requeue later 00000100:00080000:0:1217780844.291639:0:5596:0:(import.c:504:ptlrpc_connect_import()) Last reconnection attempt (3480) for MGS 00000100:00080000:4:1217780844.291870:0:5595:0:(import.c:709:ptlrpc_connect_interpret()) reconnected to [EMAIL PROTECTED]@o2ib after partition 00000100:00080000:4:1217780844.291873:0:5595:0:(import.c:714:ptlrpc_connect_interpret()) [EMAIL PROTECTED]: reconnected but import is invalid; marking evicted 00000100:00080000:4:1217780844.291874:0:5595:0:(import.c:715:ptlrpc_connect_interpret()) ffff810275e06188 MGS: changing import state from CONNECTING to EVICTED 00000100:00080000:4:1217780844.291877:0:5595:0:(import.c:1053:ptlrpc_import_recovery_state_machine()) evicted from [EMAIL PROTECTED]@o2ib; invalidating 10000000:01000000:4:1217780844.291975:0:5595:0:(mgc_request.c:880:mgc_import_event()) import event 0x808005 00000100:00080000:0:1217780844.291996:0:16704:0:(import.c:1016:ptlrpc_invalidate_import_thread()) thread invalidate import [EMAIL PROTECTED] to [EMAIL PROTECTED]@o2i b 10000000:01000000:0:1217780844.292000:0:16704:0:(mgc_request.c:880:mgc_import_event()) import event 0x808003 00000100:00080000:0:1217780844.292132:0:16704:0:(import.c:1025:ptlrpc_invalidate_import_thread()) ffff810275e06188 MGS: changing import state from EVICTED to RECOVER 00000100:00080000:0:1217780844.292134:0:16704:0:(import.c:1103:ptlrpc_import_recovery_state_machine()) reconnected to [EMAIL PROTECTED]@o2ib 00000100:00080000:0:1217780844.292135:0:16704:0:(import.c:1108:ptlrpc_import_recovery_state_machine()) ffff810275e06188 MGS: changing import state from RECOVER to FU LL Does this give any hints... there's a lot more log to go. Thanks, Chris _______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
