Hello, after fixing the broken hardware, one problem remains. We have been using Lustre with automount for over a year without problems. Since the hardware failure a few days ago (a Gigabit switch and the SATA backplane in one MDS), the following happens.
Client: quadcore1. The Lustre file system is mounted under '/misc/data' (the mountpoint) via automount.

'mount' shows:

    m...@tcp0:m...@tcp0:/scia on /misc/data type lustre (rw)

Now doing 'umount /misc/data':

    Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
    Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
    Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(connection.c:144:ptlrpc_put_connection()) NULL connection
    Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(connection.c:144:ptlrpc_put_connection()) Skipped 13 previous similar messages
    Nov 10 10:14:24 quadcore1 Lustre: client ffff81001606dc00 umount complete

Then triggering the automount by accessing a directory one or more levels below the mountpoint (the client console hangs after this command):

    'ls -la /misc/data/OneDirDeeper'

    Nov 10 10:14:33 quadcore1 automount[2797]: attempting to mount entry /misc/data
    Nov 10 10:14:34 quadcore1 Lustre: Client scia-client has started
    Nov 10 10:14:34 quadcore1 automount[2797]: mount(generic): mounted m...@tcp0:m...@tcp0:/scia type lustre on /misc/data
    Nov 10 10:14:34 quadcore1 automount[2797]: mounted /misc/data
    Nov 10 10:14:34 quadcore1 LustreError: 3115:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-192.168.16....@tcp, match 86814 length 1336 too big: 1272 left, 1272 allowed
    Nov 10 10:14:34 quadcore1 Lustre: 3115:0:(lib-move.c:1647:lnet_parse_put()) Dropping PUT from 12345-192.168.16....@tcp portal 10 match 86814 offset 128 length 1336: 2

In a new console (which releases the frozen console above), 'umount /misc/data':

    Nov 10 10:15:38 quadcore1 Lustre: setting import scia-MDT0000_UUID INACTIVE by administrator request
    Nov 10 10:15:38 quadcore1 Lustre: Skipped 13 previous similar messages
    Nov 10 10:15:38 quadcore1 LustreError: 3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -4
    Nov 10 10:15:38 quadcore1 LustreError: 3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) Skipped 2 previous similar messages
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(client.c:716:ptlrpc_import_delay_req()) @@@ IMP_INVALID r...@ffff8102244c6c00 x86895/t0 o101->[email protected]@tcp:12/10 lens 440/1400 e 0 to 100 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(client.c:716:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) ldlm_cli_enqueue: -108
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) Skipped 76 previous similar messages
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) lock enqueue: rc: -108
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) Skipped 2 previous similar messages
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) error reading dir 4167519/1275738219 page 6: rc -108
    Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) Skipped 2 previous similar messages

There are no messages on the MDS or OSTs related to this.

Doing an 'ls -la /misc/data' (the mountpoint itself) works fine, and the Lustre file system gets mounted properly on /misc/data. The above scenario is reproducible on all clients. The system works fine when the Lustre file system is mounted statically, or after the mount has been done in the proper way.

lustre-1.6.6
vanilla-2.6.22.19

Thanks and regards,
Heiko

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
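
P.S. To help with reading the return codes in the logs above, here is a minimal sketch that pulls the negative rc values out of a log line and names them. It assumes the rc values are negated Linux errno numbers (the usual kernel convention; the logs themselves do not say so). The table holds only the two codes that appear in this post, and the names `LINUX_ERRNO`, `RC_PATTERN` and `decode_line` are made up for this illustration, not part of any Lustre tool.

```python
import re

# Linux errno numbers for the two codes seen in the logs (assumption: rc = -errno).
LINUX_ERRNO = {
    4: ("EINTR", "Interrupted system call"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}

# A negative integer that follows "rc", "rc:" or a bare ":" plus whitespace,
# e.g. "Got rc -108", "lock enqueue: rc: -108", "ldlm_cli_enqueue_fini: -4".
RC_PATTERN = re.compile(r"(?:rc:?|:)\s+(-\d+)\b")


def decode_line(line):
    """Return a list of (rc, errno_name) pairs found in one log line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        rc = int(match.group(1))
        name, _description = LINUX_ERRNO.get(-rc, ("UNKNOWN", ""))
        results.append((rc, name))
    return results


if __name__ == "__main__":
    samples = [
        "LustreError: 8751:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) "
        "Got rc -108 from cancel RPC: canceling anyway",
        "LustreError: 3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) "
        "ldlm_cli_enqueue_fini: -4",
        "LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) lock enqueue: rc: -108",
    ]
    for sample in samples:
        for rc, name in decode_line(sample):
            print(rc, name, LINUX_ERRNO.get(-rc, ("", ""))[1])
```

Read this way, -108 is ESHUTDOWN (the transport endpoint was already shut down) and -4 is EINTR (the call was interrupted), which fits requests being cut off by the second umount.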
