Hello,
after repairing the failed hardware, one problem remains.
We have been using Lustre with automount for over a year without problems.
Since the hardware failure a few days ago (a Gigabit switch and the SATA 
backplane in one MDS), the following happens.

On client quadcore1, the Lustre file system is mounted on '/misc/data' (the 
mountpoint) via automount.
Output of 'mount':
m...@tcp0:m...@tcp0:/scia on /misc/data type lustre (rw)

Now running:
'umount /misc/data'
Nov 10 10:14:24 quadcore1 LustreError: 
8751:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: 
canceling anyway
Nov 10 10:14:24 quadcore1 LustreError: 
8751:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Nov 10 10:14:24 quadcore1 LustreError: 
8751:0:(connection.c:144:ptlrpc_put_connection()) NULL connection
Nov 10 10:14:24 quadcore1 LustreError: 
8751:0:(connection.c:144:ptlrpc_put_connection()) Skipped 13 previous similar 
messages
Nov 10 10:14:24 quadcore1 Lustre: client ffff81001606dc00 umount complete

Triggering the automount by accessing a directory one or more levels below the 
mountpoint (the client console hangs after this command):
'ls -la /misc/data/OneDirDeeper'
Nov 10 10:14:33 quadcore1 automount[2797]: attempting to mount entry /misc/data
Nov 10 10:14:34 quadcore1 Lustre: Client scia-client has started
Nov 10 10:14:34 quadcore1 automount[2797]: mount(generic): mounted 
m...@tcp0:m...@tcp0:/scia type lustre on /misc/data
Nov 10 10:14:34 quadcore1 automount[2797]: mounted /misc/data
Nov 10 10:14:34 quadcore1 LustreError: 
3115:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 
12345-192.168.16....@tcp, match 86814 length 1336 too big: 1272 left, 1272 
allowed
Nov 10 10:14:34 quadcore1 Lustre: 3115:0:(lib-move.c:1647:lnet_parse_put()) 
Dropping PUT from 12345-192.168.16....@tcp portal 10 match 86814 offset 128 
length 1336: 2

In a new console (this releases the frozen console above):
'umount /misc/data':
Nov 10 10:15:38 quadcore1 Lustre: setting import scia-MDT0000_UUID INACTIVE by 
administrator request
Nov 10 10:15:38 quadcore1 Lustre: Skipped 13 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 
3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) 
ldlm_cli_enqueue_fini: -4
Nov 10 10:15:38 quadcore1 LustreError: 
3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) Skipped 2 
previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 
8754:0:(client.c:716:ptlrpc_import_delay_req()) @@@ IMP_INVALID  
r...@ffff8102244c6c00 x86895/t0 
o101->[email protected]@tcp:12/10 lens 440/1400 e 0 to 100 dl 0 
ref 1 fl Rpc:/0/0 rc 0/0
Nov 10 10:15:38 quadcore1 LustreError: 
8754:0:(client.c:716:ptlrpc_import_delay_req()) Skipped 2 previous similar 
messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) 
ldlm_cli_enqueue: -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) 
Skipped 76 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) 
lock enqueue: rc: -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) 
Skipped 2 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) error 
reading dir 4167519/1275738219 page 6: rc -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) Skipped 
2 previous similar messages

There are no related messages on the MDS or the OSTs.
Running 'ls -la /misc/data' (the mountpoint itself) works, and the Lustre file 
system is then mounted properly on /misc/data.
The scenario above is reproducible on all clients.
The system works fine when the Lustre file system is mounted statically, or 
once it has been automounted properly by accessing the mountpoint itself.
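
For anyone reading the logs above: the negative values (rc -108, -4) are 
negated Linux errno numbers. On Linux, 108 is ESHUTDOWN and 4 is EINTR. Below 
is a minimal sketch for decoding them from a log line, assuming it runs on a 
Linux host (errno numbering is platform-specific). The helper name decode_rc 
and the regular expression are my own illustration and are not part of Lustre.

```python
import errno
import os
import re

# Matches negative return codes as they appear in the log lines above,
# e.g. "rc -108", "rc: -108" or "ldlm_cli_enqueue_fini: -4".
# The pattern is an illustrative assumption, not a Lustre log grammar.
RC_PATTERN = re.compile(r"(?:\brc:?\s*|:\s)-(\d+)\b")


def decode_rc(line):
    """Return a list of (code, symbolic name, description) for one log line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        # errno.errorcode maps numbers to names for the platform running this.
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((code, name, os.strerror(code)))
    return results


if __name__ == "__main__":
    samples = [
        "Got rc -108 from cancel RPC: canceling anyway",
        "ldlm_cli_enqueue_fini: -4",
        "lock enqueue: rc: -108",
    ]
    for sample in samples:
        for code, name, text in decode_rc(sample):
            print(f"-{code}: {name} ({text})")
```

Lines without a negative return code, such as the "umount complete" message, 
yield an empty list.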

lustre-1.6.6
vanilla-2.6.22.19

Thanks and Regards
Heiko
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
