> > On Mon, 2007-02-05 at 10:06 +0000, João Miguel Neves wrote:
> > > Good morning,
> > >
> > > To make a story short: the network malfunctioned and the client was
> > > reset before the network problems were solved. Now the system seems to
> > > be presenting some coherency issues:
> > >
> > > On the client I see:
> > > LustreError: 3311:0:(client.c:951:ptlrpc_expire_one_request()) @@@
> > > timeout (sent at 1170669653, 5s ago) [EMAIL PROTECTED] x348045/t0
> > > o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> > > LustreError: 3311:0:(client.c:951:ptlrpc_expire_one_request()) previously
> > > skipped 199 similar messages
> > >
> > > On the node no6 I see:
> > > LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) Dropping PUT from
> > > [EMAIL PROTECTED] portal 6 match 0x52844b offset 0 length 240: no match
> > > LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) previously skipped
> > > 287 similar messages
> >
> > It means nothing is listening for this request on the OST (i.e. it is not
> > started up yet).

So this would normally happen if an OST is down, right? But the node no6
has the 8 OSTs that it always had...
# cat /proc/fs/lustre/devices
0 UP obdfilter b-ost0 b-ost0_UUID 4
1 UP ost OSS OSS_UUID 3
2 UP obdfilter b-ost1 b-ost1_UUID 4
3 UP obdfilter b-ost2 b-ost2_UUID 4
4 UP obdfilter b-ost3 b-ost3_UUID 5
5 UP obdfilter b-ost4 b-ost4_UUID 5
6 UP obdfilter b-ost5 b-ost5_UUID 5
7 UP obdfilter b-ost6 b-ost6_UUID 5
8 UP obdfilter b-ost7 b-ost7_UUID 5
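
For anyone who wants to check a listing like the one above mechanically, here is a minimal Python sketch. It assumes the six columns are index, state, type, name, UUID and reference count, which matches the output shown but is my reading of it rather than something confirmed in this thread. The names `Device`, `parse_devices` and `summarize` are made up for the illustration. It only reports what the listing itself says; it cannot tell whether an OST is actually accepting connections.

```python
from typing import NamedTuple


class Device(NamedTuple):
    # Column meanings are an assumption based on the listing above.
    index: int
    state: str
    type: str
    name: str
    uuid: str
    refcount: int


def parse_devices(text):
    """Parse the text of /proc/fs/lustre/devices into Device records."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or unexpected lines instead of failing on them.
        if len(fields) != 6 or not fields[0].isdigit() or not fields[5].isdigit():
            continue
        idx, state, dev_type, name, uuid, refcount = fields
        devices.append(Device(int(idx), state, dev_type, name, uuid, int(refcount)))
    return devices


def summarize(devices):
    """Return (number of obdfilter devices, names of devices not in state UP)."""
    ost_count = sum(1 for d in devices if d.type == "obdfilter")
    not_up = [d.name for d in devices if d.state != "UP"]
    return ost_count, not_up


# The listing from no6, copied from this message.
SAMPLE = """\
0 UP obdfilter b-ost0 b-ost0_UUID 4
1 UP ost OSS OSS_UUID 3
2 UP obdfilter b-ost1 b-ost1_UUID 4
3 UP obdfilter b-ost2 b-ost2_UUID 4
4 UP obdfilter b-ost3 b-ost3_UUID 5
5 UP obdfilter b-ost4 b-ost4_UUID 5
6 UP obdfilter b-ost5 b-ost5_UUID 5
7 UP obdfilter b-ost6 b-ost6_UUID 5
8 UP obdfilter b-ost7 b-ost7_UUID 5
"""

if __name__ == "__main__":
    count, not_up = summarize(parse_devices(SAMPLE))
    print(f"{count} obdfilter devices, not UP: {not_up}")
```

On the node itself the text could be read from /proc/fs/lustre/devices instead of the embedded sample.
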
Here is some more information, in case it helps to understand what is going on:
client - 10.10.1.2
mds - 10.10.1.4
no6 (ost) - 10.10.1.6
If I just test the names (metadata) with a 'find' on the client,
everything shows correctly. If I do a 'find -ls' in some directories
(the ones where I suspect there has been data loss), it simply seems to
block, and I get this in the log of no6:
LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) Dropping PUT from [EMAIL PROTECTED] portal 6 match 0x155830 offset 0 length 240: no match
LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) previously skipped 295 similar messages
On the client I get:
LustreError: 3314:0:(client.c:951:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170756287, 5s ago) [EMAIL PROTECTED] x1398929/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 3314:0:(client.c:951:ptlrpc_expire_one_request()) previously skipped 199 similar messages
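
To line the client timeouts up against the no6 log, it can help to turn the "sent at" value into a readable time. The sketch below assumes that value is a Unix epoch timestamp in seconds, which is consistent with the dates of these messages but is an assumption on my part. `SENT_RE` and `sent_time` are names invented for the example.

```python
import re
from datetime import datetime, timezone

# Matches the "(sent at <seconds>, <n>s ago)" part of the timeout message.
SENT_RE = re.compile(r"sent at (\d+), (\d+)s ago")


def sent_time(line):
    """Return (UTC send time, seconds elapsed) for a timeout line, or None."""
    match = SENT_RE.search(line)
    if match is None:
        return None
    # Assumption: the first number is seconds since the Unix epoch.
    sent = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return sent, int(match.group(2))


if __name__ == "__main__":
    line = ("LustreError: 3314:0:(client.c:951:ptlrpc_expire_one_request()) "
            "@@@ timeout (sent at 1170756287, 5s ago)")
    result = sent_time(line)
    if result is not None:
        sent, elapsed = result
        print(sent.isoformat(), f"(timed out after {elapsed}s)")
```
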
Does this make any sense?
Thanks,
João Miguel Neves
