> > On Mon, 2007-02-05 at 10:06 +0000, João Miguel Neves wrote:
> > > Good morning,
> > > 
> > > To make a long story short: the network malfunctioned and the client was
> > > reset before the network problems were solved. Now the system seems to
> > > be presenting some coherency issues:
> > > 
> > > On the client I see:
> > > LustreError: 3311:0:(client.c:951:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170669653, 5s ago) [EMAIL PROTECTED] x348045/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> > > LustreError: 3311:0:(client.c:951:ptlrpc_expire_one_request()) previously skipped 199 similar messages
> > > 
> > > On the node no6 I see:
> > > LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) Dropping PUT from [EMAIL PROTECTED] portal 6 match 0x52844b offset 0 length 240: no match
> > > LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) previously skipped 287 similar messages
> 
> It means nothing is listening for this request on the OST (i.e. it is not
> started up yet).

So this would normally happen if an OST is down, right? But the node no6
has the 8 OSTs that it always had...

# cat /proc/fs/lustre/devices 
  0 UP obdfilter b-ost0 b-ost0_UUID 4
  1 UP ost OSS OSS_UUID 3
  2 UP obdfilter b-ost1 b-ost1_UUID 4
  3 UP obdfilter b-ost2 b-ost2_UUID 4
  4 UP obdfilter b-ost3 b-ost3_UUID 5
  5 UP obdfilter b-ost4 b-ost4_UUID 5
  6 UP obdfilter b-ost5 b-ost5_UUID 5
  7 UP obdfilter b-ost6 b-ost6_UUID 5
  8 UP obdfilter b-ost7 b-ost7_UUID 5
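
To run the same check across several nodes, here is a minimal Python sketch. It assumes the six-column layout shown above (index, state, type, name, UUID, refcount). The names `parse_devices` and `osts_not_up` are hypothetical helpers, not part of Lustre.

```python
def parse_devices(text):
    """Parse the text of /proc/fs/lustre/devices into a list of dicts.

    Assumes six whitespace-separated columns per line, as in the
    listing above; lines with a different shape are skipped.
    """
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # blank line or unexpected format
        index, state, dev_type, name, uuid, refcount = fields
        devices.append({
            "index": int(index),
            "state": state,
            "type": dev_type,
            "name": name,
            "uuid": uuid,
            "refcount": int(refcount),
        })
    return devices


def osts_not_up(devices):
    """Return the names of obdfilter devices whose state is not UP."""
    return [d["name"] for d in devices
            if d["type"] == "obdfilter" and d["state"] != "UP"]


if __name__ == "__main__":
    with open("/proc/fs/lustre/devices") as f:
        devs = parse_devices(f.read())
    print("OSTs not UP:", osts_not_up(devs))
```

An empty list only confirms that the devices report UP locally. It does not show whether the client can actually connect to them.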

Giving some more info to see if I can understand what's going on:

client - 10.10.1.2
mds - 10.10.1.4
no6 (ost) - 10.10.1.6

If I just test the names (metadata) with a 'find' on the client,
everything shows correctly. If I do a 'find -ls' in some directories
(the ones where I suspect there has been data loss), it simply seems to
block, and I get this in the log of no6:

LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) Dropping PUT from [EMAIL PROTECTED] portal 6 match 0x155830 offset 0 length 240: no match
LustreError: 5462:0:(lib-move.c:152:lnet_match_md()) previously skipped 295 similar messages

On the client I get:

LustreError: 3314:0:(client.c:951:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170756287, 5s ago) [EMAIL PROTECTED] x1398929/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 3314:0:(client.c:951:ptlrpc_expire_one_request()) previously skipped 199 similar messages
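
To line these timeouts up with the log on no6, the "sent at" value can be converted to a readable time. This is a minimal Python sketch, assuming the value is seconds since the Unix epoch (which its magnitude suggests). The function name `sent_at_utc` is hypothetical.

```python
import re
from datetime import datetime, timezone


def sent_at_utc(log_line):
    """Return the 'sent at' timestamp of a log line as a UTC datetime.

    Assumes the number after 'sent at' is seconds since the Unix epoch.
    Returns None if the line has no such field.
    """
    match = re.search(r"sent at (\d+)", log_line)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


line = ("LustreError: 3314:0:(client.c:951:ptlrpc_expire_one_request()) "
        "@@@ timeout (sent at 1170756287, 5s ago)")
print(sent_at_utc(line))
```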

Does this make any sense?

Thanks,
                                        João Miguel Neves
