Hi Michael,

> > On Fri, 2009-05-22 at 16:38 -0400, Michael D. Seymour wrote:
> >> Hi all,
> >>
> >> One client running CentOS 5.2 re-exports the Lustre filesystem via NFS
> >> on a different network.
> >>
> >> We get the following messages on a particular client:
> >>
> >> May 22 15:07:45 trinity kernel: LustreError:
> >> 5111:0:(lib-move.c:110:lnet_try_match_md()) Matching packet from
> >> 12345-10.5.203....@tcp, match 19154486 length 728 too big: 704 left, 704
> >> allowed
> >
> > How frequently does this bug occur?
>
> Sets of entries (about 20) happen a few times per day, each entry spaced
> about ten minutes apart.

Can you please show the syslog messages from around this time? There should
be lines with errors related to 'match XXXXX' (in this example, match
19154486, so there should be something about request x19154486).
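One way to pull those lines out of a large syslog is sketched below. This is only an illustration, not a Lustre tool: the function names are made up for this example, and it assumes each kernel message sits on a single line and that related entries mention the match number either bare or as x<number>, as in the example above.

```python
import re

# Captures the number after "match" in lnet_try_match_md() errors such as:
#   ...(lib-move.c:110:lnet_try_match_md()) Matching packet from ...,
#   match 19154486 length 728 too big: 704 left, 704 allowed
MATCH_RE = re.compile(r"lnet_try_match_md\(\).*?\bmatch (\d+)\b")


def match_ids(lines):
    """Return the set of match numbers reported by lnet_try_match_md() errors."""
    ids = set()
    for line in lines:
        found = MATCH_RE.search(line)
        if found:
            ids.add(found.group(1))
    return ids


def related_lines(lines, ids):
    """Return the lines that mention any of the match numbers, bare or as x<number>."""
    if not ids:
        return []
    pattern = re.compile(r"\bx?(?:%s)\b" % "|".join(sorted(ids)))
    return [line for line in lines if pattern.search(line)]
```

To use it, read the syslog file into a list of lines (the path depends on the syslog configuration of the client), pass the list to match_ids(), and then pass the same list and the resulting set to related_lines().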
> > If this can be reproduced quickly, please set
> > lnet.debug=-1, lnet.debug_subsystem=-1, lnet.debug_mb=100 on the MDS and
> > the client, reproduce it, and save the logs with lctl dk > $logfile.
>
> Debugging has been enabled. I haven't been able to catch it in the act yet.
> Will enabling the debug logging until I can catch the bug overflow anything?

You can use 'lctl debug_daemon start $file', but in that case the size is
limited to 512M :\
If that works for you, I can make a small patch that dumps the Lustre log
when this error is hit.

> > After that, please file a bug and attach the logs from the MDS and the
> > client to it.
>
> A bug will be filed as soon as it can be caught with logging enabled.
>
> > This message says the client expects less data in the reply than the MDS
> > is sending.
>
> Trinity cannot accept data as large as the MDS is sending?

Yes. This should cause a timeout while waiting for the reply to the request,
and later a reconnect.

> Thanks for your help,
> Mike

--
Alexey Lyashkov <[email protected]>
Sun Microsystems
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
