On Fri, Jan 29, 2010 at 05:00:10PM +0200, Deon Borman wrote:
> I have a weird problem on one of my OSSs, though I've seen it once on
> the other OSS. Things will be humming along nicely, when suddenly I get
> lots of messages like this:
>
> Jan 29 15:26:16 venus kernel: Lustre:
> 898:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to
> 12345-192.168.1...@tcp
> Jan 29 15:26:16 venus kernel: Lustre:
> 1090:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to
> 12345-192.168.1...@tcp

Any errors reported on the router nodes?

> In the 50 odd minutes before I picked it up, it produced over 10 million
> such lines in /var/log/messages.

That's a known problem. In 1.8.1, neterror is printed to the console by
default, but those messages are not rate-limited. This is fixed in 1.8.2,
see bug 20805. In the meantime, you can disable neterror on the console as
follows:

# lctl set_param debug=-neterror lnet.debug=-neterror

However, this only stops the console from being flooded; it won't address
the router connection problem.

Johann
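
To see which peers the "No usable routes" messages refer to before silencing
them, something like the following could be run over a copy of the log. This
is only a rough sketch: the function names are made up for illustration, and
the pattern assumes each message sits on a single line in the format quoted
above.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines quoted in this thread:
# the peer NID is the token that follows "No usable routes to".
ROUTE_ERROR = re.compile(r"No usable routes to (\S+)")


def count_route_errors(lines):
    """Return a Counter mapping each peer NID to its number of matching lines."""
    counts = Counter()
    for line in lines:
        match = ROUTE_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_route_errors_in_file(path):
    """Hypothetical helper: tally route errors in a syslog-style text file."""
    # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
    with open(path, errors="replace") as log:
        return count_route_errors(log)
```

For example, count_route_errors_in_file("/var/log/messages").most_common()
would list the affected NIDs, busiest first. It only summarizes the log and
does nothing about the routing problem itself.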
