On Wed, 2009-07-15 at 11:22 -0400, Robin Humble wrote:
> all kernels all compiled with the rhel5 kernel tree's standard OFED.
> I think 1.3.2 is what's in rhel5.3/centos5.3?

Yeah, something like that IIRC.

> the error messages are just on the initial mount of the first lustre fs.
> subsequent mounts of other lustre fs's don't get any messages, so it
> seems like it's just an extremely noisy protocol/version negotiation
> the first time the 1.8.1 lnet fires up and tries to talk to 1.6.7.2
> servers??

Maybe one of our LNET experts might have some additional information to
offer.

> another data point is that the above errors don't happen with
> 2.6.18-128.1.14.el5 patched with 1.8.0.1 and using the same in-kernel
> OFED, so it's probably something that's happened between 1.8.0.1 and
> 1.8.1-pre.
> or I guess it could be a rhel change between 2.6.18-128.1.14.el5 and
> 2.6.18-128.1.16.el5, but that seems less likely.
> I can spin up a 2.6.18-128.1.14.el5 with b_release_1_8_1 if you like...

Yeah, it would be a great troubleshooting addition to see if the same
kernel on the clients and servers with the different lustre versions has
the same problem. This would isolate the problem either to or away from
a problem with the difference in OFED stacks.

> cool. thanks for the explanation.

NP.

b.
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
