On Wed, Jul 15, 2009 at 11:59:54AM -0400, Brian J. Murrell wrote:
>On Wed, 2009-07-15 at 11:22 -0400, Robin Humble wrote:
>> another data point is that the above errors don't happen with
>> 2.6.18-128.1.14.el5 patched with 1.8.0.1 and using the same in-kernel
>> OFED, so it's probably something that's happened between 1.8.0.1 and
>> 1.8.1-pre.
>> or I guess it could be a rhel change between 2.6.18-128.1.14.el5 and
>> 2.6.18-128.1.16.el5, but that seems less likely.
>> I can spin up a 2.6.18-128.1.14.el5 with b_release_1_8_1 if you like...
>
>Yeah, it would be a great troubleshooting addition to see if the same
>kernel on the clients and servers with the different lustre versions has
>the same problem. This would isolate the problem either to or away from
>a problem with the difference in OFED stacks.

ok - I built a 2.6.18-128.1.14.el5 with b_release_1_8_1 and it behaves the
same as 2.6.18-128.1.16.el5 with b_release_1_8_1, i.e. it spits out a bunch
of errors on the first lustre mount.

the only changes between the rhel .14 and .16 kernels look pretty unrelated
to IB/lnet, so I guess that was to be expected:

  * Sat Jun 27 2009 Jiri Pirko <jpi...@redhat.com> [2.6.18-128.1.16.el5]
  - [mm] prevent panic in copy_hugetlb_page_range (Larry Woodman) [508030 507860]

  * Tue Jun 23 2009 Jiri Pirko <jpi...@redhat.com> [2.6.18-128.1.15.el5]
  - [mm] fix swap race condition in fork-gup-race patch (Andrea Arcangeli) [507297 506684]

so I guess the change is somewhere between Lustre 1.8.0.1 and
b_release_1_8_1-20090712131220.

if only we had git bisect, and if only I knew how to use it, and if only I
had the time to try it... :-)

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
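
A note on the bisection idea mentioned above: the same search can be done by hand over any ordered list of source snapshots between 1.8.0.1 and b_release_1_8_1, with or without git. The following is only a minimal sketch of that search. The names `first_bad` and `is_bad` are hypothetical and not part of Lustre or any existing tool. It assumes the oldest snapshot is known good, the newest is known bad, and that once the errors appear they stay in every later snapshot. In practice `is_bad` would mean "build this snapshot, mount lustre, check for the errors", which is the slow part.

```python
from typing import Callable, Sequence


def first_bad(snapshots: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest snapshot for which is_bad() is True.

    Assumptions (not checked here): snapshots are ordered oldest to newest,
    snapshots[0] is good, snapshots[-1] is bad, and badness never reverts.
    """
    lo, hi = 0, len(snapshots) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return snapshots[hi]


if __name__ == "__main__":
    # Hypothetical snapshot labels; pretend the regression arrived in "s6".
    snaps = [f"s{i}" for i in range(10)]
    print(first_bad(snaps, lambda s: int(s[1:]) >= 6))
```

Each round halves the remaining range, so roughly log2(N) build-and-mount cycles are needed for N snapshots.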