On Wed, Jul 15, 2009 at 11:59:54AM -0400, Brian J. Murrell wrote:
>On Wed, 2009-07-15 at 11:22 -0400, Robin Humble wrote:
>> another data point is that the above errors don't happen with
>> 2.6.18-128.1.14.el5 patched with 1.8.0.1 and using the same in-kernel
>> OFED, so it's probably something that's happened between 1.8.0.1 and
>> 1.8.1-pre.
>> or I guess it could be a rhel change between 2.6.18-128.1.14.el5 and
>> 2.6.18-128.1.16.el5, but that seems less likely.
>> I can spin up a 2.6.18-128.1.14.el5 with b_release_1_8_1 if you like...
>Yeah, it would be a great troubleshooting addition to see if the same
>kernel on the clients and servers with the different lustre versions has
>the same problem.  This would isolate the problem either to or away from
>a problem with the difference in OFED stacks.

ok - I made a 2.6.18-128.1.14.el5 with b_release_1_8_1 and it behaves
the same as 2.6.18-128.1.16.el5 with b_release_1_8_1, i.e. it spits out
a bunch of errors on the first lustre mount.

the only changes between the rhel .14 and .16 versions look pretty
unrelated to IB/lnet, so I guess that was to be expected:
  * Sat Jun 27 2009 Jiri Pirko <jpi...@redhat.com> [2.6.18-128.1.16.el5]
  - [mm] prevent panic in copy_hugetlb_page_range (Larry Woodman) [508030 507860]
  
  * Tue Jun 23 2009 Jiri Pirko <jpi...@redhat.com> [2.6.18-128.1.15.el5]
  - [mm] fix swap race condition in fork-gup-race patch (Andrea Arcangeli) [507297 506684]

so I guess the change is between Lustre 1.8.0.1 and
b_release_1_8_1-20090712131220 somewhere.
if only we had git bisect, and if only I knew how to use it, and if only
I had the time to try it... :-)

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
