I am trying to understand. What was the problem? How does SD_IOSTATS affect the crash? How did you disable this?
Sorry for a newbie question.... TIA

On Sun, Jul 20, 2008 at 4:54 AM, Robin Humble <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 18, 2008 at 09:02:36AM -0400, Brian J. Murrell wrote:
>>On Fri, 2008-07-18 at 05:52 -0400, Robin Humble wrote:
>>> Hi,
>>>
>>> I'm seeing coordinated OSS crashes with Lustre 1.6.5.1.
>>>
>>> our RHEL4 OSS have been stable for ~months with these kernels:
>>> kernel-lustre-smp-2.6.9-67.0.4.EL_lustre.1.6.4.3
>>> kernel-lustre-smp-2.6.9-55.0.9.EL_lustre.1.6.4.2
>>>
>>> but have crashed hard, twice, about 10hrs apart as soon as we started
>>> using this kernel:
>>> kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1
>>Can you try rebuilding the kernel, disabling SD_IOSTATS?
>
> done. I rebuilt using the stock kernel's InfiniBand stack and
> # CONFIG_SD_IOSTATS is not set
>
> % cexec -p oss: uptime
> oss x17: 18:45:07 up 1 day, 30 min, 1 user, load average: 4.97, 7.00, 6.27
> oss x18: 18:45:07 up 1 day, 23 min, 1 user, load average: 4.18, 5.78, 5.71
> oss x19: 18:45:07 up 1 day, 23 min, 1 user, load average: 5.18, 5.66, 4.60
>
> which is >> the 10hrs it was crashing at before.
> good guess about the cause of the problem! :-)
>
> maybe that rhel4 1.6.5.1 kernel rpm needs a respin then? seems like a
> fairly critical issue... :-/
>
> cheers,
> robin
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
