On Oct 22, 2008 14:37 -0600, Craig Tierney wrote:
> I just had two nodes hang with the following soft lockup messages.
> I am running Centos 5.2 (2.6.18-93.1.13.el5) with the patchless client
> (1.6.5.1). My nodes do not have swap configured on them (no local
> disks). We do have a tool that looks for out of memory condition
> and neither of the nodes in question reported a problem (not that it
> is perfect).

Note that soft lockups are only a warning. A soft lockup report shouldn't
mean that the node is completely dead, only that some thread was hogging
the CPU.

> Does the problem look like an issue with Lustre?

Lots of Lustre functions on the stack...

> Oct 22 08:06:45 h53 kernel: BUG: soft lockup - CPU#2 stuck for 10s!
> [kswapd0:418]
> Oct 22 08:06:45 h53 kernel: Call Trace:
> Oct 22 08:06:45 h53 kernel: [<ffffffff8871125a>]
> :osc:cache_remove_extent+0x4a/0x90
> Oct 22 08:06:45 h53 kernel: [<ffffffff88707c5a>]
> :osc:osc_teardown_async_page+0x25a/0x3c0

Do you have particularly large files in use (e.g. in the realm of 1TB or
more)? It seems possible that having a lot of pages to clean up might
cause a report like this.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
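
Since a soft lockup is only a warning, it can help to see how often the reports recur and which task they name before concluding that a node is really hung. The following is a minimal sketch, not an official tool: it assumes unwrapped syslog lines in the same format as the messages quoted above, and the names `LOCKUP_RE` and `find_soft_lockups` are made up for illustration.

```python
import re

# Pattern modelled on the quoted message, assuming one report per log line:
# "Oct 22 08:06:45 h53 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [kswapd0:418]"
LOCKUP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"\s*\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft lockup report found in an iterable of log lines."""
    hits = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # not a soft lockup report (e.g. a call trace line)
        hits.append({
            "stamp": match.group("stamp"),
            "host": match.group("host"),
            "cpu": int(match.group("cpu")),
            "secs": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return hits


if __name__ == "__main__":
    sample = [
        "Oct 22 08:06:45 h53 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [kswapd0:418]",
        "Oct 22 08:06:45 h53 kernel: Call Trace:",
    ]
    for hit in find_soft_lockups(sample):
        print(hit["host"], hit["cpu"], hit["secs"], hit["comm"], hit["pid"])
```

To run it against a real log, pass an open file object instead of `sample`; the log location (often `/var/log/messages` on this kind of system) is an assumption and depends on the syslog configuration.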
