Andreas Dilger wrote:
> On Oct 22, 2008 14:37 -0600, Craig Tierney wrote:
>> I just had two nodes hang with the following soft lockup messages.
>> I am running Centos 5.2 (2.6.18-93.1.13.el5) with the patchless client
>> (1.6.5.1). My nodes do not have swap configured on them (no local
>> disks). We do have a tool that looks for out of memory condition
>> and neither of the nodes in question reported a problem (not that it
>> is perfect).
>
> Note that soft lockups are only a warning. It shouldn't mean that the
> node is completely dead, only that some thread was hogging the CPU.
>

The two soft lockup messages (one in kswapd0 and the other in the user
process convert_emiss) repeated for 6 hours before I rebooted the node.
I don't recall whether I could log in to the node.

>> Does the problem look like an issue with Lustre?
>
> Lots of Lustre functions on the stack...
>
>> Oct 22 08:06:45 h53 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [kswapd0:418]
>> Oct 22 08:06:45 h53 kernel: Call Trace:
>> Oct 22 08:06:45 h53 kernel: [<ffffffff8871125a>] :osc:cache_remove_extent+0x4a/0x90
>> Oct 22 08:06:45 h53 kernel: [<ffffffff88707c5a>] :osc:osc_teardown_async_page+0x25a/0x3c0
>
> Do you have particularly large files in use (e.g. in the realm of 1TB or
> more)? It seems possible that if there are a lot of pages to be cleaned
> up that this might cause a report like this.
>

My first guess would be no, we don't create files that large. But it is
entirely possible that a user did something wrong with this code that
created some large files (append vs. create). I will check it out.

Thanks,
Craig

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>

--
Craig Tierney ([EMAIL PROTECTED])
