On Apr 04, 2007 11:57 -0800, Jan H. Julian wrote:
> These are client nodes and in fact, this class of node is running a
> particular application that intermittently fails leaving a lustre
> error in syslog.
>
> "mg38 kernel: LustreError:
> 10147:0:(lov_request.c:180:lov_update_enqueue_set()) error: enqueue
> objid 0x3922667 subobj 0x15dfc on OST idx 1: rc = -4"

This means that the enqueue was interrupted (-4 = -EINTR in
/usr/include/asm/errno.h).  That shouldn't happen unless the job was
waiting a long time already (at least 100s, and then it was killed).

What does /proc/slabinfo show for lustre allocated memory in the slab
cache (most items are "ll_*")?
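For reference, you can confirm the errno mapping directly on a node.
The exact header varies by distro; on many 2.6 kernels the define
actually lives in asm-generic/errno-base.h rather than asm/errno.h:

    $ grep -w EINTR /usr/include/asm-generic/errno-base.h   # path may differ
    #define EINTR            4      /* Interrupted system call */

To eyeball the lustre slab usage, something like the following should
work, assuming the slabinfo 2.x column layout (name, active_objs,
num_objs, objsize, ...):

    $ awk '/^ll_/ { kb = $2 * $4 / 1024; total += kb;        # active_objs * objsize
                    printf "%-20s %8.0f KB\n", $1, kb }
           END    { printf "%-20s %8.0f KB\n", "total", total }' /proc/slabinfo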
> While mg38 is currently showing a negative value for memused, I have
> not been able to tie that to a failure.  The error message points to
> the same file.
>
> At 1:34 PM -0600 4/4/07, Andreas Dilger wrote:
> >On Apr 04, 2007 11:28 -0800, Jan H. Julian wrote:
> >> Could someone please clarify the use of the proc values for
> >> subsystem_debug and memused.  In regard to
> >> /proc/sys/portals/subsystem_debug and /proc/sys/portals/debug, should
> >> both be set to zero to totally turn off debugging?
> >>
> >> In regard to /proc/sys/lustre/memused we see quite a variety of
> >> entries, including many with negative values.  Does the negative value
> >> have a particular meaning?
> >> For instance "cat /proc/sys/lustre/memused" for 9 nodes shows:
> >> ...
> >> mg07 102186899
> >> mg08 101775995
> >> mg09 -1323553489
> >> mg10 -1328553965
> >> mg11 -1378379739
> >> mg12 -1347059989
> >> mg13 -1364717487
> >> mg14 -1358477913
> >> mg15 24680370
> >>
> >> These are 16 core machines with 64GB of resident memory.
> >
> >This appears to be an overflow of a 32-bit counter.  It isn't strictly
> >harmful, because it will underflow an equal amount later on and should
> >return to zero when Lustre unmounts.  It does make this stat less useful
> >on machines with lots of RAM.
> >
> >Are these client or server nodes?  I'm a bit surprised that Lustre would
> >be allocating > 2GB of RAM.
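As a sanity check on the numbers above: if the counter has wrapped a
signed 32-bit value exactly once, the real usage can be recovered by
adding 2^32.  For mg09 that works out to roughly 2.8GB:

    $ echo $(( 4294967296 - 1323553489 ))   # 2^32 minus the displayed magnitude
    2971413807

That assumes a single wrap, of course; on a 64GB machine the counter
could in principle have wrapped more than once, in which case the true
value is 2971413807 plus some multiple of 2^32.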
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.