On Apr 04, 2007  11:57 -0800, Jan H. Julian wrote:
> These are client nodes and in fact, this class of node is running a 
> particular application that intermittently fails leaving a lustre 
> error in syslog.
> 
> "mg38 kernel: LustreError: 
> 10147:0:(lov_request.c:180:lov_update_enqueue_set()) error: enqueue 
> objid 0x3922667 subobj 0x15dfc on OST idx 1: rc = -4"

This means that the enqueue was interrupted (-4 = -EINTR in
/usr/include/asm/errno.h).  That shouldn't happen unless the job was
waiting a long time already (at least 100s, and then it was killed).
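
For reference, the errno lookup can be sketched in Python. This is an
illustration only; it assumes a Linux client, where the rc in the log
line is a negated errno value. describe_rc is a made-up helper name.

```python
import errno
import os

def describe_rc(rc):
    """Map a kernel-style return code such as -4 to its errno name and text."""
    code = abs(rc)
    # errno.errorcode maps numeric codes to symbolic names like "EINTR"
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

name, text = describe_rc(-4)
print(name, "-", text)
```

For rc = -4 the name comes back as EINTR, matching the errno.h entry.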

What does /proc/slabinfo show for lustre allocated memory in the slab
cache (most items are "ll_*")?
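
A rough way to total those entries, sketched in Python. It assumes the
usual slabinfo column order (name, active_objs, num_objs, objsize, ...)
and ignores per-slab overhead, so treat the numbers as estimates. The
function names are made up for this example, and reading /proc/slabinfo
may require root.

```python
def lustre_slab_usage(slabinfo_text, prefix="ll_"):
    """Return {cache_name: approx_bytes} for slab caches whose name starts with prefix."""
    usage = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(prefix):
            continue
        try:
            num_objs = int(fields[2])   # total objects allocated in the cache
            objsize = int(fields[3])    # size of one object in bytes
        except ValueError:
            continue                    # header or malformed line
        usage[fields[0]] = num_objs * objsize
    return usage

def report(path="/proc/slabinfo"):
    """Print per-cache and total estimates for the ll_* caches."""
    with open(path) as f:
        usage = lustre_slab_usage(f.read())
    for name, nbytes in sorted(usage.items()):
        print(f"{name:30s} {nbytes:>14d}")
    print(f"{'total':30s} {sum(usage.values()):>14d}")
```

Calling report() on a client prints one line per ll_* cache plus a total.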

> While mg38 is currently showing a negative value for memused, I have 
> not been able to tie that to a failure.  The error message points to 
> the same file
> 
> At 1:34 PM -0600 4/4/07, Andreas Dilger wrote:
> >On Apr 04, 2007  11:28 -0800, Jan H. Julian wrote:
> >> Could someone please clarify the use of the proc values for
> >> subsystem_debug and memused.  In regard to
> >> /proc/sys/portals/subsystem_debug and /proc/sys/portals/debug, should
> >> both be set to zero to totally turn off debugging?
> >>
> >> In regard to /proc/sys/lustre/memused we see quite a variety of
> >> entries including many with negative values.  Does the negative value
> >> have a particular meaning?
> >> For instance "cat /proc/sys/lustre/memused" for 9 nodes shows:
> >> ...
> >> mg07  102186899
> >> mg08  101775995
> >> mg09  -1323553489
> >> mg10  -1328553965
> >> mg11  -1378379739
> >> mg12  -1347059989
> >> mg13  -1364717487
> >> mg14  -1358477913
> >> mg15  24680370
> >>
> >> These are 16 core  machines with 64GB of resident memory.
> >
> >This appears to be an overflow of a 32-bit counter.  It isn't strictly
> >harmful, because it will underflow an equal amount later on and should
> >return to zero when Lustre unmounts.  It does make this stat less useful
> >on machines with lots of RAM.
> >
> >Are these client or server nodes?  I'm a bit surprised that Lustre would
> >be allocating > 2GB of RAM.
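
The wraparound described above can be illustrated with a short Python
sketch. It assumes the counter behaves like a signed 32-bit
two's-complement integer; the helper names are made up, and the number
of wraps on any given node is not known from the output alone.

```python
def to_int32(value):
    """Interpret an arbitrary byte count as a wrapping signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31

def candidate_totals(displayed, max_wraps=3):
    """List true totals that would display as `displayed` after 0..max_wraps-1 wraps."""
    base = displayed % 2**32          # smallest non-negative candidate
    return [base + k * 2**32 for k in range(max_wraps)]

# mg09 reported -1323553489; the smallest total consistent with that:
smallest = candidate_totals(-1323553489)[0]
print(smallest, round(smallest / 2**30, 2), "GiB")
assert to_int32(smallest) == -1323553489
```

The smallest candidate for mg09 is 2971413807 bytes, a little over 2GB,
but larger totals that differ by multiples of 4GB would display the same.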

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
