These are client nodes. In fact, this class of node runs a
particular application that intermittently fails, leaving a Lustre
error in syslog:
"mg38 kernel: LustreError:
10147:0:(lov_request.c:180:lov_update_enqueue_set()) error: enqueue
objid 0x3922667 subobj 0x15dfc on OST idx 1: rc = -4"
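For reading the "rc = -4" at the end of that message, here is a minimal sketch. It assumes the return code follows the usual Linux kernel convention of a negated errno value. The helper name decode_rc is hypothetical and is not part of Lustre.

```python
import errno
import os


def decode_rc(rc: int) -> tuple[str, str]:
    """Map a negated-errno return code to its symbolic name and description.

    Assumption: rc is a negated errno, as is conventional for kernel
    return codes. decode_rc is an illustrative helper, not a Lustre tool.
    """
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, description = decode_rc(-4)
print(name, "-", description)
```

Under that assumption, -4 decodes to EINTR, meaning the enqueue was interrupted rather than rejected by the OST.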
While mg38 is currently showing a negative value for memused, I have
not been able to tie that to a failure. The error message points to
the same file.
Jan
At 1:34 PM -0600 4/4/07, Andreas Dilger wrote:
On Apr 04, 2007 11:28 -0800, Jan H. Julian wrote:
Could someone please clarify the use of the proc values for
subsystem_debug and memused. In regard to
/proc/sys/portals/subsystem_debug and /proc/sys/portals/debug, should
both be set to zero to totally turn off debugging?
In regard to /proc/sys/lustre/memused we see quite a variety of
entries, including many with negative values. Does the negative value
have a particular meaning?
For instance "cat /proc/sys/lustre/memused" for 9 nodes shows:
...
mg07 102186899
mg08 101775995
mg09 -1323553489
mg10 -1328553965
mg11 -1378379739
mg12 -1347059989
mg13 -1364717487
mg14 -1358477913
mg15 24680370
These are 16 core machines with 64GB of resident memory.
This appears to be an overflow of a 32-bit counter. It isn't strictly
harmful, because it will underflow an equal amount later on and should
return to zero when Lustre unmounts. It does make this stat less useful
on machines with lots of RAM.
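
To illustrate the wraparound described above, here is a small sketch of how a signed 32-bit counter reports totals above 2^31 - 1 as negative numbers. The function names are hypothetical, and the "unwrapped" figure assumes the counter has wrapped at most once, which the posted values alone cannot confirm.

```python
INT32_MODULUS = 2 ** 32  # number of distinct values in a 32-bit counter


def wrap_int32(value: int) -> int:
    """Return value as a signed 32-bit counter would report it."""
    value %= INT32_MODULUS
    return value - INT32_MODULUS if value >= 2 ** 31 else value


def unwrap_once(reported: int) -> int:
    """Smallest non-negative total consistent with a reported value.

    Assumption: the counter wrapped at most once; the true total could be
    larger by any multiple of 2**32.
    """
    return reported % INT32_MODULUS


# Two of the values posted in this thread.
for node, reported in [("mg09", -1323553489), ("mg15", 24680370)]:
    print(node, reported, unwrap_once(reported))
```

Under the single-wrap assumption, mg09's -1323553489 corresponds to a total of 2971413807, while mg15's positive value is unchanged.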
Are these client or server nodes? I'm a bit surprised that Lustre would
be allocating > 2GB of RAM.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
--
Jan Julian University of Alaska, ARSC mailto:[EMAIL PROTECTED]
(907) 450-8641 910 Yukon Drive, Suite 001 http://www.arsc.edu
Fax: 450-8605 Fairbanks, AK 99775-6020 USA