At 2:52 PM -0600 4/4/07, Andreas Dilger wrote:
>On Apr 04, 2007 11:57 -0800, Jan H. Julian wrote:
>> These are client nodes and, in fact, this class of node is running a
>> particular application that intermittently fails, leaving a Lustre
>> error in syslog:
>>
>> "mg38 kernel: LustreError:
>> 10147:0:(lov_request.c:180:lov_update_enqueue_set()) error: enqueue
>> objid 0x3922667 subobj 0x15dfc on OST idx 1: rc = -4"
>
>This means that the enqueue was interrupted (-4 = -EINTR in
>/usr/include/asm/errno.h). That shouldn't happen unless the job was
>waiting a long time already (at least 100s, and then it was killed).
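
As an aside, the rc-to-errno mapping is easy to double-check on any of the
clients; a small sketch using Python's standard errno module (nothing
Lustre-specific, the variable name is just illustrative):

    import errno, os
    rc = -4                        # return code from the LustreError line above
    print(errno.errorcode[-rc])    # -> 'EINTR'
    print(os.strerror(-rc))        # -> 'Interrupted system call'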
>
>What does /proc/slabinfo show for lustre allocated memory in the slab
>cache (most items are "ll_*")?
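
A quick way to pull out just the ll_* entries and total the memory they hold
is sketched below; it assumes the slabinfo 2.x column order (name,
active_objs, num_objs, objsize, ...), and the helper name is only
illustrative:

    # Rough sketch: sum memory held by Lustre ("ll_*") slab caches.
    # Assumes /proc/slabinfo version 2.x columns:
    #   name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : ...
    def lustre_slab_usage(path="/proc/slabinfo"):
        total = 0
        for line in open(path):
            if not line.startswith("ll_"):
                continue
            fields = line.split()
            name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
            used = num_objs * objsize              # bytes held by this cache
            total += used
            print("%-20s %6d objs x %4d B = %9d B" % (name, num_objs, objsize, used))
        print("total ll_* slab usage: %.1f MB" % (total / (1024.0 * 1024.0)))

    lustre_slab_usage()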
For mg38 at this time (with mg37 and mg39 included for comparison) I see:
mg37
ll_async_page        256    559    288   13    1 : tunables   54   27    8 : slabdata     43     43      0
ll_file_data          16    120    128   30    1 : tunables  120   60    8 : slabdata      4      4      0
ll_import_cache        0      0    360   11    1 : tunables   54   27    8 : slabdata      0      0      0
ll_obdo_cache          0      0    208   19    1 : tunables  120   60    8 : slabdata      0      0      0
ll_obd_dev_cache      38     38   5120    1    2 : tunables    8    4    0 : slabdata     38     38      0
eventpoll_pwq         33     53     72   53    1 : tunables  120   60    8 : slabdata      1      1      0
eventpoll_epi         30     30    256   15    1 : tunables  120   60    8 : slabdata      2      2      0
mg38
ll_async_page        256    754    288   13    1 : tunables   54   27    8 : slabdata     58     58      0
ll_file_data          16     90    128   30    1 : tunables  120   60    8 : slabdata      3      3      0
ll_import_cache        0      0    360   11    1 : tunables   54   27    8 : slabdata      0      0      0
ll_obdo_cache          0      0    208   19    1 : tunables  120   60    8 : slabdata      0      0      0
ll_obd_dev_cache      38     38   5120    1    2 : tunables    8    4    0 : slabdata     38     38      0
eventpoll_pwq         49     53     72   53    1 : tunables  120   60    8 : slabdata      1      1      0
eventpoll_epi         45     45    256   15    1 : tunables  120   60    8 : slabdata      3      3      0
mg39
ll_async_page        256    637    288   13    1 : tunables   54   27    8 : slabdata     49     49      0
ll_file_data          16     90    128   30    1 : tunables  120   60    8 : slabdata      3      3      0
ll_import_cache        0      0    360   11    1 : tunables   54   27    8 : slabdata      0      0      0
ll_obdo_cache          0      0    208   19    1 : tunables  120   60    8 : slabdata      0      0      0
ll_obd_dev_cache      38     38   5120    1    2 : tunables    8    4    0 : slabdata     38     38      0
eventpoll_pwq         49     53     72   53    1 : tunables  120   60    8 : slabdata      1      1      0
eventpoll_epi         45     45    256   15    1 : tunables  120   60    8 : slabdata      3      3      0
While mg38 is currently showing a negative value for memused, I have
not been able to tie that to a failure. The error message points to
the same file.
At 1:34 PM -0600 4/4/07, Andreas Dilger wrote:
>On Apr 04, 2007 11:28 -0800, Jan H. Julian wrote:
>> Could someone please clarify the use of the proc values for
>> subsystem_debug and memused. In regard to
>> /proc/sys/portals/subsystem_debug and /proc/sys/portals/debug, should
>> both be set to zero to totally turn off debugging?
>>
>> In regard to /proc/sys/lustre/memused we see quite a variety of
>> entries included many with negative values. Does the negative value
>> have a particular meaning?
>> For instance "cat /proc/sys/lustre/memused" for 9 nodes shows:
>> ...
>> mg07 102186899
>> mg08 101775995
>> mg09 -1323553489
>> mg10 -1328553965
>> mg11 -1378379739
>> mg12 -1347059989
>> mg13 -1364717487
>> mg14 -1358477913
>> mg15 24680370
>>
>> These are 16 core machines with 64GB of resident memory.
>
>This appears to be an overflow of a 32-bit counter. It isn't strictly
>harmful, because it will underflow an equal amount later on and should
>return to zero when Lustre unmounts. It does make this stat less useful
>on machines with lots of RAM.
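
For what it's worth, the wrapped readings above can be reinterpreted as
unsigned 32-bit values to recover the byte count actually being tracked (a
sketch only, and it assumes the counter has wrapped exactly once):

    wrapped = -1323553489        # mg09's memused reading above
    actual = wrapped % 2**32     # reinterpret as unsigned 32-bit: 2971413807 bytes
    print("%d bytes = %.2f GiB" % (actual, actual / 2.0**30))   # ~2.77 GiB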
>
>Are these client or server nodes? I'm a bit surprised that Lustre would
>be allocating > 2GB of RAM.
>Cheers, Andreas
>--
>Andreas Dilger
>Principal Software Engineer
>Cluster File Systems, Inc.
--
Jan Julian University of Alaska, ARSC mailto:[EMAIL PROTECTED]
(907) 450-8641 910 Yukon Drive, Suite 001 http://www.arsc.edu
Fax: 450-8605 Fairbanks, AK 99775-6020 USA
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss