Re: [Dorset] Very high memory usage (ignoring cache) after being powered on for days

Hamish McIntyre-Bhatty Fri, 11 Sep 2020 02:32:19 -0700

On 11/09/2020 10:04, Ralph Corderoy wrote:
> Hi Hamish,
>
>> NB: On desktop I seem to have a very high number for "SUnreclaim" in
>> /proc/meminfo:
>>
>> MemTotal:       32812004 kB
>> MemFree:         8619976 kB
>> MemAvailable:    9572924 kB
>> Buffers:           61772 kB
>> Cached:          1061212 kB
>> SwapCached:      1190212 kB
> ...
>> Slab:           16832040 kB
>> SReclaimable:     397472 kB
>> SUnreclaim:     16434568 kB
> I think SUnreclaim is a key number to monitor over time.
>
Good to know.
>> What does SUnreclaim mean?
> Slab is the memory used by the kernel's memory allocator.  Think of it
> as malloc(3) but with knowledge of the type of item that may be
> allocated.  Some of the slab allocations could be freed if there were
> other demands of memory: ‘memory pressure’.  That's SReclaimable.
> SUnreclaim is the amount allocated to things which must be kept no
> matter how high memory pressure goes.
>
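
As a quick sanity check on the figures quoted above, here is a minimal Python sketch. It only uses the numbers from the /proc/meminfo excerpt; the dictionary and the function name slab_summary are made up for illustration. It confirms that Slab is the sum of SReclaimable and SUnreclaim, and shows how much of MemTotal the unreclaimable part takes up.

```python
# Figures copied from the /proc/meminfo excerpt quoted above (all in kB).
meminfo_kb = {
    "MemTotal": 32812004,
    "Slab": 16832040,
    "SReclaimable": 397472,
    "SUnreclaim": 16434568,
}


def slab_summary(info):
    """Return (SReclaimable + SUnreclaim, SUnreclaim as a fraction of MemTotal)."""
    slab_sum = info["SReclaimable"] + info["SUnreclaim"]
    unreclaim_share = info["SUnreclaim"] / info["MemTotal"]
    return slab_sum, unreclaim_share


slab_sum, share = slab_summary(meminfo_kb)
print(f"SReclaimable + SUnreclaim = {slab_sum} kB (Slab reports {meminfo_kb['Slab']} kB)")
print(f"SUnreclaim is {share:.1%} of MemTotal")
```

With these numbers the two slab parts add up exactly to the Slab line, and SUnreclaim alone is about half of the installed RAM.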
>> This isn't the same on the NAS box, but either way there are two
>> problems to debug here and I guess they could be related.
> The desktop PC will be easier because you've more tools and more
> upstream parties interested in any report.  Does the NAS last the day?
> Schedule a nightly reboot, or kexec?
> https://wiki.archlinux.org/index.php/Kexec


Interestingly it seems okay today. Cache has filled up, but memory usage
is fairly stable at 110MB, though still no idea what's using it cos
apparently not any process. Previously I think it idled at 40MB. Meminfo
says "active" on the NAS box is about 50 MB, but no big slab or
SUnreclaim here. Perhaps "free" (busybox 1.20.2) is actually reporting
virtual memory here. This is something D-Link compiled so I wouldn't be
surprised if they effed it up somehow, given the quality of the rest of
their software and system configuration. That the HDDs

>> Do I have a kernel memory leak?
> Probably.
Also good to know. My friend on totally different hardware (Intel CPU
and iGPU) also might be having this problem (but need to confirm it's
not cache), so it might not be a hardware-specific thing, or it might be
a very generic driver.
>
>>> ‘sudo slabtop -osc’ will give a breakdown.
> ...
>> Okay, that yields: http://ix.io/2x4T
>>
>> The total is much smaller than the number in /proc/meminfo (just
>> verified it hasn't changed drastically). Bizarre.
> That is odd.
>
> Are you using any VM stuff?  Any disk filesystems other than ext4,
> e.g. ZFS?  Nvidia graphics drivers?  Is this the machine where a kernel
> driver keeps dying?

No VMs running (but vbox installed). Mounted filesystems are ext4, vfat
for /boot/efi, one NTFS (but IIRC this is a userspace driver with FUSE),
and ecryptfs for home directory encryption. I also have encrypted swap
on a LUKS container.

Driver wise I'm using the open source AMD drivers (I have an RX 460).
Mint's Driver Manager confirms I have no proprietary drivers in use (why
I chose an AMD GPU, cos the Nvidia ones are a PITA in Linux). No kernel
issues as far as I know. As with the pi, ufw is filling up my syslog, so
perhaps I'll disable logging here too so I can see more easily.

Good idea to stop mounting the NTFS partition to see if that changes
anything?

>
> Monitor SUnreclaim at a regular time period, e.g. 30 seconds, so you can
> see it climbing.  You said you were doing a large upload.  If it's the
> kind which can recover from being stopped and re-started then see if
> your monitoring shows a steady climb during upload which stops if you
> kill the upload only to restart when you resume the upload.
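
One possible way to do the periodic sampling suggested above is sketched below in Python. This is an assumption about how it could be done, not necessarily how the numbers in this thread were collected. The function names read_meminfo_kb and monitor are hypothetical, and the script relies on the usual "Name:   value kB" layout of /proc/meminfo on Linux.

```python
import time


def read_meminfo_kb(field, text):
    """Return the value of one /proc/meminfo field, or None if it is absent."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field and rest.split():
            return int(rest.split()[0])
    return None


def monitor(field="SUnreclaim", interval_s=30, samples=10, path="/proc/meminfo"):
    """Print the field every interval_s seconds with the change since the last sample."""
    previous = None
    for _ in range(samples):
        with open(path) as f:
            value = read_meminfo_kb(field, f.read())
        if value is None or previous is None:
            delta = ""
        else:
            delta = f" ({value - previous:+d} kB)"
        print(f"{time.strftime('%H:%M:%S')} {field}: {value} kB{delta}")
        previous = value
        time.sleep(interval_s)


# To start sampling on a Linux machine, call for example:
# monitor(field="SUnreclaim", interval_s=30, samples=20)
```

Printing the difference between consecutive samples makes a steady climb easier to spot than comparing eight-digit totals by eye.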

Okay, results (every 30 seconds) are below. Before stopping upload (and
all associated pCloud processes):

SUnreclaim:     19781000 kB
SUnreclaim:     19782916 kB
SUnreclaim:     19784140 kB
SUnreclaim:     19785252 kB
SUnreclaim:     19787588 kB
SUnreclaim:     19788968 kB
SUnreclaim:     19790540 kB
SUnreclaim:     19792240 kB
SUnreclaim:     19794108 kB
SUnreclaim:     19796244 kB
SUnreclaim:     19796692 kB
SUnreclaim:     19798848 kB
SUnreclaim:     19801344 kB

After stopping (and ending all pCloud processes):

SUnreclaim:     19805016 kB
SUnreclaim:     19807172 kB
SUnreclaim:     19808868 kB
SUnreclaim:     19810428 kB
SUnreclaim:     19811948 kB
SUnreclaim:     19813980 kB
SUnreclaim:     19815524 kB

Looks about the same to me.
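
To put a rough number on "about the same", here is a small Python sketch that works out the average growth rate from the two runs above. It assumes the samples really are exactly 30 seconds apart, as stated; the function name growth_rate_kb_per_s is just for illustration.

```python
# SUnreclaim samples in kB, copied from the two runs above.
before_kb = [
    19781000, 19782916, 19784140, 19785252, 19787588, 19788968, 19790540,
    19792240, 19794108, 19796244, 19796692, 19798848, 19801344,
]
after_kb = [
    19805016, 19807172, 19808868, 19810428, 19811948, 19813980, 19815524,
]

INTERVAL_S = 30  # assumed spacing between consecutive samples


def growth_rate_kb_per_s(samples, interval_s=INTERVAL_S):
    """Average growth in kB/s between the first and last sample."""
    elapsed_s = (len(samples) - 1) * interval_s
    return (samples[-1] - samples[0]) / elapsed_s


rate_before = growth_rate_kb_per_s(before_kb)
rate_after = growth_rate_kb_per_s(after_kb)
print(f"before stopping upload: {rate_before:.1f} kB/s")
print(f"after stopping upload:  {rate_after:.1f} kB/s")
```

Both runs come out in the high 50s of kB per second, so stopping the upload made no visible difference to the rate of increase.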

NB: When run w/ quicker interval I can see it hover up and down a little
bit, but on the whole increasing. Interestingly RAM usage hasn't
increased overnight as much as previous trends made me expect. Since
I've had the screens on this morning, memory usage has started to
balloon much quicker again... I do have multiple screens, perhaps
somehow related to that and/or to the amdgpu driver? Although I also
had this on my old system w/ an Nvidia card. Hmm.

>> My swap is meant to be for an emergency, not because some leaky code
>> in a driver/the kernel/whatever is somehow managing to use 20gb of
>> ram.
> Swap isn't reserved as an overflow when RAM runs low.  Even when there
> is plenty of RAM free, the kernel might decide to swap out some memory
> which is not backed by another device because it thinks that memory
> would be better used by a cache.
That's true, but I know you know what I meant :) I had no swap at all
for a while, and then added it recently to try and get hibernate to work
(alas it didn't work). It's a shame TuxOnIce is dead, that always worked
a lot better (as in it worked maybe 50% of the time) for me.
>
> BTW, curl(1) did download something.  ;-)

I'll have to figure out how that command worked then :)

Thank you for all your help Ralph, definitely getting closer to
resolving the issues I think. Apologies for the ultra-long message!

Hamish


-- 
  Next meeting: Online, Jitsi, Tuesday, 2020-10-06 20:00
  Check to whom you are replying
  Meetings, mailing list, IRC, ...  http://dorset.lug.org.uk
  New thread, don't hijack:  mailto:dorset@mailman.lug.org.uk
