On 11/09/2020 10:04, Ralph Corderoy wrote:
> Hi Hamish,
>
>> NB: On desktop I seem to have a very high number for "SUnreclaim" in
>> /proc/meminfo:
>>
>> MemTotal:       32812004 kB
>> MemFree:         8619976 kB
>> MemAvailable:    9572924 kB
>> Buffers:           61772 kB
>> Cached:          1061212 kB
>> SwapCached:      1190212 kB
> ...
>> Slab:           16832040 kB
>> SReclaimable:     397472 kB
>> SUnreclaim:     16434568 kB
>
> I think SUnreclaim is a key number to monitor over time.

Good to know.

>> What does SUnreclaimable mean?
>
> Slab is the memory used by the kernel's memory allocator. Think of it
> as malloc(3) but with knowledge of the type of item that may be
> allocated. Some of the slab allocations could be freed if there were
> other demands of memory: ‘memory pressure’. That's SReclaimable.
> SUnreclaim is the amount allocated to things which must be kept no
> matter how high memory pressure goes.
>
>> This isn't the same on the NAS box, but either way there are two
>> problems to debug here and I guess they could be related.
>
> The desktop PC will be easier because you've more tools and more
> upstream parties interested in any report. Does the NAS last the day?
> Schedule a nightly reboot, or kexec?
> https://wiki.archlinux.org/index.php/Kexec
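As a quick sanity check on the /proc/meminfo figures quoted above, here is a minimal Python sketch (Python 3.9+ assumed; the helper name `parse_meminfo` is hypothetical, not part of any tool mentioned in this thread). It parses a subset of the posted excerpt and confirms that Slab is just SReclaimable plus SUnreclaim, then shows how much of it is unreclaimable.

```python
# Sketch: split the Slab figure from /proc/meminfo into its reclaimable and
# unreclaimable parts. The text below is a subset of the excerpt quoted above.
MEMINFO_EXCERPT = """\
MemTotal:       32812004 kB
Slab:           16832040 kB
SReclaimable:     397472 kB
SUnreclaim:     16434568 kB
"""


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value (kB here)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            # Values look like "16434568 kB"; keep just the number.
            fields[name.strip()] = int(rest.split()[0])
    return fields


info = parse_meminfo(MEMINFO_EXCERPT)

# Slab is the sum of the reclaimable and unreclaimable parts.
assert info["Slab"] == info["SReclaimable"] + info["SUnreclaim"]

unreclaim_share_of_slab = info["SUnreclaim"] / info["Slab"]
unreclaim_share_of_ram = info["SUnreclaim"] / info["MemTotal"]
print(f"SUnreclaim is {unreclaim_share_of_slab:.1%} of Slab")
print(f"SUnreclaim is {unreclaim_share_of_ram:.1%} of MemTotal")
```

On these numbers almost all of the slab (about 97.6%) is unreclaimable, and it alone accounts for roughly half (about 50.1%) of MemTotal.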
Interestingly it seems okay today. Cache has filled up, but memory
usage is fairly stable at 110 MB, though I still have no idea what's
using it, 'cos apparently it's not any process. Previously I think it
idled at 40 MB. Meminfo says "Active" on the NAS box is about 50 MB,
but there's no big Slab or SUnreclaim figure here. Perhaps "free"
(BusyBox 1.20.2) is actually reporting virtual memory here. This is
something D-Link compiled, so I wouldn't be surprised if they effed it
up somehow, given the quality of the rest of their software and system
configuration. That the HDDs

>> Do I have a kernel memory leak?
>
> Probably.

Also good to know. My friend on totally different hardware (Intel CPU
and iGPU) might also be having this problem (but I need to confirm it's
not cache), so it might not be a hardware-specific thing, or it might
be a very generic driver.

>
>>> ‘sudo slabtop -osc’ will give a breakdown.
> ...
>> Okay, that yields: http://ix.io/2x4T
>>
>> The total is much smaller than the number in /proc/meminfo (just
>> verified it hasn't changed drastically). Bizarre.
>
> That is odd.
>
> Are you using any VM stuff? Any disk filesystems other than ext4,
> e.g. ZFS? Nvidia graphics drivers? Is this the machine where a kernel
> driver keeps dying?

No VMs running (but VirtualBox is installed). Mounted filesystems are
ext4, vfat for /boot/efi, one NTFS (but IIRC that uses a userspace
driver via FUSE), and ecryptfs for home directory encryption. I also
have encrypted swap on a LUKS container.

Driver-wise I'm using the open source AMD drivers (I have an RX 460).
Mint's Driver Manager confirms I have no proprietary drivers in use
(that's why I chose an AMD GPU; the Nvidia ones are a PITA in Linux).

No kernel issues as far as I know. As with the Pi, ufw is filling up my
syslog, so perhaps I'll disable its logging here too so I can see what
else is in there more easily.

Would it be a good idea to stop mounting the NTFS partition to see if
that changes anything?

>
> Monitor SUnreclaim at a regular time period, e.g. 30 seconds, so you
> can see it climbing.
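For the periodic monitoring suggested here, a shell loop such as `while sleep 30; do grep SUnreclaim /proc/meminfo; done` is enough. If timestamps are wanted too, the following is a minimal Python sketch; `parse_field` and `monitor` are hypothetical names made up for illustration, and it assumes a Linux /proc/meminfo whose values are in kB.

```python
import time
from datetime import datetime
from typing import Optional


def parse_field(meminfo_text: str, field: str) -> int:
    """Return one /proc/meminfo field (e.g. "SUnreclaim"), assumed to be in kB."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    raise KeyError(f"{field} not found")


def monitor(field: str = "SUnreclaim", interval: float = 30.0,
            count: Optional[int] = None, path: str = "/proc/meminfo") -> None:
    """Print a timestamped sample of `field` every `interval` seconds.

    Runs until interrupted with Ctrl-C, unless `count` limits the samples.
    """
    taken = 0
    while count is None or taken < count:
        # Re-read the file each time: /proc/meminfo is regenerated on open.
        with open(path) as f:
            value = parse_field(f.read(), field)
        print(f"{datetime.now():%H:%M:%S} {field}: {value:>10} kB", flush=True)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

Calling `monitor()` from an interactive Python 3 session samples every 30 seconds until interrupted; `monitor(interval=5, count=12)` gives a shorter, bounded run.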
> You said you were doing a large upload. If it's the kind which can
> recover from being stopped and re-started then see if your monitoring
> shows a steady climb during upload which stops if you kill the upload
> only to restart when you resume the upload.

Okay, results (every 30 seconds) are below.

Before stopping the upload (and all associated pCloud processes):

SUnreclaim:     19781000 kB
SUnreclaim:     19782916 kB
SUnreclaim:     19784140 kB
SUnreclaim:     19785252 kB
SUnreclaim:     19787588 kB
SUnreclaim:     19788968 kB
SUnreclaim:     19790540 kB
SUnreclaim:     19792240 kB
SUnreclaim:     19794108 kB
SUnreclaim:     19796244 kB
SUnreclaim:     19796692 kB
SUnreclaim:     19798848 kB
SUnreclaim:     19801344 kB

After stopping (and ending all pCloud processes):

SUnreclaim:     19805016 kB
SUnreclaim:     19807172 kB
SUnreclaim:     19808868 kB
SUnreclaim:     19810428 kB
SUnreclaim:     19811948 kB
SUnreclaim:     19813980 kB
SUnreclaim:     19815524 kB

Looks about the same to me. NB: when run with a shorter interval I can
see it hover up and down a little, but on the whole it keeps increasing.

Interestingly, RAM usage hasn't increased overnight as much as previous
trends made me expect. Since I've had the screens on this morning,
memory usage has started to balloon much more quickly again... I do
have multiple screens, so perhaps it's somehow related to that and/or
to the amdgpu driver? Although I also had this on my old system with
an Nvidia card. Hmm.

>> My swap is meant to be for an emergency, not because some leaky code
>> in a driver/the kernel/whatever is somehow managing to use 20 GB of
>> RAM.
>
> Swap isn't reserved as an overflow when RAM runs low. Even when there
> is plenty of RAM free, the kernel might decide to swap out some memory
> which is not backed by another device because it thinks that memory
> would be better used by a cache.

That's true, but I know you know what I meant :) I had no swap at all
for a while, and then added it recently to try and get hibernation to
work (alas it didn't).
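To put a number on "looks about the same", here is a minimal sketch that takes the SUnreclaim samples posted above and computes the average first-to-last growth rate for each run. It assumes the samples really are 30 seconds apart and deliberately ignores the small up-and-down wobble; `growth_rate_kb_per_s` is a hypothetical helper name.

```python
from typing import Sequence

# SUnreclaim samples posted above (kB), assumed to be 30 seconds apart.
INTERVAL_S = 30
during_upload = [19781000, 19782916, 19784140, 19785252, 19787588,
                 19788968, 19790540, 19792240, 19794108, 19796244,
                 19796692, 19798848, 19801344]
after_stopping = [19805016, 19807172, 19808868, 19810428, 19811948,
                  19813980, 19815524]


def growth_rate_kb_per_s(samples: Sequence[int], interval_s: float) -> float:
    """Average growth between the first and last sample, in kB per second."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed_s = (len(samples) - 1) * interval_s
    return (samples[-1] - samples[0]) / elapsed_s


for label, samples in (("during upload", during_upload),
                       ("after stopping", after_stopping)):
    rate = growth_rate_kb_per_s(samples, INTERVAL_S)
    # 60 s per minute and 1024 kB per MiB convert kB/s to MiB/min.
    print(f"{label}: {rate:.1f} kB/s (~{rate * 60 / 1024:.1f} MiB/min)")
```

The two runs come out at about 56.5 and 58.4 kB/s respectively, a little over 3 MiB per minute either way, so stopping the pCloud upload made no visible difference to the rate of growth.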
It's a shame TuxOnIce is dead; it always worked a lot better for me
(as in, it worked maybe 50% of the time).

>
> BTW, curl(1) did download something. ;-)

I'll have to figure out how that command worked then :)

Thank you for all your help, Ralph; I think we're definitely getting
closer to resolving the issues. Apologies for the ultra-long message!

Hamish
-- 
Next meeting:  Online, Jitsi, Tuesday, 2020-10-06 20:00
Check to whom you are replying
Meetings, mailing list, IRC, ...  http://dorset.lug.org.uk
New thread, don't hijack:  mailto:dorset@mailman.lug.org.uk