Hi,

I have an issue with a machine where I'm not able to detect the real root cause. It hangs up totally. It seems like it was running out of memory - but why? Hopefully somebody can give me some insight. As far I can see right now, it hangs up a few hours after an `emerge --update --newuse --deep --with-bdeps=y @world`.

The machine is an Intel Atom with 8 GB RAM (physical, max) and 24 GB swap (a file). So 32 GB RAM in total. It has a 250GB SSD. It runs gentoo-sources-4.14.83 build with genkernel. Portage uses the stable tree only. It basically provides the hardware for a qemu VM which does the network management: primary ns, dhcp, apache ssl proxy. This VM uses 4 GB RAM and has 8 GB swap file. The VM works smoothly. The atom machine itself acts further as basic nfs server to an independent dedicated server - which does the (re)exports - and as secondary ns. For this I'm convinced that 32GB RAM total have to be enough - correct me if I'm wrong!

The issue starts round about in February (IIRC). The update of gcc-10.2 did fail. I have /var/tmp/portage on tmpfs - I did increase the size in fstab from 8 to 16 GB. Afterwards gcc build successfully.

After two hang-ups I did increase the swap from 8 to 24 GB. It doesn't help. Here is a complete log from /var/log/messages:

Apr 28 05:35:57 Syrin kernel: [1454017.499919] isc-net-0000 invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0 Apr 28 05:35:57 Syrin kernel: [1454017.499925] isc-net-0000 cpuset=/ mems_allowed=0 Apr 28 05:35:57 Syrin kernel: [1454017.499933] CPU: 0 PID: 27685 Comm: isc-net-0000 Not tainted 4.14.83-gentoo #1 Apr 28 05:35:57 Syrin kernel: [1454017.499935] Hardware name: MSI MS-7877/J1900I, BIOS V1.2 03/25/2014
Apr 28 05:35:57 Syrin kernel: [1454017.499936] Call Trace:
Apr 28 05:35:57 Syrin kernel: [1454017.499948]  dump_stack+0x67/0x98
Apr 28 05:35:57 Syrin kernel: [1454017.499954]  dump_header+0x94/0x20c
Apr 28 05:35:57 Syrin kernel: [1454017.499958] oom_kill_process+0x24a/0x420 Apr 28 05:35:57 Syrin kernel: [1454017.499962] ? oom_badness.part.9+0xd3/0x150
Apr 28 05:35:57 Syrin kernel: [1454017.499965]  out_of_memory+0xf9/0x290
Apr 28 05:35:57 Syrin kernel: [1454017.499968] __alloc_pages_nodemask+0xf48/0xff0 Apr 28 05:35:57 Syrin kernel: [1454017.499974] filemap_fault+0x294/0x4c0 Apr 28 05:35:57 Syrin kernel: [1454017.499979] ext4_filemap_fault+0x2c/0x40
Apr 28 05:35:57 Syrin kernel: [1454017.499983]  __do_fault+0x1f/0xb0
Apr 28 05:35:57 Syrin kernel: [1454017.499986] __handle_mm_fault+0x3ed/0xad0 Apr 28 05:35:57 Syrin kernel: [1454017.499991] handle_mm_fault+0xaa/0x1f0 Apr 28 05:35:57 Syrin kernel: [1454017.499996] __do_page_fault+0x250/0x4f0
Apr 28 05:35:57 Syrin kernel: [1454017.500000]  ? page_fault+0x2f/0x50
Apr 28 05:35:57 Syrin kernel: [1454017.500003]  page_fault+0x45/0x50
Apr 28 05:35:57 Syrin kernel: [1454017.500005] RIP: 0000: (null) Apr 28 05:35:57 Syrin kernel: [1454017.500007] RSP: 12f83750:0000000000000001 EFLAGS: 7ffa12f837a0
Apr 28 05:35:57 Syrin kernel: [1454017.500010] Mem-Info:
Apr 28 05:35:57 Syrin kernel: [1454017.500017] active_anon:1694713 inactive_anon:211859 isolated_anon:0 Apr 28 05:35:57 Syrin kernel: [1454017.500017] active_file:328 inactive_file:344 isolated_file:32 Apr 28 05:35:57 Syrin kernel: [1454017.500017] unevictable:1374 dirty:0 writeback:0 unstable:0 Apr 28 05:35:57 Syrin kernel: [1454017.500017] slab_reclaimable:4480 slab_unreclaimable:7449 Apr 28 05:35:57 Syrin kernel: [1454017.500017] mapped:1071 shmem:3 pagetables:16352 bounce:0 Apr 28 05:35:57 Syrin kernel: [1454017.500017] free:11655 free_pcp:534 free_cma:0 Apr 28 05:35:57 Syrin kernel: [1454017.500021] Node 0 active_anon:6778852kB inactive_anon:847436kB active_file:1312kB inactive_file:1376kB unevictable:5496kB isolated(anon):0kB isolated(file):128kB mapped:4284kB dirty:0kB writeback:0kB shmem:12kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes Apr 28 05:35:57 Syrin kernel: [1454017.500026] DMA free:15836kB min:20kB low:32kB high:44kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15920kB managed:15836kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB Apr 28 05:35:57 Syrin kernel: [1454017.500027] lowmem_reserve[]: 0 2664 7647 7647 Apr 28 05:35:57 Syrin kernel: [1454017.500036] DMA32 free:23732kB min:3892kB low:6620kB high:9348kB active_anon:2319992kB inactive_anon:363260kB active_file:0kB inactive_file:92kB unevictable:0kB writepending:0kB present:2825512kB managed:2734888kB mlocked:0kB kernel_stack:140kB pagetables:22572kB bounce:0kB free_pcp:1136kB local_pcp:476kB free_cma:0kB Apr 28 05:35:57 Syrin kernel: [1454017.500037] lowmem_reserve[]: 0 0 4982 4982 Apr 28 05:35:57 Syrin kernel: [1454017.500045] Normal free:7052kB min:7284kB low:12384kB high:17484kB active_anon:4458860kB inactive_anon:484176kB active_file:1368kB inactive_file:1568kB unevictable:5496kB writepending:0kB present:5242880kB managed:5102420kB mlocked:5496kB kernel_stack:2276kB pagetables:42836kB bounce:0kB free_pcp:1000kB local_pcp:724kB free_cma:0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500046] lowmem_reserve[]: 0 0 0 0
Apr 28 05:35:57 Syrin kernel: [1454017.500050] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15836kB Apr 28 05:35:57 Syrin kernel: [1454017.500070] DMA32: 268*4kB (UE) 1562*8kB (UE) 645*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23888kB Apr 28 05:35:57 Syrin kernel: [1454017.500084] Normal: 293*4kB (UME) 490*8kB (UME) 152*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7524kB Apr 28 05:35:57 Syrin kernel: [1454017.500100] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Apr 28 05:35:57 Syrin kernel: [1454017.500101] 1818 total pagecache pages
Apr 28 05:35:57 Syrin kernel: [1454017.500110] 86 pages in swap cache
Apr 28 05:35:57 Syrin kernel: [1454017.500112] Swap cache stats: add 13659627, delete 13659513, find 1129210/1793572
Apr 28 05:35:57 Syrin kernel: [1454017.500113] Free swap  = 0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500114] Total swap = 25165820kB
Apr 28 05:35:57 Syrin kernel: [1454017.500115] 2021078 pages RAM
Apr 28 05:35:57 Syrin kernel: [1454017.500116] 0 pages HighMem/MovableOnly
Apr 28 05:35:57 Syrin kernel: [1454017.500117] 57792 pages reserved
Apr 28 05:35:57 Syrin kernel: [1454017.500118] 0 pages hwpoisoned
Apr 28 05:35:57 Syrin kernel: [1454017.500119] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name Apr 28 05:35:57 Syrin kernel: [1454017.500125] [ 4009] 0 4009 21207 320 7 3 20 0 apcupsd Apr 28 05:35:57 Syrin kernel: [1454017.500128] [ 4043] 0 4043 54371 48 12 3 220 0 rsyslogd Apr 28 05:35:57 Syrin kernel: [1454017.500131] [ 4084] 0 4084 1938 178 6 3 18 0 fcron Apr 28 05:35:57 Syrin kernel: [1454017.500133] [ 4400] 0 4400 17733 1376 8 3 0 0 ntpd Apr 28 05:35:57 Syrin kernel: [1454017.500136] [ 4429] 0 4429 2789 0 7 3 103 0 rsync Apr 28 05:35:57 Syrin kernel: [1454017.500139] [ 4460] 0 4460 1689 241 6 3 112 -1000 sshd Apr 28 05:35:57 Syrin kernel: [1454017.500142] [ 4693] 0 4693 1067352 80035 1546 7 664044 0 qemu-system-x86 Apr 28 05:35:57 Syrin kernel: [1454017.500145] [ 4863] 0 4863 1905 465 7 3 43 0 agetty Apr 28 05:35:57 Syrin kernel: [1454017.500148] [ 4864] 0 4864 1905 433 7 3 44 0 agetty Apr 28 05:35:57 Syrin kernel: [1454017.500151] [ 4865] 0 4865 1905 431 7 3 43 0 agetty Apr 28 05:35:57 Syrin kernel: [1454017.500153] [ 4866] 0 4866 1905 443 7 3 43 0 agetty Apr 28 05:35:57 Syrin kernel: [1454017.500156] [ 4867] 0 4867 1905 453 7 3 43 0 agetty Apr 28 05:35:57 Syrin kernel: [1454017.500159] [ 4868] 0 4868 1905 433 8 3 43 0 agetty Apr 28 05:35:57 Syrin kernel: [1454017.500162] [27439] 0 27439 675 295 5 3 60 0 rpcbind Apr 28 05:35:57 Syrin kernel: [1454017.500164] [27509] 0 27509 750 419 5 3 80 0 rpc.idmapd Apr 28 05:35:57 Syrin kernel: [1454017.500167] [27520] 65534 27520 693 421 5 3 53 0 rpc.statd Apr 28 05:35:57 Syrin kernel: [1454017.500170] [27583] 0 27583 819 365 5 3 108 0 rpc.mountd Apr 28 05:35:57 Syrin kernel: [1454017.500173] [27684] 40 27684 7570889 1823291 14584 32 5626031 0 named Apr 28 05:35:57 Syrin kernel: [1454017.500176] [10479] 0 10479 3262 449 7 3 182 0 udevd Apr 28 05:35:57 Syrin kernel: [1454017.500179] [ 2923] 0 2923 1938 318 6 3 10 0 fcron Apr 28 05:35:57 Syrin kernel: [1454017.500182] [ 2924] 0 2924 981 406 5 3 0 0 backup.cron Apr 28 05:35:57 Syrin kernel: [1454017.500184] [ 2930] 0 2930 981 462 5 3 0 0 rsync.sh Apr 28 05:35:57 Syrin kernel: [1454017.500187] [ 2965] 0 2965 13185 1261 30 3 0 0 rsync Apr 28 05:35:57 Syrin kernel: [1454017.500190] [ 2966] 0 2966 574 158 5 3 0 0 tee Apr 28 05:35:57 Syrin kernel: [1454017.500192] [ 2968] 0 2968 2038 482 7 3 0 0 rsync Apr 28 05:35:57 Syrin kernel: [1454017.500195] [ 2974] 0 2974 14142 1189 31 3 0 0 rsync Apr 28 05:35:57 Syrin kernel: [1454017.500198] [ 2995] 0 2995 1938 271 6 3 7 0 fcron Apr 28 05:35:57 Syrin kernel: [1454017.500201] [ 2996] 0 2996 981 68 6 3 0 0 sh Apr 28 05:35:57 Syrin kernel: [1454017.500203] Out of memory: Kill process 27684 (named) score 904 or sacrifice child Apr 28 05:35:57 Syrin kernel: [1454017.500234] Killed process 27684 (named) total-vm:30283556kB, anon-rss:7293164kB, file-rss:0kB, shmem-rss:0kB Apr 28 05:36:00 Syrin kernel: [1454019.937636] oom_reaper: reaped process 27684 (named), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

All the processes after udevd are just from this day (backup job).

For comparison, the last call trace before was nearly the same:

Apr 15 18:26:37 Syrin kernel: [376620.710330] Call Trace:
Apr 15 18:26:37 Syrin kernel: [376620.710341]  dump_stack+0x67/0x98
Apr 15 18:26:37 Syrin kernel: [376620.710347]  dump_header+0x94/0x20c
Apr 15 18:26:37 Syrin kernel: [376620.710352] oom_kill_process+0x24a/0x420 Apr 15 18:26:37 Syrin kernel: [376620.710355] ? oom_badness.part.9+0xd3/0x150
Apr 15 18:26:37 Syrin kernel: [376620.710358]  out_of_memory+0xf9/0x290
Apr 15 18:26:37 Syrin kernel: [376620.710361] __alloc_pages_nodemask+0xf48/0xff0
Apr 15 18:26:37 Syrin kernel: [376620.710367]  filemap_fault+0x294/0x4c0
Apr 15 18:26:37 Syrin kernel: [376620.710372] ext4_filemap_fault+0x2c/0x40
Apr 15 18:26:37 Syrin kernel: [376620.710376]  __do_fault+0x1f/0xb0
Apr 15 18:26:37 Syrin kernel: [376620.710380] __handle_mm_fault+0x3ed/0xad0 Apr 15 18:26:37 Syrin kernel: [376620.710385] handle_mm_fault+0xaa/0x1f0 Apr 15 18:26:37 Syrin kernel: [376620.710390] __do_page_fault+0x250/0x4f0
Apr 15 18:26:37 Syrin kernel: [376620.710394]  ? page_fault+0x2f/0x50
Apr 15 18:26:37 Syrin kernel: [376620.710396]  page_fault+0x45/0x50
Apr 15 18:26:37 Syrin kernel: [376620.710400] RIP: 21e50088:0x7f41aedee150 Apr 15 18:26:37 Syrin kernel: [376620.710401] RSP: 21e50070:0000000000000000 EFLAGS: 7f41aedee150
Apr 15 18:26:37 Syrin kernel: [376620.710405] Mem-Info:
Apr 15 18:26:37 Syrin kernel: [376620.710411] active_anon:1695970 inactive_anon:212020 isolated_anon:0 Apr 15 18:26:37 Syrin kernel: [376620.710411] active_file:339 inactive_file:320 isolated_file:0 Apr 15 18:26:37 Syrin kernel: [376620.710411] unevictable:1374 dirty:0 writeback:0 unstable:0 Apr 15 18:26:37 Syrin kernel: [376620.710411] slab_reclaimable:4311 slab_unreclaimable:7552 Apr 15 18:26:37 Syrin kernel: [376620.710411] mapped:1059 shmem:3 pagetables:16264 bounce:0 Apr 15 18:26:37 Syrin kernel: [376620.710411] free:11840 free_pcp:0 free_cma:0

Unfortunately I don't understand all the details. Any help is highly appreciated.

I assume it has something to do with tmpfs which will not be freed. Just an assumption, I'm searching for clarity, not try and error.

Thanks
Kai



Reply via email to