Hi folks,

I think I have come across a memory leak in uprobes, which is fairly easy to
reproduce.

Using a repeated use of a single uprobe, I am able to run my system out of
memory, resulting in system failures and a need to reboot the box.  When I
first became aware of the failure, I was able to narrow down the first
userspace-visible failure to -ENOMEM being returned from
kernel/events/uprobes.c:uprobe_write_opcode(), though I can't say whether that
is the actual cause of the leak or just a victim. With the below reproduction
steps, hopefully someone smarter than me can understand the problem and propose
a solution, which I would be happy to test.

So far, I have reproduced the issue on kernels 4.4 and 4.8-rc2. I've tried on
distributions from Ubuntu as well as Arch Linux. Continue reading for a rough
log of the steps to reproduce.

Thanks,
Brenden

----------------------------------------

root@localhost# uname -r
4.8.0-040800rc2-generic
root@localhost# cd /sys/kernel/debug/tracing
root@localhost# echo "p:uprobes/libc_so_6 
/lib/x86_64-linux-gnu/libc.so.6:0x0000000000086560" > uprobe_events
root@localhost# # number of iterations before failure depends on the amount of 
memory in your system
root@localhost# for i in {1..48}; do echo 1 > events/uprobes/libc_so_6/enable; 
echo 0 > events/uprobes/libc_so_6/enable; done

root@localhost# dmesg
[ 2182.439038] ------------[ cut here ]------------
[ 2182.439042] WARNING: CPU: 6 PID: 4763 at 
/home/kernel/COD/linux/mm/page_counter.c:26 page_counter_uncharge+0x1d/0x30
[ 2182.439043] Modules linked in: fuse binfmt_misc snd_hda_codec_generic 
snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep ppdev snd_pcm parport_pc 
snd_timer sg virtio_balloon virtio_console acpi_cpufreq parport serio_raw 
i2c_piix4 snd soundcore tpm_tis intel_powerclamp tpm_tis_core intel_rapl_perf 
evdev squashfs tpm loop ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 ext4 crc16 jbd2 fscrypto 
mbcache btrfs dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath 
linear md_mod sd_mod ata_generic 8139too crct10dif_pclmul crc32_pclmul 
crc32c_intel ghash_clmulni_intel qxl aesni_intel aes_x86_64 lrw ttm virtio_pci 
ehci_pci uhci_hcd drm_kms_helper ehci_hcd ata_piix
[ 2182.439067]  gf128mul psmouse virtio_ring glue_helper libata ablk_helper 
8139cp scsi_mod cryptd mii usbcore usb_common floppy virtio drm button
[ 2182.439074] CPU: 6 PID: 4763 Comm: bash Not tainted 4.8.0-040800rc2-generic 
#201608142332
[ 2182.439074] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20160701_074356-anatol 04/01/2014
[ 2182.439075]  0000000000000286 00000000d2708db2 ffffffff9ad3de94 
0000000000000000
[ 2182.439076]  0000000000000000 ffffffff9aa7f4ee ffff8be0e749e8c0 
000000000000001f
[ 2182.439077]  ffff8be0e749e800 000000000000001f ffff8be0f4647bb0 
ffff8be0e71860c0
[ 2182.439078] Call Trace:
[ 2182.439081]  [<ffffffff9ad3de94>] ? dump_stack+0x5c/0x78
[ 2182.439083]  [<ffffffff9aa7f4ee>] ? __warn+0xbe/0xe0
[ 2182.439084]  [<ffffffff9abff52d>] ? page_counter_uncharge+0x1d/0x30
[ 2182.439085]  [<ffffffff9ac01512>] ? drain_stock.isra.30+0x32/0xa0
[ 2182.439086]  [<ffffffff9ac023f1>] ? try_charge+0x5f1/0x6b0
[ 2182.439087]  [<ffffffff9abe610b>] ? alloc_pages_vma+0xbb/0x240
[ 2182.439088]  [<ffffffff9ac06157>] ? mem_cgroup_try_charge+0x67/0x1b0
[ 2182.439090]  [<ffffffff9ab8156c>] ? uprobe_write_opcode+0x25c/0x5e0
[ 2182.439091]  [<ffffffff9ab81965>] ? 
install_breakpoint.isra.21.part.22+0x55/0x70
[ 2182.439092]  [<ffffffff9ab81d0d>] ? register_for_each_vma+0x36d/0x420
[ 2182.439092]  [<ffffffff9ab8201e>] ? uprobe_register+0x16e/0x290
[ 2182.439094]  [<ffffffff9ab65aef>] ? probe_event_enable+0xef/0x350
[ 2182.439096]  [<ffffffff9ab51cdc>] ? __ftrace_event_enable_disable+0x5c/0x1e0
[ 2182.439097]  [<ffffffff9ab52645>] ? event_enable_write+0x95/0xe0
[ 2182.439098]  [<ffffffff9ac0ebc3>] ? __vfs_write+0x33/0x160
[ 2182.439101]  [<ffffffff9ac2e49e>] ? __fd_install+0x2e/0xe0
[ 2182.439102]  [<ffffffff9ac2e313>] ? __alloc_fd+0x43/0x170
[ 2182.439104]  [<ffffffff9acacf78>] ? security_file_permission+0x38/0xb0
[ 2182.439105]  [<ffffffff9ac0f2c0>] ? vfs_write+0xb0/0x190
[ 2182.439106]  [<ffffffff9ac10692>] ? SyS_write+0x52/0xc0
[ 2182.439107]  [<ffffffff9ac2ea65>] ? SyS_dup2+0x95/0x100
[ 2182.439109]  [<ffffffff9b01d8b6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 2182.439110] ---[ end trace 320efb00be7bb830 ]---

After this, continued toggling of the uprobe will eventually fail.

root@localhost# for i in {1..100}; do echo 1 > events/uprobes/libc_so_6/enable; 
[[ $? -ne 0 ]] && break; echo 0 > events/uprobes/libc_so_6/enable; done
-bash: echo: write error: Cannot allocate memory

After running the uprobes out of memory, oom-killer started to pick off system 
processes:

[   97.334378] acpid invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), 
order=0, oom_score_adj=0
[   97.334379] acpid cpuset=/ mems_allowed=0
[   97.334383] CPU: 4 PID: 4179 Comm: acpid Tainted: G        W       
4.8.0-040800rc2-generic #201608142332
[   97.334384] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20160701_074356-anatol 04/01/2014
[   97.334385]  0000000000000286 000000006b30ae78 ffffffffa673de94 
ffff9522a946be08
[   97.334387]  ffff9522a9464640 ffffffffa660b716 ffff9522a946bd78 
ffff9522a9453cc0
[   97.334388]  0000000000030080 ffffffffa658c0e6 0007ffffffffffff 
ffff9522a9464640
[   97.334390] Call Trace:
[   97.334395]  [<ffffffffa673de94>] ? dump_stack+0x5c/0x78
[   97.334397]  [<ffffffffa660b716>] ? dump_header+0x59/0x1dc
[   97.334398]  [<ffffffffa658c0e6>] ? find_lock_task_mm+0x36/0x80
[   97.334400]  [<ffffffffa658ccb2>] ? oom_kill_process+0x222/0x3e0
[   97.334402]  [<ffffffffa6602910>] ? mem_cgroup_iter+0x100/0x2e0
[   97.334403]  [<ffffffffa6604c2d>] ? mem_cgroup_out_of_memory+0x28d/0x2d0
[   97.334405]  [<ffffffffa660575e>] ? mem_cgroup_oom_synchronize+0x32e/0x340
[   97.334406]  [<ffffffffa6600ab0>] ? memory_high_write+0xd0/0xd0
[   97.334408]  [<ffffffffa658d30d>] ? pagefault_out_of_memory+0x4d/0xc0
[   97.334410]  [<ffffffffa6a1ec88>] ? async_page_fault+0x28/0x30
[   97.334411] Task in /system.slice/acpid.service killed as a result of limit 
of /system.slice/acpid.service
[   97.334414] memory: usage 18446744073709551276kB, limit 9007199254740988kB, 
failcnt 121
[   97.334414] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[   97.334415] kmem: usage 52kB, limit 9007199254740988kB, failcnt 0
[   97.334415] Memory cgroup stats for /system.slice/acpid.service: cache:0KB 
rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[   97.334422] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents 
oom_score_adj name
[   97.334637] [ 4179]     0  4179     1100      287       7       3       37   
          0 acpid
[   97.334651] Memory cgroup out of memory: Kill process 4179 (acpid) score 0 
or sacrifice child
[   97.334695] Killed process 4179 (acpid) total-vm:4400kB, anon-rss:0kB, 
file-rss:1148kB, shmem-rss:0kB

Reply via email to