Control: tags -1 + moreinfo

On Wed, Oct 31, 2018 at 05:21:39PM +0800, 段熊春 wrote:
> Package: linux-image-4.9.0-0.bpo.7-amd64
> Version: 4.9.110-3+deb9u2~deb8u1
> 
> Package: systemd
> Version: 230-7~bpo8+2
> 
> Hi guys,
> We suspect we have found a memory leak in the cgroup memory subsystem,
> leaking at about 1 GByte/hour in one particular case.
> The bug reproduces 100% of the time on mainline kernel 4.19.
> (We also tried Debian's latest 4.14 and 4.9 kernels, with the same result.)
> 
> Here is what we observed (Debian 9 Stretch with mainline kernel 4.19,
> kconfig attached) and how to reproduce it.
> On a system with cgroups enabled, create a demo service that simulates
> an "ill-behaved" program: it is broken and exits immediately after startup.
> 
> Service code:
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(void)
> {
>     /* allocate a little memory and exit with a failure status,
>        so that systemd keeps restarting the unit */
>     void *p = malloc(10240);
>     (void)p;
>     return 1;
> }
> Compile the above code and install the binary as /usr/bin/test.
> systemd service unit:
> [Service]
> ExecStart=/usr/bin/test
> Restart=always
> RestartSec=2s
> MemoryLimit=1G
> StartLimitInterval=0
> [Install]
> WantedBy=default.target
> Enable and start the above service with systemctl.
> 
> Some additional information:
> With strace attached to systemd before starting the service: systemd
> creates a directory under /sys/fs/cgroup/memory for that service
> (/usr/bin/test) with mkdir. After the service stops, rmdir removes the
> corresponding entry under /sys/fs/cgroup/memory.
> With kprobes hooked to cgroup_mkdir and cgroup_rmdir: the numbers of
> calls to cgroup_mkdir and cgroup_rmdir are equal.
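> A minimal sketch of the kind of counting module meant here (illustrative
> only: the module boilerplate and names are not the exact code used, and
> it assumes cgroup_mkdir/cgroup_rmdir are not inlined and are visible in
> kallsyms):
> 
> #include <linux/module.h>
> #include <linux/kprobes.h>
> #include <linux/atomic.h>
> 
> static atomic64_t mkdir_calls = ATOMIC64_INIT(0);
> static atomic64_t rmdir_calls = ATOMIC64_INIT(0);
> 
> static int mkdir_pre(struct kprobe *p, struct pt_regs *regs)
> {
>     atomic64_inc(&mkdir_calls);
>     return 0;
> }
> 
> static int rmdir_pre(struct kprobe *p, struct pt_regs *regs)
> {
>     atomic64_inc(&rmdir_calls);
>     return 0;
> }
> 
> static struct kprobe kp_mkdir = {
>     .symbol_name = "cgroup_mkdir",
>     .pre_handler = mkdir_pre,
> };
> static struct kprobe kp_rmdir = {
>     .symbol_name = "cgroup_rmdir",
>     .pre_handler = rmdir_pre,
> };
> 
> static int __init counter_init(void)
> {
>     int ret = register_kprobe(&kp_mkdir);
>     if (ret)
>         return ret;
>     ret = register_kprobe(&kp_rmdir);
>     if (ret)
>         unregister_kprobe(&kp_mkdir);
>     return ret;
> }
> 
> static void __exit counter_exit(void)
> {
>     unregister_kprobe(&kp_rmdir);
>     unregister_kprobe(&kp_mkdir);
>     /* print the two counts on unload so they can be compared */
>     pr_info("cgroup_mkdir: %lld, cgroup_rmdir: %lld\n",
>             (long long)atomic64_read(&mkdir_calls),
>             (long long)atomic64_read(&rmdir_calls));
> }
> 
> module_init(counter_init);
> module_exit(counter_exit);
> MODULE_LICENSE("GPL");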
> With the same kind of kprobe counters hooked to (1) mem_cgroup_css_alloc,
> (2) mem_cgroup_css_free, (3) mem_cgroup_css_released and
> (4) mem_cgroup_css_offline:
> the numbers of calls to mem_cgroup_css_alloc and mem_cgroup_css_offline
> are equal (call this number A);
> the numbers of calls to mem_cgroup_css_free and mem_cgroup_css_released
> are equal (call this number B);
> and A > B.
> With a jprobe we collected the addresses of some mem_cgroup structures.
> Inspecting the live kernel with the crash tool: the flags of the refcnt
> member in memcg->css have changed to __PERCPU_REF_ATOMIC_DEAD, and
> memcg->css.refcnt.count keeps the same value as memcg->memory.count.
> After 24 hours the data structure was still allocated, and both counts
> were 1.
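> The memcg addresses were collected roughly as follows (illustrative
> sketch, not the exact probe code; on kernels that no longer have
> jprobes a kretprobe on mem_cgroup_css_alloc does the same job, and
> since the css is embedded at the start of struct mem_cgroup the
> printed address can be handed straight to crash):
> 
> #include <linux/module.h>
> #include <linux/kprobes.h>
> #include <linux/ptrace.h>
> 
> /* log every css returned by mem_cgroup_css_alloc() so the matching
>    struct mem_cgroup can later be examined with the crash tool */
> static int css_alloc_ret(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
>     pr_info("mem_cgroup_css_alloc returned css at 0x%lx\n",
>             regs_return_value(regs));
>     return 0;
> }
> 
> static struct kretprobe css_alloc_rp = {
>     .kp.symbol_name = "mem_cgroup_css_alloc",
>     .handler = css_alloc_ret,
>     .maxactive = 20,
> };
> 
> static int __init logger_init(void)
> {
>     return register_kretprobe(&css_alloc_rp);
> }
> 
> static void __exit logger_exit(void)
> {
>     unregister_kretprobe(&css_alloc_rp);
> }
> 
> module_init(logger_init);
> module_exit(logger_exit);
> MODULE_LICENSE("GPL");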
> We wrote a kernel module that puts such a memcg whose counter is 1;
> nothing happened except that the struct was freed.
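> A minimal sketch of such a module (illustrative only: the parameter
> name is invented, this is not the exact code used, and forcing a
> css_put on an arbitrary address is obviously dangerous outside a
> throwaway test machine):
> 
> #include <linux/module.h>
> #include <linux/moduleparam.h>
> #include <linux/errno.h>
> #include <linux/memcontrol.h>
> #include <linux/cgroup.h>
> 
> static unsigned long memcg_addr;
> module_param(memcg_addr, ulong, 0);
> MODULE_PARM_DESC(memcg_addr, "address of a leaked struct mem_cgroup");
> 
> static int __init put_init(void)
> {
>     struct mem_cgroup *memcg = (struct mem_cgroup *)memcg_addr;
> 
>     if (!memcg)
>         return -EINVAL;
>     /* drop one css reference; if refcnt was 1 the css gets freed */
>     css_put(&memcg->css);
>     return 0;
> }
> 
> static void __exit put_exit(void) { }
> 
> module_init(put_init);
> module_exit(put_exit);
> MODULE_LICENSE("GPL");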
> We suspect the issue may be caused by an incorrect call to try_charge
> and cancel_charge, but that is just a guess.
> Following is some inspection code we used as described above:
[...]

Can you still reproduce this issue with a recent kernel from unstable
or buster-backports? What about mainline? If so, can you please report
this upstream instead and keep us downstream in the loop?

Regards,
Salvatore
