On 07/25/2018 09:52 PM, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 25 Jul 2018 11:42:57 +0000 [email protected] wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=200651
>>
>> Bug ID: 200651
>> Summary: cgroups iptables-restor: vmalloc: allocation failure
>
> Thanks. Please do note the above request.
>
>> Product: Memory Management
>> Version: 2.5
>> Kernel Version: 4.14
>> Hardware: All
>> OS: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: normal
>> Priority: P1
>> Component: Other
>> Assignee: [email protected]
>> Reporter: [email protected]
>> Regression: No
>>
>> Created attachment 277505
>> --> https://bugzilla.kernel.org/attachment.cgi?id=277505&action=edit
>> iptables save
>>
>> After creating a large number of cgroups, and under memory pressure, the
>> iptables command fails with the following error:
>>
>> "iptables-restor: vmalloc: allocation failure, allocated 3047424 of 3465216
>> bytes, mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)"

This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
4.17 it shouldn't use __GFP_NORETRY, but it looks like commit 0537250fdc6c
("netfilter: x_tables: make allocation less aggressive") was backported
to 4.14. Removing __GFP_NORETRY might help here, but would bring back other
issues. Less than 4MB is not that much, though; maybe find some "sane"
limit and use __GFP_NORETRY only above that?
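
To make the "sane limit" idea concrete, here is a rough userspace sketch of
the flag selection, not kernel code and not a proposed patch. The
MOCK_GFP_* values are not taken from kernel headers; they are picked only so
that their OR matches the mode 0x14010c0 in the report. SANE_LIMIT_BYTES,
table_alloc_flags() and pages_for() are hypothetical names, the 4MB cutoff
is just a placeholder, and 4 KiB pages are assumed.

```c
#include <stddef.h>

/* Mock flag bits (assumption): chosen so that
 * MOCK_GFP_KERNEL | MOCK_GFP_NORETRY == 0x14010c0, the mode in the report. */
#define MOCK_GFP_KERNEL   0x14000c0u
#define MOCK_GFP_NORETRY  0x0001000u

/* Hypothetical cutoff; no concrete number has been agreed on. */
#define SANE_LIMIT_BYTES  (4u * 1024u * 1024u)

/* Assumed page size for the arithmetic below. */
#define ASSUMED_PAGE_SIZE 4096u

/* Pick allocation flags: give up early only for requests above the cutoff. */
static unsigned int table_alloc_flags(size_t size)
{
    unsigned int flags = MOCK_GFP_KERNEL;

    if (size > SANE_LIMIT_BYTES)
        flags |= MOCK_GFP_NORETRY;
    return flags;
}

/* Number of pages needed to back a request of the given size. */
static size_t pages_for(size_t size)
{
    return (size + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}
```

With these assumptions, the failing 3465216-byte request stays below the
cutoff and would be attempted without __GFP_NORETRY. In page terms, the
report shows 744 of 846 pages obtained before the allocation gave up.
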
> I'm not sure what the problem is here, apart from iptables being
> over-optimistic about vmalloc()'s abilities.
>
> Are cgroups having any impact on this, or is it simply vmalloc arena
> fragmentation, and the iptables code should use some data structure
> more sophisticated than a massive array?
>
> Maybe all that cgroup metadata is contributing to the arena
> fragmentation, but those allocations will be small and the two systems
> should be able to live alongside, by being realistic about vmalloc.
>
>> The system used to reproduce the bug has 2 vcpus and 2GB of RAM, but
>> it also happens on more powerful systems.
>>
>> Steps to reproduce:
>>
>> mkdir /cgroup
>> mount cgroup -t cgroup -omemory,pids,blkio,cpuacct /cgroup
>> for a in `seq 1 1000`; do for b in `seq 1 4`; do mkdir -p "/cgroup/user/$a/$b"; done; done
>>
>> Then in separate consoles
>>
>> cat /dev/vda > /dev/null
>> ./test
>> ./test
>> i=0; while sleep 0; do iptables-restore < iptables.save; i=$(($i+1)); echo $i; done
>>
>> Here is the source of the "test" program; iptables.save is attached. It
>> also happens with a smaller iptables.save file.
>>
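>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <time.h>   /* time(), used to seed srand() */
>>
>> int main(void) {
>>
>>     srand(time(NULL));
>>     int i = 0, j = 0, randnum = 0;
>>     int arr[6] = { 3072, 7168, 15360, 31744, 64512, 130048 };
>>     while (1) {
>>
>>         for (i = 0; i < 6; i++) {
>>
>>             /* allocate, write every int to touch the memory, then free */
>>             int *ptr = (int *) malloc(arr[i] * 93);
>>
>>             if (!ptr)
>>                 continue;   /* allocation failed, try the next size */
>>
>>             for (j = 0; j < arr[i] * 93 / sizeof(int); j++) {
>>                 *(ptr + j) = j + 1;
>>             }
>>
>>             free(ptr);
>>         }
>>     }
>> }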
>>
>