Jie Yu created MESOS-9127:
-----------------------------
Summary: Port mapper CNI plugin might deadlock iptables on the
agent.
Key: MESOS-9127
URL: https://issues.apache.org/jira/browse/MESOS-9127
Project: Mesos
Issue Type: Bug
Components: network
Affects Versions: 1.6.1, 1.5.1, 1.4.1
Reporter: Jie Yu
Recently, we noticed that if one launches a lot of containers that use the port
mapper CNI plugin, the iptables will get deadlock. The symptom is like the
following.
If you do any iptables command on the box, it'll get stuck on acquiring the
xtables lock:
{noformat}
core@ip-10-0-2-99 ~ $ time iptables -w -t nat -S UCR-DEFAULT-BRIDGE
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
^C
real 0m41.349s
user 0m0.001s
sys 0m0.000s
{noformat}
And you'll notice that a lot of iptables and mesos port mapper CNI plugin
processes on the box
{noformat}
$ ps -fp $(pidof mesos-cni-port-mapper) | wc -l
191
$ ps -fp $(pidof iptables) | wc -l
192
{noformat}
Then, we look into the process that is holding the xtables lock:
{noformat}
$ sudo netstat -p -n | grep xtables
unix 2 [ ] STREAM 225083 25048/iptables
@xtables
$ ps aux | grep 25048
root 25048 0.0 0.0 26184 2512 ? S 22:41 0:00 iptables -w -t
nat -S UCR-DEFAULT-BRIDGE
core 31857 0.0 0.0 6760 976 pts/0 S+ 23:18 0:00 grep
--colour=auto 25048
$ sudo strace -s 10000 -p 25048
Process 25048 attached
write(1, "-dport 13839 -m comment --comment \"container_id:
a074efef-c119-4764-a584-87c57e6b7f68\" -j DNAT --to-destination
172.31.254.183:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
17282 -m comment --comment \"container_id:
47880db5-90ad-4034-9c2b-2fd246d42342\" -j DNAT --to-destination
172.31.254.126:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
28712 -m comment --comment \"container_id:
3293d5d0-772c-48d2-bafd-1c4b4d56247e\" -j DNAT --to-destination
172.31.254.130:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
23893 -m comment --comment \"container_id:
f57de8eb-a3b9-44cb-8dac-7ab261bc8aac\" -j DNAT --to-destination
172.31.254.149:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
28449 -m comment --comment \"container_id:
9238dbf0-7b28-4fda-880a-bf0c8f40562a\" -j DNAT --to-destination
172.31.254.190:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
26438 -m comment --comment \"container_id:
d307cf58-8972-4de4-ad45-26c29786add0\" -j DNAT --to-destination
172.31.254.187:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
7682 -m comment --comment \"container_id:
60f5a61b-f4c0-4846-b2cd-63cd7eb5a4e8\" -j DNAT --to-destination
172.31.254.177:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
23904 -m comment --comment \"container_id:
f203ff9e-7b81-4e54-ab44-d45e2a937f38\" -j DNAT --to-destination
172.31.254.157:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
8359 -m comment --comment \"container_id:
578cc89c-83bf-46ba-9ae7-c7b89e40e739\" -j DNAT --to-destination
172.31.254.158:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
28482 -m comment --comment \"container_id:
70721adb-cd6c-4158-8b11-4ef694999203\" -j DNAT --to-destination
172.31.254.163:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
2564 -m comment --comment \"container_id:
70bc9735-8e0a-4b27-9a68-1b1151c9f6ec\" -j DNAT --to-destination
172.31.254.105:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
6952 -m comment --comment \"container_id:
a494e0fe-ce6e-4e95-b094-504c7603b6fb\" -j DNAT --to-destination
172.31.254.159:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
29371 -m comment --comment \"container_id:
1dac0183-8939-4165-95e8-cf6503cc53ad\" -j DNAT --to-destination
172.31.254.174:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
6551 -m comment --comment \"container_id:
148a197b-7247-4a3c-b528-1417cc8ccf3f\" -j DNAT --to-destination
172.31.254.144:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
16568 -m comment --comment \"container_id:
e68bb942-ab69-43a5-a308-40b33368d111\" -j DNAT --to-destination
172.31.254.184:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
5915 -m comment --comment \"container_id:
08d53ef6-7d3c-4b44-8bb7-923b6a4a1780\" -j DNAT --to-destination
172.31.254.186:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
11252 -m comment --comment \"container_id:
894e1455-6c77-4fc2-8414-53f3db7b492a\" -j DNAT --to-destination
172.31.254.223:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
7583 -m comment --comment \"container_id:
917b1fc3-12ca-4198-a5f4-13144fbf9096\" -j DNAT --to-destination
172.31.254.164:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
12695 -m comment --comment \"container_id:
cfadeb29-3481-4e9e-8eba-e57ce6a4f693\" -j DNAT --to-destination
172.31.254.153:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
1256 -m comment --comment \"container_id:
212649ee-a894-4201-bc38-798545ce9e1f\" -j DNAT --to-destination
172.31.254.132:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
28581 -m comment --comment \"container_id:
38c46524-9023-450d-a134-24ad14eecbd5\" -j DNAT --to-destination
172.31.254.202:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
5709 -m comment --comment \"container_id:
8a510a8e-df16-40b2-a4b7-36605a5f9984\" -j DNAT --to-destination
172.31.254.194:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport
6050 -m comment --comment \"container_id:
94757d96-b6ad-4a57-9e38-c956a2e24fd6\" -j DNAT --to-destination
172.31.254.152:80\n-A UCR-DEFAULT-B", 4096
{noformat}
So it gets stuck when writing to stdout. That indicates that the pipe is full,
and the reader end hasn't read from the pipe. Then we identified the reader
{noformat}
$ ps aux | grep a074efef-c119-4764-a584-87c57e6b7f68
core 2355 0.0 0.0 6756 1020 pts/0 S+ 23:39 0:00 grep
--colour=auto a074efef-c119-4764-a584-87c57e6b7f68
root 26486 0.0 0.0 11948 2852 ? S 22:41 0:00 sh -c
#!/bin/sh exec 1>&2 set -x # The iptables command searches
for the DNAT rules with tag # "container_id: <CNI_CONTAINERID>", and if
it exists goes ahead # and deletes it. iptables -w -t nat -S
UCR-DEFAULT-BRIDGE | sed "/container_id: a074efef-c119-4764-a584-87c57e6b7f68/
s/-A/iptables -w -t nat -D/e"
root 26490 0.0 0.0 4340 896 ? S 22:41 0:00 sed
/container_id: a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat
-D/e
$ sudo strace -s 10000 -p 26490
Process 26490 attached
read(0, ^CProcess 26490 detached
{noformat}
The reader which is `sed` is reading from stdin, somehow gets stuck as well.
The corresponding container gets stuck during CNI detach after checking the
agent log.
After checking the code, we pinpoint the problematic script in our CNI port
mapper plugin:
https://github.com/apache/mesos/blob/1.6.0/src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.cpp#L372
{code}
iptables -w -t nat -S %s | sed "/%s/ s/-A/iptables -w -t nat -D/e"
{code}
This command will deadlock if there are a lot of entires in the iptables
because the `sed` won't process the next line while executing `iptables -w -t
nat -D ...`. But the executing of `iptables -w -t nat -D ... ` might get stuck
if the first command `iptables -w -t nat -S %s` didn't finish, because the lock
is not released. The first command might not finish if it has a lot of output,
filling the pipe that `sed` hasn't had a chance to process yet.
The correct solution is to write the command to execute in a file, and execute
the file.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)