Ethanlm opened a new pull request #3366: URL: https://github.com/apache/storm/pull/3366
## What is the purpose of the change

Enable supervisors to use the `runc` binary to launch workers inside an OCI container. See docs/OCI-support.md for the justification.

The original design and many parts of the OCI-related code are borrowed from the Verizon Media hadoop-core team. This has been used in our production Storm clusters for more than one year.

## How was the change tested

Tested with an example WordCount topology, including the following operations:

1. Launch a topology; the supervisor launches the worker inside the OCI container.
```
-bash-4.2$ sudo runc list
ID                                          PID     STATUS    BUNDLE                                                           CREATED                          OWNER
6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb   21780   running   /home/y/var/storm/workers/1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb   2020-12-21T20:21:17.116971511Z   root
```
2. Manually kill the worker; the supervisor recovers it.
3. Profile the worker from the UI (jmap, heap, etc.).
```
-bash-4.2$ sudo ls -ltrh /home/y/var/storm/workers-artifacts/wc1-2-1608581491/6703/
...
-rw-r----- 1 username1 gstorm  64K Dec 21 21:32 worker.log
-rw-r----- 1 username1 gstorm  27K Dec 21 21:51 jstack-17-20201221215145.txt
-rw-r----- 1 username1 gstorm 552M Dec 21 21:52 recording-17-20201221215222.bin
```
4. Check CPU throttling and validate that it is properly enforced. In this example, the worker is assigned 140% CPU, which matches the quota-to-period ratio (140000 / 100000) and the CPU usage reported by `top`.
```
-bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.cfs_period_us
100000
-bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.cfs_quota_us
140000
-bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.stat
nr_periods 824
nr_throttled 821
throttled_time 104646268666

#Top output:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21800 username1 20   0 5453928   1.7g  28036 S 139.9 22.7   2:15.96 java
21780 username1 20   0 2451376  83324  27448 S   0.0  1.1   0:02.36 java
```
5. Kill the topology; the supervisor kills the worker.
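
The CPU throttling check in step 4 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming cgroup v1 CFS semantics (the limit is `cpu.cfs_quota_us / cpu.cfs_period_us`, and a quota of -1 means unlimited). The function names are hypothetical and are not part of this PR; the sample numbers are the ones shown in the test output above.

```python
def cpu_limit_percent(quota_us: int, period_us: int) -> float:
    """Return the CFS CPU limit as a percentage of one core."""
    if quota_us < 0:
        # In cgroup v1, a quota of -1 means no limit is enforced.
        return float("inf")
    return 100.0 * quota_us / period_us


def parse_cpu_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats: dict) -> float:
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Values copied from the cpu.stat output in step 4.
SAMPLE_CPU_STAT = """nr_periods 824
nr_throttled 821
throttled_time 104646268666
"""

if __name__ == "__main__":
    limit = cpu_limit_percent(140000, 100000)
    stats = parse_cpu_stat(SAMPLE_CPU_STAT)
    print(f"limit: {limit:.0f}% of one core")
    print(f"throttled in {throttled_fraction(stats):.1%} of periods")
```

With the sample values, the computed limit is 140% of one core, consistent with the 139.9 %CPU reported by `top`. The high throttled fraction indicates the worker is constantly hitting its quota, which is what one would expect if the limit is being enforced on a busy worker.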