Ethanlm opened a new pull request #3366:
URL: https://github.com/apache/storm/pull/3366


   ## What is the purpose of the change
   
   Enables supervisors to use the runc binary to launch workers inside an OCI container. See docs/OCI-support.md for the justification. The original design and many parts of the OCI-related code are borrowed from the Verizon Media hadoop-core team.
   
   This has been used in our production Storm clusters for more than one year. 
   
   ## How was the change tested
   
   Tested with an example WordCount topology, including the following operations:
   1. Launch a topology; the supervisor launches the worker inside the OCI container.
   ```
   -bash-4.2$ sudo runc list
   ID                                          PID     STATUS    BUNDLE                                                           CREATED                          OWNER
   6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb   21780   running   /home/y/var/storm/workers/1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb   2020-12-21T20:21:17.116971511Z   root
   ```
   2. Manually kill the worker; the supervisor recovers it.
   3. Profile the worker from the UI (jmap, heap, etc.).
   ```
   -bash-4.2$ sudo ls -ltrh /home/y/var/storm/workers-artifacts/wc1-2-1608581491/6703/
   ...
   -rw-r----- 1 username1 gstorm  64K Dec 21 21:32 worker.log
   -rw-r----- 1 username1 gstorm  27K Dec 21 21:51 jstack-17-20201221215145.txt
   -rw-r----- 1 username1 gstorm 552M Dec 21 21:52 recording-17-20201221215222.bin
   ```
   
   4. Check CPU throttling and validate that it is properly enforced. In this example, the worker is assigned 140% CPU.
   ```
   -bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.cfs_period_us
   100000
   -bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.cfs_quota_us
   140000
   -bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.stat
   nr_periods 824
   nr_throttled 821
   throttled_time 104646268666
      
   # top output:
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   21800 username1  20   0 5453928   1.7g  28036 S 139.9 22.7   2:15.96 java
   21780 username1  20   0 2451376  83324  27448 S   0.0  1.1   0:02.36 java
   ```
   
   5. Kill the topology; the supervisor kills the worker.
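
The quota, period, and `cpu.stat` values from step 4 can be cross-checked with a little arithmetic. The following is a minimal Python sketch and is not part of this PR: the helper names are hypothetical, and it assumes cgroup v1 semantics, where the CPU limit is `cpu.cfs_quota_us / cpu.cfs_period_us` and `throttled_time` is reported in nanoseconds. The input numbers are the ones shown in step 4.

```python
def cpu_limit_percent(quota_us, period_us):
    """Return the CPU limit as a percentage of one core (hypothetical helper)."""
    return 100.0 * quota_us / period_us


def parse_cpu_stat(text):
    """Parse the 'key value' lines of a cgroup v1 cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats["nr_periods"]
    return stats["nr_throttled"] / periods if periods else 0.0


# Values copied from the step 4 output.
CPU_STAT = """\
nr_periods 824
nr_throttled 821
throttled_time 104646268666
"""

if __name__ == "__main__":
    stats = parse_cpu_stat(CPU_STAT)
    print("limit: %.1f%%" % cpu_limit_percent(140000, 100000))
    print("throttled periods: %.1f%%" % (100 * throttled_fraction(stats)))
    # throttled_time is in nanoseconds under cgroup v1 (assumption stated above).
    print("throttled time: %.1f s" % (stats["throttled_time"] / 1e9))
```

A quota of 140000 us per 100000 us period corresponds to the 140% assignment, which matches the 139.9 %CPU that top reports for the worker JVM. The worker was throttled in 821 of 824 periods, which suggests the limit is actively being enforced.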
   

