YutingZhang edited a comment on issue #13593: Low CPU usage of MXNet in subprocesses
URL: https://github.com/apache/incubator-mxnet/issues/13593#issuecomment-450949556

@pengzhao-intel @TaoLv @anirudh2290 @zhreshold Thank you all for your help, and happy new year!

This problem seems more complicated than it first looked (there may have been multiple problems from the beginning). @zhreshold's fix solved the problem in most cases. However, I found that if each worker calls `asnumpy`, the worker processes interfere with one another. This does not seem to be a problem for the GPU version of MXNet running on a GPU machine. It seems to happen only on a **CPU-only machine (I tested on c5.18xlarge with `mxnet-mkl`)**.

Code (one-line difference):

```
import argparse
import sys
from concurrent import futures
import time
import numpy as np

mx = None


def run(need_import):
    # Declare the global first: a `global` statement placed after the
    # import is a SyntaxError on Python 3.6+.
    global mx
    if need_import:
        import mxnet as mx
    A = mx.nd.random.uniform(low=0, high=1, shape=(5000, 5000))
    while True:
        A = mx.nd.dot(A, A)
        A.asnumpy()   # ******** only difference ***********


def parse_args():
    parser = argparse.ArgumentParser("benchmark mxnet cpu")
    parser.add_argument('--num-workers', '-j', dest='num_workers', type=int, default=0)
    parser.add_argument('--late-import', action='store_true')
    return parser.parse_args()


def main(args):
    if args.num_workers == 0:
        print("Main process")
        try:
            run(need_import=args.late_import)
        except KeyboardInterrupt:
            pass
    else:
        print("Subprocesses")
        ex = futures.ProcessPoolExecutor(args.num_workers)
        for _ in range(args.num_workers):
            ex.submit(run, need_import=args.late_import)
        while True:
            try:
                time.sleep(10000)
            except KeyboardInterrupt:
                ex.shutdown(wait=False)
                break
    print("Stopped")


if __name__ == "__main__":
    args = parse_args()
    if not args.late_import:
        import mxnet as mx
    main(args)
```

Launch 10 workers (`python3 mxnet_cpu_test.py --num-workers=10`). Setting `MXNET_MP_WORKER_NTHREADS` does not affect the results.
But running it only in the main process is fine.

By the way, I found another issue with `mxnet` (the CPU, non-MKL build): when MXNet runs in a subprocess, it interferes with many non-MXNet functions (e.g., `cv2.cvtColor`), and the subprocess gets stuck in those calls. This did not happen with `mxnet==1.3.1`; it started in some nightly build. We should probably open a new ticket for this.
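
CPU usage alone does not show how much each worker actually slows down. A minimal sketch of one way to quantify the interference is to compare iterations per second in a single process against the per-worker rate in a pool. The helper names (`count_iterations`, `busy_step`, `worker`, `compare`) and the pure-Python stand-in workload are assumptions for illustration and are not part of the script above; for a real measurement, `busy_step` would be replaced with the `mx.nd.dot` plus `asnumpy` step.

```python
import time
from concurrent import futures


def count_iterations(step, duration=1.0):
    """Call step() repeatedly for about `duration` seconds.

    Returns the observed rate in calls per second.
    """
    start = time.perf_counter()
    n = 0
    while True:
        step()
        n += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            return n / elapsed


def busy_step():
    # Hypothetical stand-in workload (pure Python, not MXNet).
    # For the real test, replace with: A = mx.nd.dot(A, A); A.asnumpy()
    sum(i * i for i in range(10000))


def worker(duration):
    # Runs inside a pool process and reports that worker's own rate.
    return count_iterations(busy_step, duration)


def compare(num_workers, duration=1.0):
    """Return (single-process rate, list of per-worker rates)."""
    single = count_iterations(busy_step, duration)
    with futures.ProcessPoolExecutor(num_workers) as ex:
        rates = list(ex.map(worker, [duration] * num_workers))
    return single, rates
```

Calling `compare(num_workers=10)` from under an `if __name__ == "__main__":` guard returns the single-process rate and ten per-worker rates. If the workers do not interfere and there are enough cores, each per-worker rate should stay close to the single-process rate; a large drop would point to contention between the processes.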
