reminisce commented on a change in pull request #8371: Add note in the doc for using naive engine in multithreading environment
URL: https://github.com/apache/incubator-mxnet/pull/8371#discussion_r146114557
##########
File path: docs/faq/env_var.md
##########
@@ -56,6 +56,9 @@ export MXNET_GPU_WORKER_NTHREADS=3
   - NaiveEngine: A very simple engine that uses the master thread to do the computation synchronously. Setting this engine disables multi-threading. You can use this type for debugging in case of any error. Backtrace will give you the series of calls that lead to the error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
   - ThreadedEngine: A threaded engine that uses a global thread pool to schedule jobs.
   - ThreadedEnginePerDevice: A threaded engine that allocates thread per GPU and executes jobs asynchronously.
+  - Note: ThreadedEngine and ThreadedEnginePerDevice are not thread-safe. Switch to using NaiveEngine
+    if you want to have multiple threads interacting with a single MXNet model at the same time.

Review comment:
Good suggestion and example. But I'm not following the reasoning of the last sentence: `Since the fork of a process replicates the complete process address space including threads, keeping MXNet single-threaded via the use of NaiveEngine makes it safe.`

The ThreadedEngine of MXNet schedules operations based upon the availability of the NDArrays in their corresponding operation functions. The root cause of the problem we saw is that multiple threads push operations on NDArrays to the MXNet ThreadedEngine simultaneously. This will cause:

1. A data race, e.g. thread1 and thread2 writing to the same NDArray memory space.
2. A deadlock, e.g. thread1 trying to copy from NDArray1 to NDArray2 while thread2 is trying to copy from NDArray2 to NDArray1, with each of them waiting for the read of its source array to become ready.

If multiple threads in a single process are already interacting with MXNet, does it matter whether the process is forked or not?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
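
The deadlock scenario in the review comment (two threads copying between the same pair of arrays in opposite directions) can be illustrated with a toy sketch. It does not use MXNet and does not model its engine: `ToyArray`, `copy_to`, and `run_opposite_copies` are hypothetical names, plain Python lists stand in for NDArrays, and one lock per array stands in for whatever dependency tracking a real engine performs. Under those assumptions, it only shows the general idea that acquiring both locks in a fixed global order lets opposite-direction copies finish instead of waiting on each other.

```python
import threading


class ToyArray:
    """Hypothetical stand-in for an NDArray: a list guarded by its own lock."""

    _next_uid = 0
    _uid_lock = threading.Lock()

    def __init__(self, values):
        # Give every array a unique, monotonically increasing id.
        with ToyArray._uid_lock:
            self.uid = ToyArray._next_uid
            ToyArray._next_uid += 1
        self.values = list(values)
        self.lock = threading.Lock()


def copy_to(src, dst):
    """Copy src into dst, always locking the array with the lower uid first."""
    if src is dst:
        return  # nothing to copy, and taking the same lock twice would block
    first, second = sorted((src, dst), key=lambda arr: arr.uid)
    with first.lock, second.lock:
        dst.values[:] = src.values


def run_opposite_copies(rounds=1000):
    """Run a->b and b->a copies concurrently; report whether both threads finished."""
    a = ToyArray([1, 2, 3])
    b = ToyArray([4, 5, 6])

    def worker(src, dst):
        for _ in range(rounds):
            copy_to(src, dst)

    t1 = threading.Thread(target=worker, args=(a, b), daemon=True)
    t2 = threading.Thread(target=worker, args=(b, a), daemon=True)
    t1.start()
    t2.start()
    t1.join(timeout=10)
    t2.join(timeout=10)
    finished = not (t1.is_alive() or t2.is_alive())
    return finished, a.values, b.values


if __name__ == "__main__":
    finished, a_values, b_values = run_opposite_copies()
    print("finished:", finished, "a:", a_values, "b:", b_values)
```

If each thread instead locked its source first and its destination second, thread1 could hold the lock of the first array while thread2 holds the lock of the second, and neither copy would ever proceed. Which array's contents end up in both lists depends on which thread wins the first copy.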