[GitHub] reminisce commented on a change in pull request #8371: Add note in the doc for using naive engine in multithreading environment

GitBox Sat, 21 Oct 2017 13:13:58 -0700

reminisce commented on a change in pull request #8371: Add note in the doc for 
using naive engine in multithreading environment
URL: https://github.com/apache/incubator-mxnet/pull/8371#discussion_r146114557


 ##########
 File path: docs/faq/env_var.md
 ##########
 @@ -56,6 +56,9 @@ export MXNET_GPU_WORKER_NTHREADS=3
     - NaiveEngine: A very simple engine that uses the master thread to do the computation synchronously. Setting this engine disables multi-threading. You can use this type for debugging in case of any error. Backtrace will give you the series of calls that lead to the error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
     - ThreadedEngine: A threaded engine that uses a global thread pool to schedule jobs.
     - ThreadedEnginePerDevice: A threaded engine that allocates thread per GPU and executes jobs asynchronously.
+  - Note: ThreadedEngine and ThreadedEnginePerDevice are not thread-safe. Switch to using NaiveEngine
+          if you want to have multiple threads interacting with a single MXNet model at the same time.
 
 Review comment:
   Good suggestion and example, but I don't follow the reasoning of the last sentence: `Since the fork of a process replicates the complete process address space including threads, keeping MXNet single-threaded via the use of NaiveEngine makes it safe.`
   
   MXNet's ThreadedEngine schedules each operation based on the availability of the NDArrays that the operation uses. The root cause of the problem we saw is that multiple threads push operations on NDArrays to the MXNet ThreadedEngine simultaneously. This can cause:
   1. A data race, e.g. thread1 and thread2 writing to the same NDArray memory space.
   2. A deadlock, e.g. thread1 trying to copy from NDArray1 to NDArray2 while thread2 is trying to copy from NDArray2 to NDArray1, with each waiting for its source array to become ready for reading.
   
   If multiple threads in a single process are already interacting with MXNet, does it matter whether the process is forked or not?
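
   To make the deadlock case concrete, here is a toy sketch in plain Python. It does not use MXNet and is not how the engine is implemented; the function name `cross_copy_demo`, the lock-per-array model, and the barriers that force the bad interleaving are all assumptions made purely for illustration. Each thread reserves its destination array for writing and then waits for its source array, which the other thread is holding:

```python
import threading


def cross_copy_demo(timeout=0.2):
    """Toy model of two opposite copies waiting on each other.

    Hypothetical illustration only: one lock per "array" stands in for the
    array being busy. This is NOT MXNet's scheduling logic.
    """
    locks = {"NDArray1": threading.Lock(), "NDArray2": threading.Lock()}
    both_reserved = threading.Barrier(2)  # forces the problematic interleaving
    both_tried = threading.Barrier(2)     # keeps locks held until both have tried
    got_source = {}

    def copy(name, src, dst):
        with locks[dst]:                  # reserve the destination for writing
            both_reserved.wait()
            # Wait for the source; the other thread holds it as its destination.
            acquired = locks[src].acquire(timeout=timeout)
            got_source[name] = acquired
            both_tried.wait()
            if acquired:
                locks[src].release()

    t1 = threading.Thread(target=copy, args=("thread1", "NDArray1", "NDArray2"))
    t2 = threading.Thread(target=copy, args=("thread2", "NDArray2", "NDArray1"))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_source


if __name__ == "__main__":
    # Neither thread obtains its source array: a circular wait.
    print(cross_copy_demo())
```

   The timeout is there only so the sketch terminates; without it, both threads would block forever. Nothing in this circular wait depends on whether the process was forked, which is the point of the question above.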

