comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596008979
 
 
   > @comaniac When I was working on the Windows RPC server, my first design 
was to run all the trials in the same process, since Windows doesn't 
`fork(...)`. It did indeed improve speed.
   > 
   > After running CUDA autotuning for a while, it looked like there was a GPU 
memory leak (some trials made usage jump by hundreds of megabytes). I spent a 
long time looking for a leak in the C++ code, but couldn't find anything. With 
my very limited understanding, it seemed as if the failed/slow CUDA kernels 
being tested were themselves leaking. Only killing the process would free the 
GPU memory.
   > 
   > I abandoned running trials in-process and moved to a design similar to 
the current Linux implementation.
   
   Thanks for sharing your experience. It sounds reasonable.
   The current implementation in this PR lets every forked executor process 
acquire a new CUDA context, which is the only solution I have right now to 
avoid some of the initialization errors. Ideally, each child process should be 
killed as soon as it finishes its job, so that the next job (in another newly 
forked process) can start from a clean context. However, apparently it doesn't 
work as I expected.
   
   Related issue: 
https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork
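
   To make the "one fresh process per job" idea concrete, here is a minimal 
sketch using only the Python standard library. It is not the code in this PR: 
`run_in_fresh_process` and `fake_trial` are hypothetical names, the trial is a 
stand-in that does no CUDA work, and it assumes a POSIX system where the `fork` 
start method is available. Note that, per the Stack Overflow question above, 
the parent must not have initialized CUDA before forking.

```python
import multiprocessing as mp
import os


def _child(job, args, conn):
    # Runs inside the forked child: execute the job and report back over the pipe.
    try:
        conn.send(("ok", job(*args)))
    except Exception as exc:  # report failures instead of crashing silently
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_in_fresh_process(job, args=(), timeout=10.0):
    """Run job(*args) in a newly forked process and always tear that process down.

    Returns ("ok", result), ("error", message) or ("timeout", None).
    Hypothetical helper for illustration; assumes the "fork" start method exists.
    """
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(job, args, child_conn))
    proc.start()
    child_conn.close()  # the parent keeps only the read end
    try:
        if parent_conn.poll(timeout):
            result = parent_conn.recv()
        else:
            result = ("timeout", None)
    except EOFError:
        result = ("error", "child exited without sending a result")
    finally:
        # Kill unconditionally so that any per-process state (for a real trial,
        # its CUDA context) is released before the next job starts.
        proc.kill()
        proc.join()
        parent_conn.close()
    return result


def fake_trial():
    # Stand-in for a measurement job; a real one would create its CUDA context here.
    return os.getpid()


if __name__ == "__main__":
    for _ in range(3):
        print(run_in_fresh_process(fake_trial))
```

   Killing the child even after a successful run trades some startup cost for 
the guarantee that nothing a failed or slow kernel left behind survives into 
the next job.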
 

