comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596008979

> @comaniac When I was working on the Windows RPC server, my first design was to run the trials all in the same process, as Windows doesn't `fork(...)`. It did indeed have a speed improvement.
>
> After running CUDA autotune for a while, it looked like there was a GPU memory leak (some trials made it jump by hundreds of megabytes). I spent a long time looking for a leak in the C++ code but couldn't find anything. With my very limited understanding, it seemed as if the failed/slow CUDA kernels it tested were themselves leaking. Only killing the process would free the GPU memory.
>
> I abandoned the in-process running of trials and moved to a design similar to the current Linux implementation.

Thanks for sharing your experience. It sounds reasonable. The current implementation in this PR lets every process forked by the executor acquire a new CUDA context, and this is the only solution I have right now to avoid some of the initialization errors. Ideally, the process should be killed as soon as the child process finishes, so that the next job (with another newly forked process) can start from a clean context. However, that apparently does not work as I expected.

Related issue: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork
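
To make the "one fresh process per job" idea concrete, here is a minimal sketch in plain Python. It is not the code in this PR or TVM's `LocalRunner`; the names `run_in_fresh_process` and `_child_entry` are hypothetical, it assumes a platform that supports `fork` (so not Windows), and it does not touch CUDA at all. It only shows the process lifecycle: fork a child for the job, collect its result, and make sure the child is gone before the next job starts.

```python
import multiprocessing as mp
import os
import queue


def _child_entry(func, args, result_queue):
    """Runs in the child: execute the job and report its result or its error."""
    try:
        result_queue.put(("ok", func(*args)))
    except Exception as exc:
        # Send the failure back instead of letting the child die silently.
        result_queue.put(("error", repr(exc)))


def run_in_fresh_process(func, args=(), timeout=10.0):
    """Run func(*args) in a newly forked child and return (status, payload).

    Hypothetical helper: status is "ok", "error", or "timeout". The child is
    always reaped before returning, so anything it held is released by the OS.
    """
    ctx = mp.get_context("fork")  # assumption: fork is available (Linux/macOS)
    result_queue = ctx.Queue()
    proc = ctx.Process(target=_child_entry, args=(func, args, result_queue))
    proc.start()
    try:
        status, payload = result_queue.get(timeout=timeout)
    except queue.Empty:
        # The job took too long (or the child died without reporting).
        status, payload = "timeout", None
        proc.terminate()
    finally:
        # Give the child a moment to exit on its own, then force it.
        proc.join(timeout=1.0)
        if proc.is_alive():
            proc.terminate()
            proc.join()
    return status, payload


if __name__ == "__main__":
    # Each call runs in a different, short-lived process.
    print(run_in_fresh_process(os.getpid))
    print(run_in_fresh_process(os.getpid))
```

In this sketch each job's child exits (or is terminated) before the next one is forked. As the linked question describes, the parent would also have to avoid initializing CUDA itself before forking; that part is outside what this sketch demonstrates.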
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
