Thomas, I've mentioned this issue before: there is sometimes just too much irrelevant material to wade through in your patch submissions, and it discourages review. The discussion of the actual problem begins more than halfway through your multi-page mail. Please try to be more concise.

On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
atexit handler, gomp_target_fini, which, with the device lock held, will
call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
clean up.

Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
context is now in an inconsistent state.

Thus, any cuMemFreeHost invocations run during clean-up will also/still
return CUDA_ERROR_LAUNCH_FAILED, so we again call GOMP_PLUGIN_fatal,
which again triggers the same or another (GOMP_offload_unregister_ver)
atexit handler, which then deadlocks trying to acquire the device lock
that is still held.

        libgomp/
        * error.c (gomp_vfatal): Call _exit instead of exit.
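
If I follow the chain correctly, the failure mode reduces to something
like the following standalone sketch. All names here are made-up
stand-ins, not the actual libgomp/plugin code, and the nested exit()
call relies on glibc's behaviour of continuing with the remaining
atexit handlers:

/* Illustration only -- made-up names, not libgomp code.  */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for GOMP_PLUGIN_fatal.  */
static void
fake_fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  exit (EXIT_FAILURE);		/* runs the not-yet-run atexit handlers */
}

/* Stand-in for the GOMP_offload_unregister_ver handler.  */
static void
unregister_handler (void)
{
  pthread_mutex_lock (&device_lock);	/* deadlock: lock still held below */
  pthread_mutex_unlock (&device_lock);
}

/* Stand-in for gomp_target_fini.  */
static void
target_fini_handler (void)
{
  pthread_mutex_lock (&device_lock);
  /* Plugin clean-up fails because the context is already broken...  */
  fake_fatal ("cuMemFreeHost: CUDA_ERROR_LAUNCH_FAILED");
  pthread_mutex_unlock (&device_lock);	/* never reached */
}

int
main (void)
{
  atexit (unregister_handler);	/* registered first, runs last */
  atexit (target_fini_handler);	/* registered second, runs first */
  fake_fatal ("kernel launch: CUDA_ERROR_LAUNCH_FAILED");
  return 0;
}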

It seems unfortunate to disable the atexit handlers for all targets in order to work around what seems to be purely an nvptx problem.
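
If I'm reading the ChangeLog right, the change amounts to roughly this
in libgomp/error.c (paraphrased from the description; I haven't looked
at the actual diff):

#include <stdarg.h>
#include <stdlib.h>
#include <unistd.h>

/* gomp_verror is the existing helper in the same file.  */
extern void gomp_verror (const char *fmt, va_list list);

void
gomp_vfatal (const char *fmt, va_list list)
{
  gomp_verror (fmt, list);
  _exit (EXIT_FAILURE);		/* was: exit (EXIT_FAILURE); _exit skips
				   every registered atexit handler */
}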

What exactly happens if you don't register the cleanups with atexit in the first place? Or could you check for CUDA_ERROR_LAUNCH_FAILED in the cleanup functions and skip the remaining clean-up, instead of calling GOMP_PLUGIN_fatal again?
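
Something along these lines in the plugin's clean-up path is what I
have in mind. This is just a sketch: the function name and call sites
are hypothetical, and CUDA_ERROR_DEINITIALIZED is only a guess at
another state worth tolerating:

#include <stdbool.h>
#include <cuda.h>

/* Provided by libgomp-plugin.h.  */
extern void GOMP_PLUGIN_fatal (const char *, ...);

/* Free a host buffer during device clean-up.  Returns false if the
   context is already broken, so the caller can skip the remaining
   clean-up instead of escalating to GOMP_PLUGIN_fatal and re-entering
   the exit path.  */
static bool
nvptx_free_host_mem (void *ptr)
{
  CUresult r = cuMemFreeHost (ptr);
  if (r == CUDA_ERROR_LAUNCH_FAILED || r == CUDA_ERROR_DEINITIALIZED)
    return false;
  if (r != CUDA_SUCCESS)
    GOMP_PLUGIN_fatal ("cuMemFreeHost error: %d", (int) r);
  return true;
}

That would keep the problem contained to the nvptx plugin instead of
changing the exit behaviour for everyone.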


Bernd
