On 2015/9/25 04:27 AM, Ilya Verbin wrote:
> On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote:
>> We've discovered that, for several of the libgomp plugin interface routines,
>> if the target-specific routine calls exit() (usually upon a fatal condition),
>> deadlock ensues. We found this using nvptx, but it's possible on intelmic
>> as well.
>>
>> This is because many of the plugin routines are called with the device lock
>> held, and when exit() is called inside the plugin code, the
>> GOMP_unregister_var() destructor tries to iterate through and acquire all
>> device locks to clean up. Since we already hold one of the device locks, this
>> just gets stuck. Also, because gomp_mutex_t is a simple futex-based lock
>> implementation (instead of pthreads), we don't have a trylock mechanism to
>> use either.
>>
>> So this patch tries to alleviate this problem by changing the plugin
>> interface: the plugin routines that are called while holding the device lock
>> are adjusted so that they never exit fatally, but instead return a value to
>> libgomp proper to indicate the execution result. The core libgomp code may
>> then unlock and call gomp_fatal().
>>
>> We believe this is the right route to solve the problem, since there are only
>> two accel target plugins so far. Besides the nvptx plugin, I have made some
>> effort to update the intelmic plugin as well, though it's not as thoroughly
>> audited. Intel folks might want to further make sure your plugin code is free
>> of this problem as well.
>>
>> This patch contains the libgomp proper changes. The nvptx and intelmic
>> patches follow. I have tested the libgomp testsuite without regressions for
>> both accel targets. Is this okay for trunk?
> 
> (I have no objections)
> 
> However, in the case of intelmic, these exit()s are just the tip of the
> iceberg, because the underlying liboffloadmic contains other exit()s on fatal
> errors. And I don't know what to do about such deadlocks.
> 
>   -- Ilya

Yes, I think I saw more things to adjust wrt this issue within liboffloadmic,
though I hope this plugin interface change can at least get things ready.
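
To illustrate the shape of the interface change, here is a minimal sketch.
It is not taken from the patch: every name below (toy_device, toy_lock,
toy_plugin_op, toy_core_op) is a hypothetical stand-in, and the device lock is
modelled as a plain flag rather than a real gomp_mutex_t. The point is only
that the plugin routine reports failure by return value, and the core releases
the lock before taking any fatal path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a libgomp device descriptor.  */
struct toy_device
{
  bool locked;     /* Models the (non-recursive) device lock.  */
  bool plugin_ok;  /* Result the toy plugin routine will report.  */
};

static void
toy_lock (struct toy_device *d)
{
  /* Re-acquiring a held non-recursive lock would hang; trap it instead.  */
  assert (!d->locked);
  d->locked = true;
}

static void
toy_unlock (struct toy_device *d)
{
  d->locked = false;
}

/* Plugin-side routine, called with the device lock held.  It reports
   failure through its return value and never calls exit ().  */
static bool
toy_plugin_op (struct toy_device *d)
{
  assert (d->locked);
  return d->plugin_ok;
}

/* Core-side wrapper.  The lock is always released before the error path,
   so cleanup code that takes device locks cannot get stuck on this one.  */
static bool
toy_core_op (struct toy_device *d)
{
  toy_lock (d);
  bool ok = toy_plugin_op (d);
  toy_unlock (d);
  if (!ok)
    /* The real code would call gomp_fatal () at this point.  */
    fprintf (stderr, "toy: plugin operation failed\n");
  return ok;
}
```

In this sketch a failing toy_core_op () returns false with the lock already
dropped, which is the property the destructor relies on when it walks the
device locks during exit.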

And ping again, for the libgomp proper changes.

Thanks,
Chung-Lin


