Re: RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

Brian Gardner Fri, 06 May 2016 12:42:03 -0700

I agree with Stefan.  When I initially ran into the problem I came up with the 
following changeset that solves my problem, by calling clear_thread_current at 
the end of java_start.


http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/
 
<http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/>

Brian

> On May 6, 2016, at 8:41 AM, Stefan Karlsson <[email protected]> 
> wrote:
> 
> On 06/05/16 16:32, David Holmes wrote:
>> On 7/05/2016 12:04 AM, Stefan Karlsson wrote:
>>> Hi David,
>>> 
>>> On 06/05/16 15:38, David Holmes wrote:
>>>> Hi Stefan,
>>>> 
>>>> Thanks for taking a look at this.
>>>> 
>>>> On 6/05/2016 5:02 PM, Stefan Karlsson wrote:
>>>>> Hi David,
>>>>> 
>>>>> I looked through the GC part of this webrev and I think the change is
>>>>> fine.
>>>>> 
>>>>> However, it seems a bit error prone. If we decide to change the code to,
>>>>> for example, terminate the  AbstractGangWorker threads, then we have to
>>>>> remember to insert a ThreadLocalStorage::set_thread(NULL) call.
>>>> 
>>>> That's why I added the ShouldNotReachHere()'s - if those threads start
>>>> terminating then we will see those hit. Perhaps a comment:
>>>> 
>>>> ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup
>>>> 
>>>> ?
>>> 
>>> Yes, I would appreciate a comment. Though, when we add new threads, we
>>> need to remember to add the set_thread(NULL) call.
>> 
>> Well no, what you would do is manage your new threads in such a way that 
>> their run() method can do "delete this" as the last call. Only if you can't 
>> do that do you need to think about what termination logic is missing that 
>> needs to be done in lieu of the destructor.
> 
> Yes, but this forces every implementer of a Thread:run() function to have to 
> think about these kind of requirements.
> 
>> 
>>>> 
>>>>> Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or
>>>>> maybe even Thread::clear_thread_current(), in java_start?
>>>>> 
>>>>> static void *java_start(Thread *thread) {
>>>>> [...]
>>>>>  thread->initialize_thread_current();
>>>>> 
>>>>> [...]
>>>>> 
>>>>>  // call one more level start routine
>>>>>  thread->run();
>>>>> 
>>>>>  ////////// Could we call Thread::clear_thread_current(); here?
>>>> 
>>>> Not easily. For JavaThreads we've already done "delete this" inside
>>>> the run() method, so we'd have to move that into java_start as well,
>>>> but we can only do the delete for JavaThreads not for other threads.
>>>> And we'd also have to change the VMThread and WatcherThread
>>>> termination logic because of the deletes that happen in the
>>>> termination thread - the "this" pointer (thread above) may no longer
>>>> be valid when we want to call clear_current_thread() - which is why we
>>>> can only do the ThreadLocalStorage::set_thread(NULL).
>>>> 
>>>> I agree it would be a lot cleaner to have java_start do:
>>>> 
>>>>  thread->common_initialization();
>>>>  thread->run();
>>>>  thread->common_cleanup();
>>>>  delete thread;
>>>> 
>>>> for all threads, but we'd need a lot of other changes to allow for
>>>> that. Otherwise we would need to note that kind of thread before
>>>> calling run() then switch on the thread type after run() to decide
>>>> what kind of cleanup is necessary and possible. I don't think that
>>>> would be better than just doing the "right" cleanup at the end of the
>>>> run() methods.
>>> 
>>> I understand that this is a bit messy, and I won't insist that we change
>>> this in this RFR, but without looking at this in much detail it sounds
>>> weird to delete the thread in run(). Couldn't this be solved by
>>> introducing a virtual Thread::post_run() function and do:
>>> 
>>> virtual void Thread::post_run() {
>>>  clear_thread_current();
>>> }
>>> 
>>> virtual void JavaThread::post_run() {
>>>  Thread::post_run();
>>>  delete this;
>>> }
>> 
>> But again this can't work for the VMThread or WatcherThread as they are 
>> deleted from the termination thread and so thread->post_run() may SEGV.** 
>> Plus it is only after the fact that you realize not to put "delete this" in 
>> Thread::post_run().
> 
> OK, I didn't understand what you meant with "termination thread", but I now 
> see the call to VMThread::destroy().
> 
> With that said, I find it odd that VMThread::destroy() deletes the VM thread. 
> We already handshake between the VMThread and the "termination thread", so 
> why isn't that VMThread::post_run() implemented as:
> 
> virtual void VMThread::post_run() {
>  // signal other threads that VM process is gone
>  {
>    // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows
>    // VM thread to enter any lock at Safepoint as long as its _owner is NULL.
>    // If that happens after _terminate_lock->wait() has unset _owner
>    // but before it actually drops the lock and waits, the notification below
>    // may get lost and we will have a hang. To avoid this, we need to use
>    // Mutex::lock_without_safepoint_check().
>    MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag);
>    _terminated = true;
>    _terminate_lock->notify();
>  }
> 
>  Thread::post_run();
>  delete this;
> }
> 
> And then we wouldn't get a SEGV ...
> 
> I couldn't find the destructor for the WatchThread, but it seems easy to fix 
> that as well.
> 
> I'm probably missing something, but I find it a bit annoying that code that 
> should belong to the *Thread:ing system leaks into the implementations of 
> *Thread::run().
> 
> Thanks,
> StefanK
>> 
>> ** Arguably the best solution to the "thread termination races with VM 
>> termination" problem is to not let the threads terminate. The code as it 
>> exists today can still have JavaThreads destroying themselves at the same 
>> that the VM is terminating and potentially hit the same errors that require 
>> us to not allow the VMThread (and now WatcherThread) to delete themselves.
>> 
>> Thanks,
>> David
>> 
>>> Thanks,
>>> StefanK
>>> 
>>>> 
>>>> Thanks,
>>>> David
>>>> ------
>>>> 
>>>>> 
>>>>>  log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread
>>>>> id: " UINTX_FORMAT ").",
>>>>>    os::current_thread_id(), (uintx) pthread_self());
>>>>> 
>>>>>  return 0;
>>>>> }
>>>>> 
>>>>> And get rid of the explicit calls to
>>>>> ThreadLocalStorage::set_thread(NULL) you added?
>>>>> 
>>>>> Thanks,
>>>>> StefanK
>>>>> 
>>>>> On 04/05/16 01:39, David Holmes wrote:
>>>>>> This needs attention from GC and runtime folk please.
>>>>>> 
>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715
>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/
>>>>>> 
>>>>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called
>>>>>> before a thread terminates.
>>>>>> 
>>>>>> Background:
>>>>>> 
>>>>>> Most system-related threads do not expect to explicitly terminate,
>>>>>> except sometimes as part of VM termination. Such threads don't have
>>>>>> their destructors called, but should.
>>>>>> 
>>>>>> This omission came to light due to the ThreadLocalStorage changes in
>>>>>> JDK-8132510. As part of that change we deleted the following from the
>>>>>> termination path of the VMThread:
>>>>>> 
>>>>>> // Thread destructor usually does this.
>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>> 
>>>>>> The clearing of TLS seemed irrelevant to the VMThread as it primarily
>>>>>> is used to aid in JNI attach/detach. However Brian Gardner reported:
>>>>>> 
>>>>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> a problem on FreeBSD caused by this change and the interaction with
>>>>>> the POSIX  pthread TLS destructor use introduced by JDK-8033696.
>>>>>> Because the VMThread terminated without clearing TLS, when the
>>>>>> TLS-destructor was called it got into a loop which ran four times (as
>>>>>> happens on Linux) and then prints a warning to the console (which
>>>>>> doesn't happen on Linux).
>>>>>> 
>>>>>> This indicates we need to restore the:
>>>>>> 
>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>> 
>>>>>> but on further consideration it seems to me that this is not confined
>>>>>> to the VMThread, and the most appropriate fix would be to always
>>>>>> invoke the Thread destructor as a thread terminates.
>>>>>> 
>>>>>> Solution:
>>>>>> 
>>>>>> Further investigation shows that calling the Thread destructor in the
>>>>>> thread as it terminates is not possible:
>>>>>> 
>>>>>> - VMThread
>>>>>> 
>>>>>> This is actually destroyed by the thread that terminates the VM, but
>>>>>> that can happen after it terminates and so we still hit the TLS
>>>>>> problem. The VMThread may be able to destroy itself today but in the
>>>>>> past this was not possible (see existing code comment), and in the
>>>>>> future it may also not be possible - the problem is that the Thread
>>>>>> destructor can interact with other VM subsystems that are concurrently
>>>>>> being torn down by the thread that is terminating the VM. In the past
>>>>>> this was the CodeHeap. So rather than introduce something that is
>>>>>> fragile we stick with the current scheme but restore the
>>>>>> ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at
>>>>>> that time because it may already have been de-allocated.
>>>>>> 
>>>>>> - WatcherThread
>>>>>> 
>>>>>> The WatcherThread is never destroyed today but has the same problem as
>>>>>> the VMThread. We can call the destructor from the VM termination
>>>>>> thread (and have implemented that), but not from the WatcherThread
>>>>>> itself. So again we just have to restore the
>>>>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.
>>>>>> 
>>>>>> - GC Threads
>>>>>> 
>>>>>> There are two cases:
>>>>>> 
>>>>>> a) GC threads that never terminate
>>>>>> 
>>>>>> For these we don't need to do anything: we can't delete the thread as
>>>>>> it never terminates and we don't hit the TLS problem because it never
>>>>>> terminates. So all we will do here is add some logic to check (in
>>>>>> NON_PRODUCT) that we do in fact never terminate.
>>>>>> 
>>>>>> b) GC threads that can terminate
>>>>>> 
>>>>>> Despite the fact the threads can terminate, references to those
>>>>>> threads are stored elsewhere (WorkGangs and other places) and are not
>>>>>> cleared as part of the termination process. Those references can be
>>>>>> touched after the thread has terminated so we can not call the
>>>>>> destructor at all. So again all we can do (without some major thread
>>>>>> management reworking) is ensure that
>>>>>> ThreadLocalStorage::set_thread(NULL); is called before the thread
>>>>>> actually terminates
>>>>>> 
>>>>>> Testing: JPRT
>>>>>>         RBT - runtime nightly tests
>>>>>> 
>>>>>> Thanks,
>>>>>> David

Re: RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

Reply via email to