Re: RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

Stefan Karlsson Mon, 09 May 2016 02:32:55 -0700

Hi,

For java_start, would it be enough to just, unconditionally, callThreadLocalStorage::set_thread(NULL) after thread->run() ?


StefanK

On 2016-05-07 07:45, David Holmes wrote:

Brian, Stefan,

I've been giving this more thought and I think you are both right.While this can't be completely handled in java_start for all threadsit can handle the most common cases. If I don't add the deletion ofthe WatcherThread then the problematic case reduces to that of theVMThread.

I'll do a new version on Monday. I also need to double-check thosethreads that don't use java_start (which I am sorely tempted to rename).


Thanks,
David

On 7/05/2016 9:50 AM, David Holmes wrote:

On 7/05/2016 5:40 AM, Brian Gardner wrote:

I agree with Stefan.  When I initially ran into the problem I came up
with the following changeset that solves my problem, by calling
clear_thread_current at the end of java_start.

http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/


As I said I can see the appeal in doing this, but there is still a race
for the threads destroyed at VM shutdown as current_or_null() can return
non-NULL but the delete can happen before we call
thread->clear_current_thread().

You would also want Thread::current_or_null_safe() I think, due to
already deleted JavaThreads.

And this all side-steps the real issue in my opinion that with clean
thread lifecycle management we should always be able to delete the
terminating thread.

So to me it is a question of selecting which set of bandaids you want to
apply.

Thanks,
David

Brian

On May 6, 2016, at 8:41 AM, Stefan Karlsson

<[email protected] <mailto:[email protected]>>wrote:


On 06/05/16 16:32, David Holmes wrote:

On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

Hi David,

On 06/05/16 15:38, David Holmes wrote:

Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

Hi David,

I looked through the GC part of this webrev and I think the
change is
fine.

However, it seems a bit error prone. If we decide to change the
code to,
for example, terminate the  AbstractGangWorker threads, then we
have to
remember to insert a ThreadLocalStorage::set_thread(NULL) call.


That's why I added the ShouldNotReachHere()'s - if those threads
start
terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS
cleanup

?

Yes, I would appreciate a comment. Though, when we add newthreads, we

need to remember to add the set_thread(NULL) call.


Well no, what you would do is manage your new threads in such a way
that their run() method can do "delete this" as the last call. Only
if you can't do that do you need to think about what termination
logic is missing that needs to be done in lieu of the destructor.


Yes, but this forces every implementer of a Thread:run() function to
have to think about these kind of requirements.

Could we instead add a call to
ThreadLocalStorage::set_thread(NULL), or
maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) {
[...]
 thread->initialize_thread_current();

[...]

 // call one more level start routine
 thread->run();

 ////////// Could we call Thread::clear_thread_current(); here?


Not easily. For JavaThreads we've already done "delete this" inside

the run() method, so we'd have to move that into java_start aswell,but we can only do the delete for JavaThreads not for otherthreads.

And we'd also have to change the VMThread and WatcherThread
termination logic because of the deletes that happen in the

termination thread - the "this" pointer (thread above) may nolonger

be valid when we want to call clear_current_thread() - which is
why we
can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

 thread->common_initialization();
 thread->run();
 thread->common_cleanup();
 delete thread;

for all threads, but we'd need a lot of other changes to allow for
that. Otherwise we would need to note that kind of thread before
calling run() then switch on the thread type after run() to decide
what kind of cleanup is necessary and possible. I don't think that

would be better than just doing the "right" cleanup at the endof the

run() methods.


I understand that this is a bit messy, and I won't insist that we
change

this in this RFR, but without looking at this in much detail itsounds

weird to delete the thread in run(). Couldn't this be solved by
introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() {
 clear_thread_current();
}

virtual void JavaThread::post_run() {
 Thread::post_run();
 delete this;
}


But again this can't work for the VMThread or WatcherThread as they
are deleted from the termination thread and so thread->post_run() may
SEGV.** Plus it is only after the fact that you realize not to put
"delete this" in Thread::post_run().


OK, I didn't understand what you meant with "termination thread", but
I now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM
thread. We already handshake between the VMThread and the "termination
thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() {
 // signal other threads that VM process is gone
 {
   // Note: we must have the _no_safepoint_check_flag. Mutex::lock()
allows
   // VM thread to enter any lock at Safepoint as long as its _owner
is NULL.
   // If that happens after _terminate_lock->wait() has unset _owner
   // but before it actually drops the lock and waits, the
notification below
   // may get lost and we will have a hang. To avoid this, we need to
use
   // Mutex::lock_without_safepoint_check().
   MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag);
   _terminated = true;
   _terminate_lock->notify();
 }

 Thread::post_run();
 delete this;
}

And then we wouldn't get a SEGV ...

I couldn't find the destructor for the WatchThread, but it seems easy
to fix that as well.

I'm probably missing something, but I find it a bit annoying that code
that should belong to the *Thread:ing system leaks into the
implementations of *Thread::run().

Thanks,
StefanK


** Arguably the best solution to the "thread termination races with
VM termination" problem is to not let the threads terminate. The code
as it exists today can still have JavaThreads destroying themselves
at the same that the VM is terminating and potentially hit the same
errors that require us to not allow the VMThread (and now
WatcherThread) to delete themselves.

Thanks,
David

Thanks,
StefanK


Thanks,
David
------


 log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ",
pthread
id: " UINTX_FORMAT ").",
   os::current_thread_id(), (uintx) pthread_self());

 return 0;
}

And get rid of the explicit calls to
ThreadLocalStorage::set_thread(NULL) you added?

Thanks,
StefanK

On 04/05/16 01:39, David Holmes wrote:

This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715
webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is alwayscalled

before a thread terminates.

Background:

Most system-related threads do not expect to explicitlyterminate,except sometimes as part of VM termination. Such threads don'thave

their destructors called, but should.

This omission came to light due to the ThreadLocalStorage
changes in
JDK-8132510. As part of that change we deleted the following
from the
termination path of the VMThread:

// Thread destructor usually does this.
ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it
primarily
is used to aid in JNI attach/detach. However Brian Gardner
reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html

a problem on FreeBSD caused by this change and the interactionwith

the POSIX  pthread TLS destructor use introduced by JDK-8033696.
Because the VMThread terminated without clearing TLS, when the
TLS-destructor was called it got into a loop which ran four
times (as
happens on Linux) and then prints a warning to the console (which
doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not
confined
to the VMThread, and the most appropriate fix would be to always
invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor
in the
thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM,
but
that can happen after it terminates and so we still hit the TLS
problem. The VMThread may be able to destroy itself today but in
the

past this was not possible (see existing code comment), and inthefuture it may also not be possible - the problem is that theThread

destructor can interact with other VM subsystems that are
concurrently
being torn down by the thread that is terminating the VM. In the
past
this was the CodeHeap. So rather than introduce something that is
fragile we stick with the current scheme but restore the
ThreadLocalStorage::set_thread(NULL); - note we can't access
"this" at
that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same
problem as
the VMThread. We can call the destructor from the VM termination

thread (and have implemented that), but not from theWatcherThread

itself. So again we just have to restore the
ThreadLocalStorage::set_thread(NULL); to fix the potential TLS
problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the
thread as
it never terminates and we don't hit the TLS problem because it
never
terminates. So all we will do here is add some logic to check (in
NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those
threads are stored elsewhere (WorkGangs and other places) and
are not

cleared as part of the termination process. Those referencescan be

touched after the thread has terminated so we can not call the
destructor at all. So again all we can do (without some major
thread
management reworking) is ensure that
ThreadLocalStorage::set_thread(NULL); is called before the thread
actually terminates

Testing: JPRT
        RBT - runtime nightly tests

Thanks,
David

Re: RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

Reply via email to