I agree with Stefan. When I initially ran into the problem I came up with the following changeset that solves my problem, by calling clear_thread_current at the end of java_start.
http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/ <http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/> Brian > On May 6, 2016, at 8:41 AM, Stefan Karlsson <stefan.karls...@oracle.com> > wrote: > > On 06/05/16 16:32, David Holmes wrote: >> On 7/05/2016 12:04 AM, Stefan Karlsson wrote: >>> Hi David, >>> >>> On 06/05/16 15:38, David Holmes wrote: >>>> Hi Stefan, >>>> >>>> Thanks for taking a look at this. >>>> >>>> On 6/05/2016 5:02 PM, Stefan Karlsson wrote: >>>>> Hi David, >>>>> >>>>> I looked through the GC part of this webrev and I think the change is >>>>> fine. >>>>> >>>>> However, it seems a bit error prone. If we decide to change the code to, >>>>> for example, terminate the AbstractGangWorker threads, then we have to >>>>> remember to insert a ThreadLocalStorage::set_thread(NULL) call. >>>> >>>> That's why I added the ShouldNotReachHere()'s - if those threads start >>>> terminating then we will see those hit. Perhaps a comment: >>>> >>>> ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup >>>> >>>> ? >>> >>> Yes, I would appreciate a comment. Though, when we add new threads, we >>> need to remember to add the set_thread(NULL) call. >> >> Well no, what you would do is manage your new threads in such a way that >> their run() method can do "delete this" as the last call. Only if you can't >> do that do you need to think about what termination logic is missing that >> needs to be done in lieu of the destructor. > > Yes, but this forces every implementer of a Thread:run() function to have to > think about these kind of requirements. > >> >>>> >>>>> Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or >>>>> maybe even Thread::clear_thread_current(), in java_start? >>>>> >>>>> static void *java_start(Thread *thread) { >>>>> [...] >>>>> thread->initialize_thread_current(); >>>>> >>>>> [...] >>>>> >>>>> // call one more level start routine >>>>> thread->run(); >>>>> >>>>> ////////// Could we call Thread::clear_thread_current(); here? >>>> >>>> Not easily. For JavaThreads we've already done "delete this" inside >>>> the run() method, so we'd have to move that into java_start as well, >>>> but we can only do the delete for JavaThreads not for other threads. >>>> And we'd also have to change the VMThread and WatcherThread >>>> termination logic because of the deletes that happen in the >>>> termination thread - the "this" pointer (thread above) may no longer >>>> be valid when we want to call clear_current_thread() - which is why we >>>> can only do the ThreadLocalStorage::set_thread(NULL). >>>> >>>> I agree it would be a lot cleaner to have java_start do: >>>> >>>> thread->common_initialization(); >>>> thread->run(); >>>> thread->common_cleanup(); >>>> delete thread; >>>> >>>> for all threads, but we'd need a lot of other changes to allow for >>>> that. Otherwise we would need to note that kind of thread before >>>> calling run() then switch on the thread type after run() to decide >>>> what kind of cleanup is necessary and possible. I don't think that >>>> would be better than just doing the "right" cleanup at the end of the >>>> run() methods. >>> >>> I understand that this is a bit messy, and I won't insist that we change >>> this in this RFR, but without looking at this in much detail it sounds >>> weird to delete the thread in run(). Couldn't this be solved by >>> introducing a virtual Thread::post_run() function and do: >>> >>> virtual void Thread::post_run() { >>> clear_thread_current(); >>> } >>> >>> virtual void JavaThread::post_run() { >>> Thread::post_run(); >>> delete this; >>> } >> >> But again this can't work for the VMThread or WatcherThread as they are >> deleted from the termination thread and so thread->post_run() may SEGV.** >> Plus it is only after the fact that you realize not to put "delete this" in >> Thread::post_run(). > > OK, I didn't understand what you meant with "termination thread", but I now > see the call to VMThread::destroy(). > > With that said, I find it odd that VMThread::destroy() deletes the VM thread. > We already handshake between the VMThread and the "termination thread", so > why isn't that VMThread::post_run() implemented as: > > virtual void VMThread::post_run() { > // signal other threads that VM process is gone > { > // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows > // VM thread to enter any lock at Safepoint as long as its _owner is NULL. > // If that happens after _terminate_lock->wait() has unset _owner > // but before it actually drops the lock and waits, the notification below > // may get lost and we will have a hang. To avoid this, we need to use > // Mutex::lock_without_safepoint_check(). > MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); > _terminated = true; > _terminate_lock->notify(); > } > > Thread::post_run(); > delete this; > } > > And then we wouldn't get a SEGV ... > > I couldn't find the destructor for the WatchThread, but it seems easy to fix > that as well. > > I'm probably missing something, but I find it a bit annoying that code that > should belong to the *Thread:ing system leaks into the implementations of > *Thread::run(). > > Thanks, > StefanK >> >> ** Arguably the best solution to the "thread termination races with VM >> termination" problem is to not let the threads terminate. The code as it >> exists today can still have JavaThreads destroying themselves at the same >> that the VM is terminating and potentially hit the same errors that require >> us to not allow the VMThread (and now WatcherThread) to delete themselves. >> >> Thanks, >> David >> >>> Thanks, >>> StefanK >>> >>>> >>>> Thanks, >>>> David >>>> ------ >>>> >>>>> >>>>> log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread >>>>> id: " UINTX_FORMAT ").", >>>>> os::current_thread_id(), (uintx) pthread_self()); >>>>> >>>>> return 0; >>>>> } >>>>> >>>>> And get rid of the explicit calls to >>>>> ThreadLocalStorage::set_thread(NULL) you added? >>>>> >>>>> Thanks, >>>>> StefanK >>>>> >>>>> On 04/05/16 01:39, David Holmes wrote: >>>>>> This needs attention from GC and runtime folk please. >>>>>> >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715 >>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/ >>>>>> >>>>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called >>>>>> before a thread terminates. >>>>>> >>>>>> Background: >>>>>> >>>>>> Most system-related threads do not expect to explicitly terminate, >>>>>> except sometimes as part of VM termination. Such threads don't have >>>>>> their destructors called, but should. >>>>>> >>>>>> This omission came to light due to the ThreadLocalStorage changes in >>>>>> JDK-8132510. As part of that change we deleted the following from the >>>>>> termination path of the VMThread: >>>>>> >>>>>> // Thread destructor usually does this. >>>>>> ThreadLocalStorage::set_thread(NULL); >>>>>> >>>>>> The clearing of TLS seemed irrelevant to the VMThread as it primarily >>>>>> is used to aid in JNI attach/detach. However Brian Gardner reported: >>>>>> >>>>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> a problem on FreeBSD caused by this change and the interaction with >>>>>> the POSIX pthread TLS destructor use introduced by JDK-8033696. >>>>>> Because the VMThread terminated without clearing TLS, when the >>>>>> TLS-destructor was called it got into a loop which ran four times (as >>>>>> happens on Linux) and then prints a warning to the console (which >>>>>> doesn't happen on Linux). >>>>>> >>>>>> This indicates we need to restore the: >>>>>> >>>>>> ThreadLocalStorage::set_thread(NULL); >>>>>> >>>>>> but on further consideration it seems to me that this is not confined >>>>>> to the VMThread, and the most appropriate fix would be to always >>>>>> invoke the Thread destructor as a thread terminates. >>>>>> >>>>>> Solution: >>>>>> >>>>>> Further investigation shows that calling the Thread destructor in the >>>>>> thread as it terminates is not possible: >>>>>> >>>>>> - VMThread >>>>>> >>>>>> This is actually destroyed by the thread that terminates the VM, but >>>>>> that can happen after it terminates and so we still hit the TLS >>>>>> problem. The VMThread may be able to destroy itself today but in the >>>>>> past this was not possible (see existing code comment), and in the >>>>>> future it may also not be possible - the problem is that the Thread >>>>>> destructor can interact with other VM subsystems that are concurrently >>>>>> being torn down by the thread that is terminating the VM. In the past >>>>>> this was the CodeHeap. So rather than introduce something that is >>>>>> fragile we stick with the current scheme but restore the >>>>>> ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at >>>>>> that time because it may already have been de-allocated. >>>>>> >>>>>> - WatcherThread >>>>>> >>>>>> The WatcherThread is never destroyed today but has the same problem as >>>>>> the VMThread. We can call the destructor from the VM termination >>>>>> thread (and have implemented that), but not from the WatcherThread >>>>>> itself. So again we just have to restore the >>>>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem. >>>>>> >>>>>> - GC Threads >>>>>> >>>>>> There are two cases: >>>>>> >>>>>> a) GC threads that never terminate >>>>>> >>>>>> For these we don't need to do anything: we can't delete the thread as >>>>>> it never terminates and we don't hit the TLS problem because it never >>>>>> terminates. So all we will do here is add some logic to check (in >>>>>> NON_PRODUCT) that we do in fact never terminate. >>>>>> >>>>>> b) GC threads that can terminate >>>>>> >>>>>> Despite the fact the threads can terminate, references to those >>>>>> threads are stored elsewhere (WorkGangs and other places) and are not >>>>>> cleared as part of the termination process. Those references can be >>>>>> touched after the thread has terminated so we can not call the >>>>>> destructor at all. So again all we can do (without some major thread >>>>>> management reworking) is ensure that >>>>>> ThreadLocalStorage::set_thread(NULL); is called before the thread >>>>>> actually terminates >>>>>> >>>>>> Testing: JPRT >>>>>> RBT - runtime nightly tests >>>>>> >>>>>> Thanks, >>>>>> David