Geir Magnusson Jr. wrote:

On Jan 10, 2007, at 9:00 AM, Gregory Shimansky wrote:

Geir Magnusson Jr. wrote:
I think the same problem may happen on Linux because it spills out OOMEs on Ubuntu as well.

If the test somehow doesn't crash on failed mallocs and gets to the shutdown stage, it hangs with 2 or more deadlocked threads. So far I haven't quite understood how they lock each other.
Cool - thanks. If you have a free second, could you note this on the wiki page so we don't forget?

I think it is better to track this with JIRA. AFAIU it is not a stress conditions issue, so it is a normal bug which should be found and fixed. I created a new JIRA, HARMONY-2963, which is a subtask of HARMONY-2803, where Weldon attached his MegaSpawn test.

Agreed that a JIRA is important - I just wanted to make sure that we added it somehow to the whiteboard so we had a complete picture of things related to this problem.

Today's investigation shows that the threads hanging at shutdown have 2 different causes. The first one was found by Salikh, and he wrote his comments in HARMONY-2963. The bug happened because the counter of non-daemon threads was increased before a thread was created. If a thread failed to be created because of no memory, this counter was not updated.
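
To illustrate the first cause, here is a minimal sketch of the counter ordering problem. It is not Harmony's actual code: the class NonDaemonCounter, its methods, and the creationSucceeds flag are all made up for illustration, and the rollback variant is only one possible way to keep the counter honest.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a non-daemon thread counter; not Harmony's code. */
class NonDaemonCounter {
    private final AtomicInteger count = new AtomicInteger();

    /** Buggy ordering: count the thread first and never undo it on failure. */
    boolean startBuggy(boolean creationSucceeds) {
        count.incrementAndGet();
        return creationSucceeds; // on failure the count stays one too high
    }

    /** Possible fix: undo the increment when creation fails. */
    boolean startWithRollback(boolean creationSucceeds) {
        count.incrementAndGet();
        if (!creationSucceeds) {
            count.decrementAndGet();
            return false;
        }
        return true;
    }

    /** A thread that really started reports its exit here. */
    void threadExited() {
        count.decrementAndGet();
    }

    /** Shutdown may proceed only when no non-daemon threads remain. */
    boolean canShutDown() {
        return count.get() == 0;
    }
}
```

With startBuggy(false) the counter stays at 1 although no thread exists, so canShutDown() never becomes true. With startWithRollback(false) it goes back to 0.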

Another reason for hanging threads is that they wait in Thread.start(). When a new thread is started, it has to notify a lock object in order to signal the parent thread that it has been created. This notification is sent from the Java code of the Thread before user code is executed.

But the thread manager also has some native code which is run before the Java code of the newly started thread. This native code tries to set up some thread state, like a new JNI environment and other stuff, and this requires allocation of new memory. If the allocation fails, the native code of the newly created thread tries to return an error, but nobody sees it (since this code is the first function of the new thread), so it goes unnoticed. Because the native code of the new thread finishes silently, it never runs the Java code which should do the monitor notification, so the monitor is never notified and the parent thread just waits forever.
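
The handshake described above can be modelled roughly as follows. This is a sketch only: StartHandshake, startAndWait, and the nativeSetupFails flag are hypothetical names, the "native" part is simulated in Java, and a timeout is added so the example terminates instead of hanging like the real case.

```java
/** Hypothetical model of the start handshake; names are illustrative only. */
class StartHandshake {
    private final Object lock = new Object();
    private boolean started = false;

    /**
     * Starts a child thread and waits up to timeoutMillis for its notification.
     * Returns true if the child signalled, false if the wait timed out.
     */
    boolean startAndWait(boolean nativeSetupFails, long timeoutMillis) {
        Thread child = new Thread(() -> {
            if (nativeSetupFails) {
                return; // models the native prologue exiting silently
            }
            synchronized (lock) { // models the Java-side notification
                started = true;
                lock.notifyAll();
            }
        });
        child.setDaemon(true);
        child.start();

        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            while (!started) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // without the timeout this wait never ends
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }
}
```

When nativeSetupFails is true, nothing ever sets started or notifies lock, so only the timeout lets the parent return. The parent in the real bug has no such timeout.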

To fix this bug I think it is necessary to get rid of error conditions in the newly created thread. All necessary state should be allocated before the new thread is started, so that if these resources cannot be allocated, an error is returned to the parent thread and it won't wait forever for the new thread's start notification.
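
A rough sketch of this "allocate first, then start" idea, written in Java for brevity although the real change would be in the native thread manager. PreallocatingStarter, ThreadState, and the simulateFailure flag are assumptions for illustration, and returning null stands in for whatever error the parent would actually get.

```java
/** Hypothetical sketch of "allocate first, then start"; not Harmony's code. */
class PreallocatingStarter {
    /** Stand-in for per-thread state such as a JNI environment. */
    static final class ThreadState {
        final byte[] buffer;

        ThreadState(int size) {
            buffer = new byte[size];
        }
    }

    /** Stand-in allocator; returns null on failure, like a failed malloc. */
    static ThreadState allocate(boolean simulateFailure) {
        return simulateFailure ? null : new ThreadState(1024);
    }

    /**
     * Allocates all state in the parent, then starts the child.
     * Returns the started thread, or null if allocation failed.
     */
    static Thread start(Runnable userCode, boolean simulateFailure) {
        ThreadState state = allocate(simulateFailure);
        if (state == null) {
            return null; // the parent sees the error; no child exists to wait for
        }
        Thread child = new Thread(() -> {
            // Only pre-allocated state is touched here, so there is no
            // allocation left that could fail before user code runs.
            if (state.buffer.length > 0) {
                userCode.run();
            }
        });
        child.start();
        return child;
    }
}
```

The point of the design is that every failure happens in the parent, where it can be reported, and the child has no error path left before it reaches the notification.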

--
Gregory
