Geir Magnusson Jr. wrote:
On Jan 10, 2007, at 9:00 AM, Gregory Shimansky wrote:
Geir Magnusson Jr. wrote:
I think the same problem may happen on Linux, since the test throws
OOMEs on Ubuntu as well.
If the test somehow doesn't crash on failed mallocs, it gets to the
shutdown stage and hangs with two or more deadlocked threads. So far
I don't quite understand how they lock each other.
Cool - thanks. If you have a free second, could you note this on the
wiki page so we don't forget?
I think it is better to track this with JIRA. AFAIU it is not a
stress-conditions issue, so it is a normal bug that should be found and
fixed. I created a new JIRA, HARMONY-2963, which is a subtask of
HARMONY-2803, where Weldon attached his MegaSpawn test.
Agreed that a JIRA is important - I just wanted to make sure that we
added it somehow to the whiteboard so we had a complete picture of
things related to this problem.
Today's investigation showed that the threads hanging at shutdown have
two different causes. The first one was found by Salikh, and he wrote his
comments in HARMONY-2963. The bug happened because the counter of
non-daemon threads was incremented before a thread was created. If the
thread then failed to be created because there was no memory, this
counter was not decremented, so shutdown kept waiting for a non-daemon
thread that never existed.
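A minimal sketch of that failure mode and one way to close it, assuming a pthreads-based thread manager. The counter, `start_nondaemon_thread`, and the injectable `create` callback are hypothetical illustrations, not Harmony's actual code; the callback only exists so that a creation failure can be simulated.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical count of live non-daemon threads; VM shutdown waits
   until it drops back to zero. */
static int nondaemon_count = 0;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void nondaemon_add(int delta) {
    pthread_mutex_lock(&count_lock);
    nondaemon_count += delta;
    assert(nondaemon_count >= 0);   /* the count can never go negative */
    pthread_mutex_unlock(&count_lock);
}

/* Low-level creation routine, injectable so that failure can be simulated. */
typedef int (*create_fn)(pthread_t *, void *(*)(void *), void *);

static int create_with_pthread(pthread_t *tid, void *(*start)(void *), void *arg) {
    return pthread_create(tid, NULL, start, arg);
}

/* Simulates pthread_create failing with EAGAIN (insufficient resources). */
static int create_fails_eagain(pthread_t *tid, void *(*start)(void *), void *arg) {
    (void)tid; (void)start; (void)arg;
    return EAGAIN;
}

/* Counts the thread as non-daemon before creating it, as described above,
   but undoes the increment if creation fails. */
static int start_nondaemon_thread(create_fn create, pthread_t *tid,
                                  void *(*start)(void *), void *arg) {
    nondaemon_add(1);
    int rc = create(tid, start, arg);
    if (rc != 0)
        nondaemon_add(-1);          /* the step that was missing */
    return rc;
}

/* Trivial thread body for the example; a real thread would call
   nondaemon_add(-1) when it exits. */
static void *noop_thread(void *arg) { return arg; }
```

Incrementing only after a successful create would work just as well; the rollback form keeps the original ordering and makes the missing step explicit.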
The other reason for hanging threads is that they wait in Thread.start().
When a new thread is started, it has to notify a lock object in order
to signal the parent thread that it has been created. This notification
is sent from the Java code of the Thread before user code is executed.
But the thread manager also has some native code which runs before the
Java code of the newly started thread. This native code sets up thread
state such as a new JNI environment, and this requires allocating memory.
If the allocation fails, the native code of the newly created thread
returns an error, but nobody sees it, since this code is the first
function of the new thread. Because the native code of the new thread
finishes silently, it never runs the Java code that should notify the
monitor, so the monitor is never notified and the parent thread waits
forever.
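The flawed handshake can be reproduced in a small pthreads sketch. Everything here (`start_info`, `buggy_thread_entry`, `alloc_fails`) is hypothetical; the parent uses `pthread_cond_timedwait` only so that the hang shows up as `ETIMEDOUT` instead of blocking forever, whereas the wait in Thread.start() described above has no timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical state shared by the parent and the new thread during start. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int started;                /* set by the child's "Java" part */
    void *(*alloc)(size_t);     /* allocator used by the native prologue */
} start_info;

static void start_info_init(start_info *si, void *(*alloc)(size_t)) {
    pthread_mutex_init(&si->lock, NULL);
    pthread_cond_init(&si->cond, NULL);
    si->started = 0;
    si->alloc = alloc;
}

/* Stands in for malloc once memory is exhausted. */
static void *alloc_fails(size_t n) { (void)n; return NULL; }

/* Flawed entry point: the native prologue allocates per-thread state
   (a placeholder for the JNI environment) and silently returns if that
   fails, so the notification below is never sent. */
static void *buggy_thread_entry(void *arg) {
    start_info *si = arg;
    void *env = si->alloc(4096);
    if (env == NULL)
        return NULL;            /* nobody ever looks at this result */

    /* "Java" part: tell the parent the thread is running. */
    pthread_mutex_lock(&si->lock);
    si->started = 1;
    pthread_cond_signal(&si->cond);
    pthread_mutex_unlock(&si->lock);
    free(env);                  /* assumes alloc is malloc-compatible */
    return NULL;
}

/* Parent side of the start handshake. The timeout exists only to make the
   hang observable: returns 0 if the child reported in, ETIMEDOUT if not. */
static int start_and_wait(start_info *si, long timeout_ms) {
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, buggy_thread_entry, si);
    if (rc != 0)
        return rc;

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&si->lock);
    while (!si->started && rc == 0)
        rc = pthread_cond_timedwait(&si->cond, &si->lock, &deadline);
    int started = si->started;
    pthread_mutex_unlock(&si->lock);
    assert(started || rc == ETIMEDOUT);

    pthread_join(tid, NULL);
    return started ? 0 : ETIMEDOUT;
}
```

With `malloc` as the allocator the parent is woken almost immediately; with `alloc_fails` the child exits before notifying and the parent only gets out because of the artificial timeout.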
To fix this bug, I think we need to get rid of error conditions in
newly created threads: all necessary state should be allocated before
the new thread is started. Then, if these resources cannot be allocated,
an error can be returned to the parent thread, and it won't wait forever
for the new thread's start notification.
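A sketch of the proposed fix, again assuming pthreads and hypothetical names: the parent allocates the per-thread state (here a placeholder `env` buffer standing in for the JNI environment) before calling `pthread_create`, so the child has nothing left that can fail before it notifies the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state, fully prepared by the parent. */
typedef struct {
    void *env;                  /* placeholder for the JNI environment etc. */
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int started;
} thread_state;

/* Stands in for malloc once memory is exhausted. */
static void *alloc_fails(size_t n) { (void)n; return NULL; }

/* The new thread allocates nothing before notifying, so this path
   cannot fail and the parent is always woken. */
static void *thread_entry(void *arg) {
    thread_state *st = arg;
    assert(st->env != NULL);    /* guaranteed by the parent */
    pthread_mutex_lock(&st->lock);
    st->started = 1;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->lock);
    /* ... the thread's Java code would run here, using st->env ... */
    return NULL;
}

static void thread_state_free(thread_state *st) {
    pthread_cond_destroy(&st->cond);
    pthread_mutex_destroy(&st->lock);
    free(st->env);
    free(st);
}

/* Allocates everything the new thread needs before creating it. Returns 0
   on success, ENOMEM if the state cannot be allocated, or the error from
   pthread_create; in every failure case the parent waits on nothing.
   `alloc` must return memory that free() can release. */
static int start_thread_prealloc(void *(*alloc)(size_t), size_t env_size,
                                 pthread_t *tid, thread_state **out) {
    thread_state *st = calloc(1, sizeof *st);
    if (st == NULL)
        return ENOMEM;
    st->env = alloc(env_size);
    if (st->env == NULL) {
        free(st);
        return ENOMEM;
    }
    pthread_mutex_init(&st->lock, NULL);
    pthread_cond_init(&st->cond, NULL);

    int rc = pthread_create(tid, NULL, thread_entry, st);
    if (rc != 0) {
        thread_state_free(st);
        return rc;
    }

    pthread_mutex_lock(&st->lock);
    while (!st->started)
        pthread_cond_wait(&st->cond, &st->lock);
    pthread_mutex_unlock(&st->lock);
    *out = st;                  /* caller joins *tid, then frees st */
    return 0;
}
```

The caller joins the thread and then releases the state with `thread_state_free`. The untimed `pthread_cond_wait` is safe here only because, once `pthread_create` succeeds, the child's path to the notification contains no failure points.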
--
Gregory