Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java
for now.
I still did some clean up following your suggestion.
1. I changed waitFor(long timeout) method, this method is going to use
code like Process.waitFor(timeout, unit). This can be backported to
JDK7. Also exitValue is kept as a return value. For making sure there is
no Pipe leak, a cleanup thread will start when timeout happens.
2. Change in ShutdownGracefully is a little tricky. I think we should
just destroy JVM once exception is thrown. So I move the wait logic into
try block instead keep them in finally block.
Can you receive it again.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/
Thank you
Tristan
On 01/29/2014 03:16 PM, Stuart Marks wrote:
Hi Tristan,
I don't want to put the workaround into
ActivationLibrary.rmidRunning() for a null return from the lookup,
because this is only a workaround for an actual bug in rmid
initialization. See the review I just posted for JDK-8023541.
Adding JavaVM.waitFor(timeout) is something that would be useful in
general, but it needs to be handled carefully. It uses the new
Process.waitFor(timeout, unit) which is new in Java SE 8; this makes
backporting to 7 more complicated. Not clear whether we'll do so, but
I don't want to forclose the opportunity without discussion. It's also
not clear how one can get the vm's exit status after JavaVM.waitFor()
has returned true. With the Process API it's possible simply to call
waitFor() or exitValue(). With JavaVM, a new API needs to be created,
or the rule has to be established that one must call JavaVM.waitFor()
to collect the exit status as well as to close the pipes from the
subprocess. If JavaVM.waitFor(timeout, unit) is called without
subsequently calling JavaVM.waitFor(), the pipes are leaked.
In ShutdownGracefully.java, the finally-block needs to check to see if
rmid is still running, and if it is, to shut it down. Simply calling
waitFor(timeout, unit) isn't sufficient, because if the rmid process
is still running, it will be left running.
The straightforward approach would be to call
ActivationLibrary.rmidRunning() to test if it's still running.
Unfortunately this isn't quite right, because rmidRunning() has a
timeout loop in it -- which should probably be removed. (I think
there's a bug for this.) Another approach would be simply to call
rmid.destroy(). This calls rmid's shutdown() method first, which is
reasonable, but I'm not sure it kills the process if that fails. In
any case, this already has a timeout loop waiting for the process to
die, so ShutdownGracefully.java needn't use a new waitFor(timeout,
unit) call.
Removing the commented-out code that starts with "no longer needed" is
good, and removing the ShutdownDetectThread is also good, since that's
unnecessary.
There are some more cleanups I have in mind here but I'd like to see a
revised webrev before proceeding.
Thanks,
s'marks
On 1/25/14 8:57 PM, Tristan Yan wrote:
Hi Stuart
Thank you for your review and suggestion.
Yes, since this failure mode is very hard to be reproduced. I guess
it's make sense to do some hack. And I also noticed in
ActivationLibrary.rmidRunning. It does try to look up
ActivationSystem but doesn't check if it's null. So I add the logic
to make sure we will look up the non-null ActivationSystem. Also I
did some cleanup if you don't mind. Add a waitFor(long timeout,
TimeUnit unit) for JavaVM. Which we can have a better waitFor control.
I appreciate you can review the code again.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
Thank you
Tristan
On 01/25/2014 10:20 AM, Stuart Marks wrote:
On 1/23/14 10:34 PM, Tristan Yan wrote:
Hi All
Could you review the bug fix for JDK-8032050.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
Description:
This rare happened failure caused because when RMID starts. It
don't guarantee
sun.rmi.server.Activation.startActivation finishes.
Fix by adding a iterative getSystem with a 5 seconds timeout.
Hi Tristan,
Adding a timing/retry loop into this test isn't the correct approach
for fixing this test.
The timing loop isn't necessary because there is already a timing
loop in RMID.start() in the RMI test library. (There's another
timing loop in ActivationLibrary.rmidRunning() which should probably
be removed.) So the intent of this library call is that it start
rmid and wait for it to become ready. That logic doesn't need to be
added to the test.
In the bug report JDK-8032050 you had mentioned that the
NullPointerException was suspicious. You're right! I took a look and
it seemed like it was related to JDK-8023541, and I added a note to
this effect to the bug report. The problem here is that rmid can
come up and transiently return null instead of the stub of the
activation system. That's what JDK-8023541 covers. I think that rmid
itself needs to be fixed, though modifying the timing loop in the
RMI test library to wait for rmid to come up *and* for the lookup to
return non-null is an easy way to fix the problem. (Or at least
cover it up.)
The next step in the analysis is to determine, given that
ActivationLibrary.getSystem can sometimes return null, whether this
has actually caused this test failure. This is pretty easy to
determine; just hack in a line "system = null" in the right place
and run the test. I've done this, and the test times out and the
output log is pretty much identical to the one in the bug report. (I
recommend you try this yourself.) So I think it's fairly safe to say
that the problem in JDK-8023541 has caused the failure listed in
JDK-8032050.
I can see a couple ways to proceed here. One way is just to close
this out as a duplicate of JDK-8023541 since that bug caused this
failure.
Another is that this test could use some cleaning up. This bug
certainly covers a failure, but the messages emitted are confusing
and in some cases completely wrong. For example, the "rmid has
shutdown" message at line 180 is incorrect, because in this case
rmid is still running and the wait() call has timed out. Most of the
code here can be replaced with calls to various bits of the RMI test
library. There are a bunch of other things in this test that could
be cleaned up as well.
It's up to you how you'd like to proceed.
s'marks