Thank you Stuart
I have fixed comment in JavaVM.java. Dealing with different cases in
ShutdownGracefully.java, two variables were added. One is a flag
indicate test passed or not. Other variable keeps the error message when
test failed. I put TestLibrary.bomb in the bottom of the main method
which only shows test fail message.
Could you review it again
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.04/
Tristan
On 02/13/2014 05:29 AM, Stuart Marks wrote:
Hi Tristan,
JavaVM.waitFor looks mostly fine. The indentation of the start of the
waitFor(timeout) javadoc comment is a bit off, though; please fix.
There are still some adjustments that need to be made in
ShutdownGracefully.java. Both have to do with the case where the last
registration succeeds unexpectedly -- this is the one that we expect
to fail.
First, if this registration has succeeded unexpectedly, that means
rmid is still running. If that occurs, the call to
rmid.waitFor(timeout) will inevitably time out. It may be worth
calling rmid.destroy() directly at this point.
Second, still in the succeeded-unexpectedly case, at line 154
TestLibrary.bomb() is called. This throws an exception, but it's
caught by the catch-block at lines 157-158, which calls
TestLibrary.bomb() again, saying "unexpected exception". Except that
this is kind of expected, since it was thrown from an earlier call to
TestLibrary.bomb(). This is quite confusing.
There are several cases that need to be handled.
1. Normal case. Registration fails as expected, rmid has terminated
gracefully. Test passes.
2. Rmid is still running and has processed the registration request
successfully. Need to kill rmid and fail the test.
3. Rmid is still running but is in some bad state where the
registration request failed. Need to kill rmid and fail the test.
4. Some other unexpected failure. This is what the catch and finally
blocks at lines 157-161 are for.
These four cases need to be handled independently. Ideally they should
be separated from the cleanup code. As noted above, you don't want to
throw an exception from the try-block, because it will get caught by
your own catch block. Similarly, it's tempting to return from the
midst of the try-block in the success case, but this still runs the
finally-block. This can be quite confusing.
A typical technique for dealing with this kind of issue is to record
results of operations from within the try block, and then analyze the
results outside, throwing a test failure (TestLibrary.bomb) at that
point, where it won't be caught by the test's own catch block.
Editoral:
- line 153, there should be a space between 'if' and the opening paren
- line 156, typo, "gracefuuly"
Finally, it would be helpful if you could get webrev to generate the
actual changeset instead of the plain patch, per my other review email.
Thanks,
s'marks
On 2/11/14 9:39 PM, Tristan Yan wrote:
Thank you for your thorough mail. This is very educational. I took
you advice
and generated a new webrev for this.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.03/
I appreciate you can review this again.
Regards
Tristan
On Feb 11, 2014, at 8:32 AM, Stuart Marks <stuart.ma...@oracle.com
<mailto:stuart.ma...@oracle.com>> wrote:
Hi Tristan,
Sorry about my recurring delays.
Several comments on these changes.
JavaVM.java --
The waitFor(timeout) method is mostly ok. The new thread started at
line 208
and following seems unnecessary, though. This code is reached when a
timeout
occurs, so throwing TimeoutException is the only thing necessary in
this case.
Should the code to start the new thread be removed?
There should be a similar check for vm == null as in the waitFor()
[no args]
method.
ShutdownGracefully.java --
The condition that's checked after calling
rmid.waitFor(SHUTDOWN_TIMEOUT) is
incorrect. It's testing the exit status against zero. Offhand, when
and if
rmid exits, it might exit with a nonzero exit status. If rmid has
exited at
this point, then the test should succeed.
Instead of testing against zero, the code should catch
TimeoutException, which
means that rmid is still running. It's probably reasonable to catch
TimeoutException, print a message, and then let the finally-block
destroy the
rmid. Calling TestLibrary.bomb() from within the try-block is
confusing, since
that method throws an exception, which is then caught by the
catch-block, when
then calls TestLibrary.bomb() again.
We should also make sure to test the success case properly. If
rmid.waitFor()
returns in a timely fashion without throwing TimeoutException, it
doesn't
matter what the exit status is. (It might be worth printing it out.)
At that
point we know that rmid *has* exited gracefully, so we need to set
rmid to
null so that the finally-block doesn't attempt to destroy rmid
redundantly.
Some additional messages about rmid having exited and the test
passing are
also warranted for this case.
Some additional cleanup can be done here as well, over and above the
changes
you've proposed. (This stuff is left over from earlier RMI test
messes.) In
order to shut down an active object, the code here spawns a new
thread that
sleeps for a while and then deactivates this object. This isn't
necessary. (It
might have been necessary in the past.) It's sufficient simply to
unexport
this object and then deactivate it, directly within the shutdown()
method. See
test/java/rmi/activation/ActivationSystem/unregisterGroup/UnregisterGroup.java
for an example of this. In addition, the run() method can be
removed, and the
"implements Runnable" declaration can also be removed from the
ShutdownGracefully test class.
Finally, revisiting some code farther up in the test, the try-block
at lines
135-140 issues a registration request that the test expects to fail.
If it
succeeds, the message at line 139 isn't very clear; it should say
that the
registration request succeeded unexpectedly. This should cause the
test to
fail. We still probably want to go through the waitFor(timeout) path
and
eventual rmid cleanup, but a flag should be set here to ensure that
the test
indeed fails if the registration succeeds unexpectedly, and the
messages
should clearly indicate that this is going on.
A good way to test this last case is to change rmid's security
manager to the
normal security manager java.lang.SecurityManager instead of
TestSecurityManager.
Thanks,
s'marks
On 2/10/14 2:59 AM, Tristan Yan wrote:
Hi Stuart
Could you help to review this.
Thank you
Tristan
On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan....@oracle.com
<mailto:tristan....@oracle.com>
<mailto:tristan....@oracle.com>> wrote:
Thank you for fixing JDK-8023541. Then I leave
ActivationLibrary.java for now.
I still did some clean up following your suggestion.
1. I changed waitFor(long timeout) method, this method is going to
use code
like Process.waitFor(timeout, unit). This can be backported to
JDK7. Also
exitValue is kept as a return value. For making sure there is no
Pipe leak, a
cleanup thread will start when timeout happens.
2. Change in ShutdownGracefully is a little tricky. I think we
should just
destroy JVM once exception is thrown. So I move the wait logic
into try block
instead keep them in finally block.
Can you receive it again.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/
Thank you
Tristan
On 01/29/2014 03:16 PM, Stuart Marks wrote:
Hi Tristan,
I don't want to put the workaround into
ActivationLibrary.rmidRunning() for a
null return from the lookup, because this is only a workaround
for an actual
bug in rmid initialization. See the review I just posted for
JDK-8023541.
Adding JavaVM.waitFor(timeout) is something that would be useful
in general,
but it needs to be handled carefully. It uses the new
Process.waitFor(timeout, unit) which is new in Java SE 8; this makes
backporting to 7 more complicated. Not clear whether we'll do so,
but I don't
want to forclose the opportunity without discussion. It's also
not clear how
one can get the vm's exit status after JavaVM.waitFor() has
returned true.
With the Process API it's possible simply to call waitFor() or
exitValue().
With JavaVM, a new API needs to be created, or the rule has to be
established
that one must call JavaVM.waitFor() to collect the exit status as
well as to
close the pipes from the subprocess. If JavaVM.waitFor(timeout,
unit) is
called without subsequently calling JavaVM.waitFor(), the pipes
are leaked.
In ShutdownGracefully.java, the finally-block needs to check to
see if rmid
is still running, and if it is, to shut it down. Simply calling
waitFor(timeout, unit) isn't sufficient, because if the rmid
process is still
running, it will be left running.
The straightforward approach would be to call
ActivationLibrary.rmidRunning()
to test if it's still running. Unfortunately this isn't quite
right, because
rmidRunning() has a timeout loop in it -- which should probably
be removed.
(I think there's a bug for this.) Another approach would be
simply to call
rmid.destroy(). This calls rmid's shutdown() method first, which is
reasonable, but I'm not sure it kills the process if that fails.
In any case,
this already has a timeout loop waiting for the process to die, so
ShutdownGracefully.java needn't use a new waitFor(timeout, unit)
call.
Removing the commented-out code that starts with "no longer
needed" is good,
and removing the ShutdownDetectThread is also good, since that's
unnecessary.
There are some more cleanups I have in mind here but I'd like to
see a
revised webrev before proceeding.
Thanks,
s'marks
On 1/25/14 8:57 PM, Tristan Yan wrote:
Hi Stuart
Thank you for your review and suggestion.
Yes, since this failure mode is very hard to be reproduced. I
guess it's
make sense to do some hack. And I also noticed in
ActivationLibrary.rmidRunning. It does try to look up
ActivationSystem but
doesn't check if it's null. So I add the logic to make sure we
will look up
the non-null ActivationSystem. Also I did some cleanup if you
don't mind.
Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which we
can have a
better waitFor control.
I appreciate you can review the code again.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
Thank you
Tristan
On 01/25/2014 10:20 AM, Stuart Marks wrote:
On 1/23/14 10:34 PM, Tristan Yan wrote:
Hi All
Could you review the bug fix for JDK-8032050.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
Description:
This rare happened failure caused because when RMID starts. It
don't
guarantee
sun.rmi.server.Activation.startActivation finishes.
Fix by adding a iterative getSystem with a 5 seconds timeout.
Hi Tristan,
Adding a timing/retry loop into this test isn't the correct
approach for
fixing this test.
The timing loop isn't necessary because there is already a
timing loop in
RMID.start() in the RMI test library. (There's another timing
loop in
ActivationLibrary.rmidRunning() which should probably be
removed.) So the
intent of this library call is that it start rmid and wait for
it to become
ready. That logic doesn't need to be added to the test.
In the bug report JDK-8032050 you had mentioned that the
NullPointerException was suspicious. You're right! I took a
look and it
seemed like it was related to JDK-8023541, and I added a note
to this
effect to the bug report. The problem here is that rmid can
come up and
transiently return null instead of the stub of the activation
system.
That's what JDK-8023541 covers. I think that rmid itself needs
to be fixed,
though modifying the timing loop in the RMI test library to
wait for rmid
to come up *and* for the lookup to return non-null is an easy
way to fix
the problem. (Or at least cover it up.)
The next step in the analysis is to determine, given that
ActivationLibrary.getSystem can sometimes return null, whether
this has
actually caused this test failure. This is pretty easy to
determine; just
hack in a line "system = null" in the right place and run the
test. I've
done this, and the test times out and the output log is pretty
much
identical to the one in the bug report. (I recommend you try this
yourself.) So I think it's fairly safe to say that the problem in
JDK-8023541 has caused the failure listed in JDK-8032050.
I can see a couple ways to proceed here. One way is just to
close this out
as a duplicate of JDK-8023541 since that bug caused this failure.
Another is that this test could use some cleaning up. This bug
certainly
covers a failure, but the messages emitted are confusing and in
some cases
completely wrong. For example, the "rmid has shutdown" message
at line 180
is incorrect, because in this case rmid is still running and
the wait()
call has timed out. Most of the code here can be replaced with
calls to
various bits of the RMI test library. There are a bunch of
other things in
this test that could be cleaned up as well.
It's up to you how you'd like to proceed.
s'marks