ShutdownGracefully.java fails intermittently

Tristan Yan Thu, 13 Feb 2014 00:28:04 -0800

Thank you Stuart

I have fixed the comment in JavaVM.java. To deal with the different cases in ShutdownGracefully.java, I added two variables. One is a flag indicating whether the test passed; the other holds the error message when the test fails. I put TestLibrary.bomb at the bottom of the main method, where it only reports the test failure message.

Could you review it again?
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.04/
Tristan


On 02/13/2014 05:29 AM, Stuart Marks wrote:

Hi Tristan,
JavaVM.waitFor looks mostly fine. The indentation of the start of the waitFor(timeout) javadoc comment is a bit off, though; please fix.
There are still some adjustments that need to be made in ShutdownGracefully.java. Both have to do with the case where the last registration succeeds unexpectedly -- this is the one that we expect to fail.
First, if this registration has succeeded unexpectedly, that means rmid is still running. If that occurs, the call to rmid.waitFor(timeout) will inevitably time out. It may be worth calling rmid.destroy() directly at this point.
Second, still in the succeeded-unexpectedly case, at line 154 TestLibrary.bomb() is called. This throws an exception, but it's caught by the catch-block at lines 157-158, which calls TestLibrary.bomb() again, saying "unexpected exception". Except that this is kind of expected, since it was thrown from an earlier call to TestLibrary.bomb(). This is quite confusing.
There are several cases that need to be handled.
1. Normal case. Registration fails as expected, rmid has terminated gracefully. Test passes.
2. Rmid is still running and has processed the registration request successfully. Need to kill rmid and fail the test.
3. Rmid is still running but is in some bad state where the registration request failed. Need to kill rmid and fail the test.
4. Some other unexpected failure. This is what the catch and finally blocks at lines 157-161 are for.
These four cases need to be handled independently. Ideally they should be separated from the cleanup code. As noted above, you don't want to throw an exception from the try-block, because it will get caught by your own catch block. Similarly, it's tempting to return from the midst of the try-block in the success case, but this still runs the finally-block. This can be quite confusing.
A typical technique for dealing with this kind of issue is to record results of operations from within the try block, and then analyze the results outside, throwing a test failure (TestLibrary.bomb) at that point, where it won't be caught by the test's own catch block.
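The record-then-analyze technique can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual test code: the class RecordThenJudge, the Outcome enum, runScenario, judge and the boolean parameters are all hypothetical stand-ins, and a plain RuntimeException plays the role of TestLibrary.bomb.

```java
/** Sketch: record what happened inside the try block, decide pass/fail afterwards. */
class RecordThenJudge {

    /** Hypothetical labels for the four cases listed above. */
    enum Outcome { PASSED, REGISTERED_UNEXPECTEDLY, RMID_STILL_RUNNING, UNEXPECTED_FAILURE }

    /**
     * Simulates one test run. The booleans stand in for the real operations:
     * registrationSucceeded = the last registration request succeeded,
     * rmidExited            = waiting for rmid returned before the timeout,
     * throwUnexpected       = some other failure occurred inside the try block.
     */
    static Outcome runScenario(boolean registrationSucceeded, boolean rmidExited,
                               boolean throwUnexpected) {
        Outcome outcome;
        boolean needDestroy = false;
        try {
            if (throwUnexpected) {
                throw new IllegalStateException("simulated unexpected failure");
            }
            if (registrationSucceeded) {
                outcome = Outcome.REGISTERED_UNEXPECTEDLY;   // case 2
                needDestroy = true;
            } else if (!rmidExited) {
                outcome = Outcome.RMID_STILL_RUNNING;        // case 3
                needDestroy = true;
            } else {
                outcome = Outcome.PASSED;                    // case 1
            }
        } catch (RuntimeException e) {
            outcome = Outcome.UNEXPECTED_FAILURE;            // case 4
            needDestroy = true;
        } finally {
            if (needDestroy) {
                // The real test would destroy the rmid process here.
                System.out.println("cleanup: destroying rmid");
            }
        }
        return outcome; // nothing has been thrown as a test failure yet
    }

    /** Stand-in for TestLibrary.bomb, called only after try/catch/finally is done. */
    static void judge(Outcome outcome) {
        if (outcome != Outcome.PASSED) {
            throw new RuntimeException("TEST FAILED: " + outcome);
        }
        System.out.println("TEST PASSED");
    }

    public static void main(String[] args) {
        judge(runScenario(false, true, false)); // normal case, prints TEST PASSED
    }
}
```

Because judge runs after the try statement has completed, the failure it throws cannot be swallowed or re-reported by the test's own catch block.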
Editorial:
 - line 153, there should be a space between 'if' and the opening paren
 - line 156, typo, "gracefuuly"
Finally, it would be helpful if you could get webrev to generate the actual changeset instead of the plain patch, per my other review email.
Thanks,

s'marks


On 2/11/14 9:39 PM, Tristan Yan wrote:
Thank you for your thorough mail. This is very educational. I took your advice
and generated a new webrev for this.

http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.03/
I would appreciate it if you could review this again.
Regards
Tristan


On Feb 11, 2014, at 8:32 AM, Stuart Marks <stuart.ma...@oracle.com> wrote:
Hi Tristan,

Sorry about my recurring delays.

Several comments on these changes.

JavaVM.java --
The waitFor(timeout) method is mostly ok. The new thread started at line 208 and following seems unnecessary, though. This code is reached when a timeout occurs, so throwing TimeoutException is the only thing necessary in this case.
Should the code to start the new thread be removed?
There should be a similar check for vm == null as in the waitFor() [no args] method.
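As a rough illustration of the shape suggested here, the sketch below waits with Process.waitFor(long, TimeUnit) (Java SE 8), checks for a null process first, and on timeout does nothing except throw TimeoutException. VmHandle is a hypothetical stand-in for the test library's JavaVM class, and throwing IllegalStateException for the null case is an assumption made for this sketch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the JavaVM wrapper; only the waiting logic is sketched. */
class VmHandle {
    private final Process vm; // null when the VM was never started

    VmHandle(Process vm) {
        this.vm = vm;
    }

    /**
     * Waits up to timeoutMillis for the subprocess to exit.
     * Returns the exit status, or throws TimeoutException if it is still running.
     */
    int waitFor(long timeoutMillis) throws InterruptedException, TimeoutException {
        if (vm == null) {
            // Assumption: same kind of guard as in the no-arg waitFor().
            throw new IllegalStateException("can't wait for a VM that isn't running");
        }
        if (!vm.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // No helper thread is started: reporting the timeout is enough.
            throw new TimeoutException("VM still running after " + timeoutMillis + " ms");
        }
        return vm.exitValue();
    }
}
```

The caller decides what to do about a timeout, for example destroying the process in its own cleanup code.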

ShutdownGracefully.java --
The condition that's checked after calling rmid.waitFor(SHUTDOWN_TIMEOUT) is incorrect. It's testing the exit status against zero. Offhand, when and if rmid exits, it might exit with a nonzero exit status. If rmid has exited at this point, then the test should succeed.
Instead of testing against zero, the code should catch TimeoutException, which means that rmid is still running. It's probably reasonable to catch TimeoutException, print a message, and then let the finally-block destroy the rmid. Calling TestLibrary.bomb() from within the try-block is confusing, since that method throws an exception, which is then caught by the catch-block, which then calls TestLibrary.bomb() again.
We should also make sure to test the success case properly. If rmid.waitFor() returns in a timely fashion without throwing TimeoutException, it doesn't matter what the exit status is. (It might be worth printing it out.) At that point we know that rmid *has* exited gracefully, so we need to set rmid to null so that the finally-block doesn't attempt to destroy rmid redundantly. Some additional messages about rmid having exited and the test passing are also warranted for this case.
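The timeout and success paths described above might look roughly like this. The Rmid interface and the GracefulExitCheck class are hypothetical stand-ins for the real rmid handle in the RMI test library; the sketch only shows how catching TimeoutException and nulling the reference interact with the finally-block.

```java
import java.util.concurrent.TimeoutException;

class GracefulExitCheck {

    /** Hypothetical minimal view of the rmid handle used by the test. */
    interface Rmid {
        int waitFor(long timeoutMillis) throws InterruptedException, TimeoutException;
        void destroy();
    }

    /** Returns true if rmid exited on its own within the timeout, whatever its exit status. */
    static boolean exitedGracefully(Rmid rmid, long timeoutMillis) throws InterruptedException {
        boolean exited = false;
        try {
            int status = rmid.waitFor(timeoutMillis);
            // Any exit status counts as success; print it for diagnosis only.
            System.out.println("rmid exited gracefully, exit status " + status);
            exited = true;
            rmid = null; // already gone, so the finally-block must not destroy it
        } catch (TimeoutException e) {
            // rmid is still running; record that and let the finally-block clean up.
            System.out.println("rmid still running after " + timeoutMillis + " ms");
        } finally {
            if (rmid != null) {
                rmid.destroy();
            }
        }
        return exited;
    }
}
```

The boolean result is then examined after the try statement, which keeps the test-failure exception out of reach of the test's own catch block.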
Some additional cleanup can be done here as well, over and above the changes you've proposed. (This stuff is left over from earlier RMI test messes.) In order to shut down an active object, the code here spawns a new thread that sleeps for a while and then deactivates this object. This isn't necessary. (It might have been necessary in the past.) It's sufficient simply to unexport this object and then deactivate it, directly within the shutdown() method. See
test/java/rmi/activation/ActivationSystem/unregisterGroup/UnregisterGroup.java
for an example of this. In addition, the run() method can be removed, and the "implements Runnable" declaration can also be removed from the ShutdownGracefully test class.
Finally, revisiting some code farther up in the test, the try-block at lines 135-140 issues a registration request that the test expects to fail. If it succeeds, the message at line 139 isn't very clear; it should say that the registration request succeeded unexpectedly. This should cause the test to fail. We still probably want to go through the waitFor(timeout) path and eventual rmid cleanup, but a flag should be set here to ensure that the test indeed fails if the registration succeeds unexpectedly, and the messages should clearly indicate that this is going on.
A good way to test this last case is to change rmid's security manager to the normal security manager java.lang.SecurityManager instead of TestSecurityManager.
Thanks,

s'marks




On 2/10/14 2:59 AM, Tristan Yan wrote:
Hi Stuart
Could you help review this?
Thank you
Tristan

On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan....@oracle.com> wrote:
Thank you for fixing JDK-8023541. I will leave ActivationLibrary.java alone for now.
I still did some cleanup following your suggestion.
1. I changed the waitFor(long timeout) method. It now uses code like Process.waitFor(timeout, unit), which can be backported to JDK 7. The exit value is also kept as the return value. To make sure there is no pipe leak, a cleanup thread is started when a timeout happens.
2. The change in ShutdownGracefully is a little tricky. I think we should just destroy the JVM once an exception is thrown, so I moved the wait logic into the try block instead of keeping it in the finally block.
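One way a timed wait can be written without Process.waitFor(timeout, unit), so that it also compiles on JDK 7, is to poll Process.exitValue(), which throws IllegalThreadStateException while the process is still running. The sketch below is only an illustration of that idea: PollingWait and its 100 ms poll interval are assumptions, it is not the code in the webrev, and it deliberately omits the cleanup thread mentioned in item 1.

```java
import java.util.concurrent.TimeoutException;

class PollingWait {

    /**
     * Waits up to timeoutMillis for the process to exit and returns its exit value.
     * Uses only APIs that already exist in JDK 7.
     */
    static int waitFor(Process p, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                return p.exitValue(); // succeeds only once the process has exited
            } catch (IllegalThreadStateException stillRunning) {
                // Not exited yet; fall through and poll again.
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("process still running after "
                        + timeoutMillis + " ms");
            }
            // Sleep briefly, but never past the deadline.
            Thread.sleep(Math.min(100L, remainingNanos / 1_000_000L + 1));
        }
    }
}
```

Returning the exit value directly keeps it available to the caller, which is the property item 1 describes.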
Could you review it again?
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/
Thank you
Tristan

On 01/29/2014 03:16 PM, Stuart Marks wrote:
Hi Tristan,
I don't want to put the workaround into ActivationLibrary.rmidRunning() for a null return from the lookup, because this is only a workaround for an actual bug in rmid initialization. See the review I just posted for JDK-8023541.
Adding JavaVM.waitFor(timeout) is something that would be useful in general, but it needs to be handled carefully. It uses the new Process.waitFor(timeout, unit) which is new in Java SE 8; this makes backporting to 7 more complicated. Not clear whether we'll do so, but I don't want to foreclose the opportunity without discussion. It's also not clear how one can get the vm's exit status after JavaVM.waitFor() has returned true. With the Process API it's possible simply to call waitFor() or exitValue(). With JavaVM, a new API needs to be created, or the rule has to be established that one must call JavaVM.waitFor() to collect the exit status as well as to close the pipes from the subprocess. If JavaVM.waitFor(timeout, unit) is called without subsequently calling JavaVM.waitFor(), the pipes are leaked.
In ShutdownGracefully.java, the finally-block needs to check to see if rmid is still running, and if it is, to shut it down. Simply calling waitFor(timeout, unit) isn't sufficient, because if the rmid process is still running, it will be left running.
The straightforward approach would be to call ActivationLibrary.rmidRunning() to test if it's still running. Unfortunately this isn't quite right, because rmidRunning() has a timeout loop in it -- which should probably be removed. (I think there's a bug for this.) Another approach would be simply to call rmid.destroy(). This calls rmid's shutdown() method first, which is reasonable, but I'm not sure it kills the process if that fails. In any case, this already has a timeout loop waiting for the process to die, so ShutdownGracefully.java needn't use a new waitFor(timeout, unit) call.
Removing the commented-out code that starts with "no longer needed" is good, and removing the ShutdownDetectThread is also good, since that's unnecessary.
There are some more cleanups I have in mind here but I'd like to see a revised webrev before proceeding.

Thanks,

s'marks

On 1/25/14 8:57 PM, Tristan Yan wrote:
Hi Stuart
Thank you for your review and suggestion.
Yes, since this failure mode is very hard to reproduce, I guess it makes sense to do some hack. I also noticed that ActivationLibrary.rmidRunning tries to look up the ActivationSystem but doesn't check whether it is null, so I added logic to make sure we look up a non-null ActivationSystem. I also did some cleanup, if you don't mind: I added a waitFor(long timeout, TimeUnit unit) method to JavaVM, which gives us better control over waiting.
I would appreciate it if you could review the code again.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
Thank you
Tristan


On 01/25/2014 10:20 AM, Stuart Marks wrote:
On 1/23/14 10:34 PM, Tristan Yan wrote:
Hi All
Could you review the bug fix for JDK-8032050?

http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/

Description:
This rare failure occurs because starting RMID does not guarantee that
sun.rmi.server.Activation.startActivation has finished.
Fix: add an iterative getSystem with a 5-second timeout.
Hi Tristan,
Adding a timing/retry loop into this test isn't the correct approach for fixing this test.
The timing loop isn't necessary because there is already a timing loop in RMID.start() in the RMI test library. (There's another timing loop in ActivationLibrary.rmidRunning() which should probably be removed.) So the intent of this library call is that it start rmid and wait for it to become ready. That logic doesn't need to be added to the test.

In the bug report JDK-8032050 you had mentioned that the NullPointerException was suspicious. You're right! I took a look and it seemed like it was related to JDK-8023541, and I added a note to this effect to the bug report. The problem here is that rmid can come up and transiently return null instead of the stub of the activation system. That's what JDK-8023541 covers. I think that rmid itself needs to be fixed, though modifying the timing loop in the RMI test library to wait for rmid to come up *and* for the lookup to return non-null is an easy way to fix the problem. (Or at least cover it up.)
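For illustration only, a library-side timing loop that also waits for a non-null lookup result could take roughly this shape. LookupRetry, awaitNonNull and the Supplier parameter are hypothetical names for this sketch and are not part of the RMI test library; as noted above, such a loop only covers up the underlying rmid bug.

```java
import java.util.function.Supplier;

class LookupRetry {

    /**
     * Polls the lookup until it returns a non-null value or the timeout elapses.
     * Returns the value, or null if nothing non-null was seen in time.
     */
    static <T> T awaitNonNull(Supplier<T> lookup, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T result = lookup.get();
            if (result != null) {
                return result;          // service is up and the lookup is usable
            }
            if (deadline - System.nanoTime() <= 0) {
                return null;            // caller decides how to report the timeout
            }
            Thread.sleep(pollMillis);   // transient null: wait and try again
        }
    }
}
```

A caller would pass the real lookup as the supplier and treat a null result as a startup failure.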

The next step in the analysis is to determine, given that ActivationLibrary.getSystem can sometimes return null, whether this has actually caused this test failure. This is pretty easy to determine; just hack in a line "system = null" in the right place and run the test. I've done this, and the test times out and the output log is pretty much identical to the one in the bug report. (I recommend you try this yourself.) So I think it's fairly safe to say that the problem in JDK-8023541 has caused the failure listed in JDK-8032050.
I can see a couple ways to proceed here. One way is just to close this out as a duplicate of JDK-8023541 since that bug caused this failure.
Another is that this test could use some cleaning up. This bug certainly covers a failure, but the messages emitted are confusing and in some cases completely wrong. For example, the "rmid has shutdown" message at line 180 is incorrect, because in this case rmid is still running and the wait() call has timed out. Most of the code here can be replaced with calls to various bits of the RMI test library. There are a bunch of other things in this test that could be cleaned up as well.

It's up to you how you'd like to proceed.

s'marks

Re: RFR: JDK-8032050: TEST_BUG: java/rmi/activation/Activatable/shutdownGracefully/ShutdownGracefully.java fails intermittently
