Thank you for your thorough mail. This is very educational. I took you advice and generated a new webrev for this.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.03/ I appreciate you can review this again. Regards Tristan On Feb 11, 2014, at 8:32 AM, Stuart Marks <stuart.ma...@oracle.com> wrote: > Hi Tristan, > > Sorry about my recurring delays. > > Several comments on these changes. > > JavaVM.java -- > > The waitFor(timeout) method is mostly ok. The new thread started at line 208 > and following seems unnecessary, though. This code is reached when a timeout > occurs, so throwing TimeoutException is the only thing necessary in this > case. Should the code to start the new thread be removed? > > There should be a similar check for vm == null as in the waitFor() [no args] > method. > > ShutdownGracefully.java -- > > The condition that's checked after calling rmid.waitFor(SHUTDOWN_TIMEOUT) is > incorrect. It's testing the exit status against zero. Offhand, when and if > rmid exits, it might exit with a nonzero exit status. If rmid has exited at > this point, then the test should succeed. > > Instead of testing against zero, the code should catch TimeoutException, > which means that rmid is still running. It's probably reasonable to catch > TimeoutException, print a message, and then let the finally-block destroy the > rmid. Calling TestLibrary.bomb() from within the try-block is confusing, > since that method throws an exception, which is then caught by the > catch-block, when then calls TestLibrary.bomb() again. > > We should also make sure to test the success case properly. If rmid.waitFor() > returns in a timely fashion without throwing TimeoutException, it doesn't > matter what the exit status is. (It might be worth printing it out.) At that > point we know that rmid *has* exited gracefully, so we need to set rmid to > null so that the finally-block doesn't attempt to destroy rmid redundantly. > Some additional messages about rmid having exited and the test passing are > also warranted for this case. > > Some additional cleanup can be done here as well, over and above the changes > you've proposed. (This stuff is left over from earlier RMI test messes.) In > order to shut down an active object, the code here spawns a new thread that > sleeps for a while and then deactivates this object. This isn't necessary. > (It might have been necessary in the past.) It's sufficient simply to > unexport this object and then deactivate it, directly within the shutdown() > method. See > > test/java/rmi/activation/ActivationSystem/unregisterGroup/UnregisterGroup.java > > for an example of this. In addition, the run() method can be removed, and the > "implements Runnable" declaration can also be removed from the > ShutdownGracefully test class. > > Finally, revisiting some code farther up in the test, the try-block at lines > 135-140 issues a registration request that the test expects to fail. If it > succeeds, the message at line 139 isn't very clear; it should say that the > registration request succeeded unexpectedly. This should cause the test to > fail. We still probably want to go through the waitFor(timeout) path and > eventual rmid cleanup, but a flag should be set here to ensure that the test > indeed fails if the registration succeeds unexpectedly, and the messages > should clearly indicate that this is going on. > > A good way to test this last case is to change rmid's security manager to the > normal security manager java.lang.SecurityManager instead of > TestSecurityManager. > > Thanks, > > s'marks > > > > > On 2/10/14 2:59 AM, Tristan Yan wrote: >> Hi Stuart >> Could you help to review this. >> Thank you >> Tristan >> >> On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan....@oracle.com >> <mailto:tristan....@oracle.com>> wrote: >> >>> Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java for >>> now. >>> I still did some clean up following your suggestion. >>> 1. I changed waitFor(long timeout) method, this method is going to use code >>> like Process.waitFor(timeout, unit). This can be backported to JDK7. Also >>> exitValue is kept as a return value. For making sure there is no Pipe leak, >>> a >>> cleanup thread will start when timeout happens. >>> 2. Change in ShutdownGracefully is a little tricky. I think we should just >>> destroy JVM once exception is thrown. So I move the wait logic into try >>> block >>> instead keep them in finally block. >>> Can you receive it again. >>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/ >>> Thank you >>> Tristan >>> >>> On 01/29/2014 03:16 PM, Stuart Marks wrote: >>>> Hi Tristan, >>>> >>>> I don't want to put the workaround into ActivationLibrary.rmidRunning() >>>> for a >>>> null return from the lookup, because this is only a workaround for an >>>> actual >>>> bug in rmid initialization. See the review I just posted for JDK-8023541. >>>> >>>> Adding JavaVM.waitFor(timeout) is something that would be useful in >>>> general, >>>> but it needs to be handled carefully. It uses the new >>>> Process.waitFor(timeout, unit) which is new in Java SE 8; this makes >>>> backporting to 7 more complicated. Not clear whether we'll do so, but I >>>> don't >>>> want to forclose the opportunity without discussion. It's also not clear >>>> how >>>> one can get the vm's exit status after JavaVM.waitFor() has returned true. >>>> With the Process API it's possible simply to call waitFor() or exitValue(). >>>> With JavaVM, a new API needs to be created, or the rule has to be >>>> established >>>> that one must call JavaVM.waitFor() to collect the exit status as well as >>>> to >>>> close the pipes from the subprocess. If JavaVM.waitFor(timeout, unit) is >>>> called without subsequently calling JavaVM.waitFor(), the pipes are leaked. >>>> >>>> In ShutdownGracefully.java, the finally-block needs to check to see if rmid >>>> is still running, and if it is, to shut it down. Simply calling >>>> waitFor(timeout, unit) isn't sufficient, because if the rmid process is >>>> still >>>> running, it will be left running. >>>> >>>> The straightforward approach would be to call >>>> ActivationLibrary.rmidRunning() >>>> to test if it's still running. Unfortunately this isn't quite right, >>>> because >>>> rmidRunning() has a timeout loop in it -- which should probably be removed. >>>> (I think there's a bug for this.) Another approach would be simply to call >>>> rmid.destroy(). This calls rmid's shutdown() method first, which is >>>> reasonable, but I'm not sure it kills the process if that fails. In any >>>> case, >>>> this already has a timeout loop waiting for the process to die, so >>>> ShutdownGracefully.java needn't use a new waitFor(timeout, unit) call. >>>> >>>> Removing the commented-out code that starts with "no longer needed" is >>>> good, >>>> and removing the ShutdownDetectThread is also good, since that's >>>> unnecessary. >>>> >>>> There are some more cleanups I have in mind here but I'd like to see a >>>> revised webrev before proceeding. >>>> >>>> Thanks, >>>> >>>> s'marks >>>> >>>> On 1/25/14 8:57 PM, Tristan Yan wrote: >>>>> Hi Stuart >>>>> Thank you for your review and suggestion. >>>>> Yes, since this failure mode is very hard to be reproduced. I guess it's >>>>> make sense to do some hack. And I also noticed in >>>>> ActivationLibrary.rmidRunning. It does try to look up ActivationSystem but >>>>> doesn't check if it's null. So I add the logic to make sure we will look >>>>> up >>>>> the non-null ActivationSystem. Also I did some cleanup if you don't mind. >>>>> Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which we can have a >>>>> better waitFor control. >>>>> I appreciate you can review the code again. >>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/ >>>>> Thank you >>>>> Tristan >>>>> >>>>> >>>>> On 01/25/2014 10:20 AM, Stuart Marks wrote: >>>>>> On 1/23/14 10:34 PM, Tristan Yan wrote: >>>>>>> Hi All >>>>>>> Could you review the bug fix for JDK-8032050. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/ >>>>>>> >>>>>>> Description: >>>>>>> This rare happened failure caused because when RMID starts. It don't >>>>>>> guarantee >>>>>>> sun.rmi.server.Activation.startActivation finishes. >>>>>>> Fix by adding a iterative getSystem with a 5 seconds timeout. >>>>>> >>>>>> Hi Tristan, >>>>>> >>>>>> Adding a timing/retry loop into this test isn't the correct approach for >>>>>> fixing this test. >>>>>> >>>>>> The timing loop isn't necessary because there is already a timing loop in >>>>>> RMID.start() in the RMI test library. (There's another timing loop in >>>>>> ActivationLibrary.rmidRunning() which should probably be removed.) So the >>>>>> intent of this library call is that it start rmid and wait for it to >>>>>> become >>>>>> ready. That logic doesn't need to be added to the test. >>>>>> >>>>>> In the bug report JDK-8032050 you had mentioned that the >>>>>> NullPointerException was suspicious. You're right! I took a look and it >>>>>> seemed like it was related to JDK-8023541, and I added a note to this >>>>>> effect to the bug report. The problem here is that rmid can come up and >>>>>> transiently return null instead of the stub of the activation system. >>>>>> That's what JDK-8023541 covers. I think that rmid itself needs to be >>>>>> fixed, >>>>>> though modifying the timing loop in the RMI test library to wait for rmid >>>>>> to come up *and* for the lookup to return non-null is an easy way to fix >>>>>> the problem. (Or at least cover it up.) >>>>>> >>>>>> The next step in the analysis is to determine, given that >>>>>> ActivationLibrary.getSystem can sometimes return null, whether this has >>>>>> actually caused this test failure. This is pretty easy to determine; just >>>>>> hack in a line "system = null" in the right place and run the test. I've >>>>>> done this, and the test times out and the output log is pretty much >>>>>> identical to the one in the bug report. (I recommend you try this >>>>>> yourself.) So I think it's fairly safe to say that the problem in >>>>>> JDK-8023541 has caused the failure listed in JDK-8032050. >>>>>> >>>>>> I can see a couple ways to proceed here. One way is just to close this >>>>>> out >>>>>> as a duplicate of JDK-8023541 since that bug caused this failure. >>>>>> >>>>>> Another is that this test could use some cleaning up. This bug certainly >>>>>> covers a failure, but the messages emitted are confusing and in some >>>>>> cases >>>>>> completely wrong. For example, the "rmid has shutdown" message at line >>>>>> 180 >>>>>> is incorrect, because in this case rmid is still running and the wait() >>>>>> call has timed out. Most of the code here can be replaced with calls to >>>>>> various bits of the RMI test library. There are a bunch of other things >>>>>> in >>>>>> this test that could be cleaned up as well. >>>>>> >>>>>> It's up to you how you'd like to proceed. >>>>>> >>>>>> s'marks >>>>> >>>> >>> >>