Thank you Stuart This is nice. I thought two variables were weird but I didn’t figure out the right solution. Also I wasn’t so sure why we print out so many messages now it’s clear to me. http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.05/ I’m sorry I have to ask you review this again. regards Tristan
On Feb 15, 2014, at 9:47 AM, Stuart Marks <stuart.ma...@oracle.com> wrote: > Hi Tristan, > > OK, we're getting close. Just a couple things about ShutdownGracefully.java. > > It's a bit clumsy to have both a boolean and a message string to keep track > of the state of the test, but by itself this isn't such a big deal. > > A bigger deal is the lack of exception chaining. If we catch an unexpected > exception at line 162, its class and message are printed out, but its stack > trace isn't. If this were to occur it might be fiendishly difficult to debug. > > The TimeoutException isn't chained up either, but this isn't so bad, since > there's only one place the timeout can occur. Still, it's good to chain > exceptions where possible. > > There's another failure case that doesn't have an exception at all, which is > when the registration request we're expecting to fail actually succeeds. This > doesn't have an exception, but we should consider creating one. > > Here's an approach to getting rid of the boolean+string and also chaining up > exceptions. The key insight is that an exception can be created in a > different place from where it's thrown. > > Instead of the boolean+stream, have a variable of type Exception that's > initialized to null. Anyplace where we catch an exception that indicates > failure, fill in this variable. The idea is that if this exception variable > is non-null, a failure has occurred. The exception being non-null replaces > the boolean, and the exception message replaces the failure string. In > addition, the exception has a stack trace, which is essential for debugging. > > For the case where we don't get the expected exception (i.e., registration > succeeds unexpectedly), simply set the exception variable to a newly created > exception, but don't throw it yet. For example, > > exception = new RuntimeException( > "The registration request succeeded unexpectedly"); > > At the place where we catch an unexpected exception, wrap the caught > exception in a new RuntimeException with a message like "caught unexpected > exception". The reason to do this is so we can add an additional message. The > stack trace will also be a bit easier to follow. > > For the timeout, just assign the TimeoutException to the exception variable. > > Also, at each location where the exception variable is assigned to, print out > a message at that point. It will at least show the state of the test to be a > failure. The reason is that, if rmid.destroy() were to throw an exception > from the finally-block, our carefully recorded exception state will be thrown > away. (An alternative would be to put this into its own try-block, and add > any exceptions caught from it to the suppressed exception list of the > exception variable, but it's not clear to me that this is worth it.) > > At the end of the test, if the exception variable is non-null, call > TestLibrary.bomb() with it to fail the test. > > Finally, one typo: "prcoess". > > Thanks for updating this webrev again. > > s'marks > > On 2/13/14 12:25 AM, Tristan Yan wrote: >> Thank you Stuart >> I have fixed comment in JavaVM.java. Dealing with different cases in >> ShutdownGracefully.java, two variables were added. One is a flag indicate >> test >> passed or not. Other variable keeps the error message when test failed. I put >> TestLibrary.bomb in the bottom of the main method which only shows test fail >> message. >> Could you review it again >> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.04/ >> Tristan >> >> On 02/13/2014 05:29 AM, Stuart Marks wrote: >>> Hi Tristan, >>> >>> JavaVM.waitFor looks mostly fine. The indentation of the start of the >>> waitFor(timeout) javadoc comment is a bit off, though; please fix. >>> >>> There are still some adjustments that need to be made in >>> ShutdownGracefully.java. Both have to do with the case where the last >>> registration succeeds unexpectedly -- this is the one that we expect to >>> fail. >>> >>> First, if this registration has succeeded unexpectedly, that means rmid is >>> still running. If that occurs, the call to rmid.waitFor(timeout) will >>> inevitably time out. It may be worth calling rmid.destroy() directly at this >>> point. >>> >>> Second, still in the succeeded-unexpectedly case, at line 154 >>> TestLibrary.bomb() is called. This throws an exception, but it's caught by >>> the >>> catch-block at lines 157-158, which calls TestLibrary.bomb() again, saying >>> "unexpected exception". Except that this is kind of expected, since it was >>> thrown from an earlier call to TestLibrary.bomb(). This is quite confusing. >>> >>> There are several cases that need to be handled. >>> >>> 1. Normal case. Registration fails as expected, rmid has terminated >>> gracefully. Test passes. >>> >>> 2. Rmid is still running and has processed the registration request >>> successfully. Need to kill rmid and fail the test. >>> >>> 3. Rmid is still running but is in some bad state where the registration >>> request failed. Need to kill rmid and fail the test. >>> >>> 4. Some other unexpected failure. This is what the catch and finally blocks >>> at >>> lines 157-161 are for. >>> >>> These four cases need to be handled independently. Ideally they should be >>> separated from the cleanup code. As noted above, you don't want to throw an >>> exception from the try-block, because it will get caught by your own catch >>> block. Similarly, it's tempting to return from the midst of the try-block in >>> the success case, but this still runs the finally-block. This can be quite >>> confusing. >>> >>> A typical technique for dealing with this kind of issue is to record results >>> of operations from within the try block, and then analyze the results >>> outside, >>> throwing a test failure (TestLibrary.bomb) at that point, where it won't be >>> caught by the test's own catch block. >>> >>> Editoral: >>> - line 153, there should be a space between 'if' and the opening paren >>> - line 156, typo, "gracefuuly" >>> >>> Finally, it would be helpful if you could get webrev to generate the actual >>> changeset instead of the plain patch, per my other review email. >>> >>> Thanks, >>> >>> s'marks >>> >>> >>> On 2/11/14 9:39 PM, Tristan Yan wrote: >>>> Thank you for your thorough mail. This is very educational. I took you >>>> advice >>>> and generated a new webrev for this. >>>> >>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.03/ >>>> I appreciate you can review this again. >>>> Regards >>>> Tristan >>>> >>>> >>>> On Feb 11, 2014, at 8:32 AM, Stuart Marks <stuart.ma...@oracle.com >>>> <mailto:stuart.ma...@oracle.com>> wrote: >>>> >>>>> Hi Tristan, >>>>> >>>>> Sorry about my recurring delays. >>>>> >>>>> Several comments on these changes. >>>>> >>>>> JavaVM.java -- >>>>> >>>>> The waitFor(timeout) method is mostly ok. The new thread started at line >>>>> 208 >>>>> and following seems unnecessary, though. This code is reached when a >>>>> timeout >>>>> occurs, so throwing TimeoutException is the only thing necessary in this >>>>> case. >>>>> Should the code to start the new thread be removed? >>>>> >>>>> There should be a similar check for vm == null as in the waitFor() [no >>>>> args] >>>>> method. >>>>> >>>>> ShutdownGracefully.java -- >>>>> >>>>> The condition that's checked after calling rmid.waitFor(SHUTDOWN_TIMEOUT) >>>>> is >>>>> incorrect. It's testing the exit status against zero. Offhand, when and if >>>>> rmid exits, it might exit with a nonzero exit status. If rmid has exited >>>>> at >>>>> this point, then the test should succeed. >>>>> >>>>> Instead of testing against zero, the code should catch TimeoutException, >>>>> which >>>>> means that rmid is still running. It's probably reasonable to catch >>>>> TimeoutException, print a message, and then let the finally-block destroy >>>>> the >>>>> rmid. Calling TestLibrary.bomb() from within the try-block is confusing, >>>>> since >>>>> that method throws an exception, which is then caught by the catch-block, >>>>> when >>>>> then calls TestLibrary.bomb() again. >>>>> >>>>> We should also make sure to test the success case properly. If >>>>> rmid.waitFor() >>>>> returns in a timely fashion without throwing TimeoutException, it doesn't >>>>> matter what the exit status is. (It might be worth printing it out.) At >>>>> that >>>>> point we know that rmid *has* exited gracefully, so we need to set rmid to >>>>> null so that the finally-block doesn't attempt to destroy rmid >>>>> redundantly. >>>>> Some additional messages about rmid having exited and the test passing are >>>>> also warranted for this case. >>>>> >>>>> Some additional cleanup can be done here as well, over and above the >>>>> changes >>>>> you've proposed. (This stuff is left over from earlier RMI test messes.) >>>>> In >>>>> order to shut down an active object, the code here spawns a new thread >>>>> that >>>>> sleeps for a while and then deactivates this object. This isn't >>>>> necessary. (It >>>>> might have been necessary in the past.) It's sufficient simply to unexport >>>>> this object and then deactivate it, directly within the shutdown() >>>>> method. See >>>>> >>>>> test/java/rmi/activation/ActivationSystem/unregisterGroup/UnregisterGroup.java >>>>> >>>>> for an example of this. In addition, the run() method can be removed, and >>>>> the >>>>> "implements Runnable" declaration can also be removed from the >>>>> ShutdownGracefully test class. >>>>> >>>>> Finally, revisiting some code farther up in the test, the try-block at >>>>> lines >>>>> 135-140 issues a registration request that the test expects to fail. If it >>>>> succeeds, the message at line 139 isn't very clear; it should say that the >>>>> registration request succeeded unexpectedly. This should cause the test to >>>>> fail. We still probably want to go through the waitFor(timeout) path and >>>>> eventual rmid cleanup, but a flag should be set here to ensure that the >>>>> test >>>>> indeed fails if the registration succeeds unexpectedly, and the messages >>>>> should clearly indicate that this is going on. >>>>> >>>>> A good way to test this last case is to change rmid's security manager to >>>>> the >>>>> normal security manager java.lang.SecurityManager instead of >>>>> TestSecurityManager. >>>>> >>>>> Thanks, >>>>> >>>>> s'marks >>>>> >>>>> >>>>> >>>>> >>>>> On 2/10/14 2:59 AM, Tristan Yan wrote: >>>>>> Hi Stuart >>>>>> Could you help to review this. >>>>>> Thank you >>>>>> Tristan >>>>>> >>>>>> On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan....@oracle.com >>>>>> <mailto:tristan....@oracle.com> >>>>>> <mailto:tristan....@oracle.com>> wrote: >>>>>> >>>>>>> Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java >>>>>>> for >>>>>>> now. >>>>>>> I still did some clean up following your suggestion. >>>>>>> 1. I changed waitFor(long timeout) method, this method is going to use >>>>>>> code >>>>>>> like Process.waitFor(timeout, unit). This can be backported to JDK7. >>>>>>> Also >>>>>>> exitValue is kept as a return value. For making sure there is no Pipe >>>>>>> leak, a >>>>>>> cleanup thread will start when timeout happens. >>>>>>> 2. Change in ShutdownGracefully is a little tricky. I think we should >>>>>>> just >>>>>>> destroy JVM once exception is thrown. So I move the wait logic into try >>>>>>> block >>>>>>> instead keep them in finally block. >>>>>>> Can you receive it again. >>>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/ >>>>>>> Thank you >>>>>>> Tristan >>>>>>> >>>>>>> On 01/29/2014 03:16 PM, Stuart Marks wrote: >>>>>>>> Hi Tristan, >>>>>>>> >>>>>>>> I don't want to put the workaround into ActivationLibrary.rmidRunning() >>>>>>>> for a >>>>>>>> null return from the lookup, because this is only a workaround for an >>>>>>>> actual >>>>>>>> bug in rmid initialization. See the review I just posted for >>>>>>>> JDK-8023541. >>>>>>>> >>>>>>>> Adding JavaVM.waitFor(timeout) is something that would be useful in >>>>>>>> general, >>>>>>>> but it needs to be handled carefully. It uses the new >>>>>>>> Process.waitFor(timeout, unit) which is new in Java SE 8; this makes >>>>>>>> backporting to 7 more complicated. Not clear whether we'll do so, but I >>>>>>>> don't >>>>>>>> want to forclose the opportunity without discussion. It's also not >>>>>>>> clear how >>>>>>>> one can get the vm's exit status after JavaVM.waitFor() has returned >>>>>>>> true. >>>>>>>> With the Process API it's possible simply to call waitFor() or >>>>>>>> exitValue(). >>>>>>>> With JavaVM, a new API needs to be created, or the rule has to be >>>>>>>> established >>>>>>>> that one must call JavaVM.waitFor() to collect the exit status as well >>>>>>>> as to >>>>>>>> close the pipes from the subprocess. If JavaVM.waitFor(timeout, unit) >>>>>>>> is >>>>>>>> called without subsequently calling JavaVM.waitFor(), the pipes are >>>>>>>> leaked. >>>>>>>> >>>>>>>> In ShutdownGracefully.java, the finally-block needs to check to see if >>>>>>>> rmid >>>>>>>> is still running, and if it is, to shut it down. Simply calling >>>>>>>> waitFor(timeout, unit) isn't sufficient, because if the rmid process is >>>>>>>> still >>>>>>>> running, it will be left running. >>>>>>>> >>>>>>>> The straightforward approach would be to call >>>>>>>> ActivationLibrary.rmidRunning() >>>>>>>> to test if it's still running. Unfortunately this isn't quite right, >>>>>>>> because >>>>>>>> rmidRunning() has a timeout loop in it -- which should probably be >>>>>>>> removed. >>>>>>>> (I think there's a bug for this.) Another approach would be simply to >>>>>>>> call >>>>>>>> rmid.destroy(). This calls rmid's shutdown() method first, which is >>>>>>>> reasonable, but I'm not sure it kills the process if that fails. In any >>>>>>>> case, >>>>>>>> this already has a timeout loop waiting for the process to die, so >>>>>>>> ShutdownGracefully.java needn't use a new waitFor(timeout, unit) call. >>>>>>>> >>>>>>>> Removing the commented-out code that starts with "no longer needed" is >>>>>>>> good, >>>>>>>> and removing the ShutdownDetectThread is also good, since that's >>>>>>>> unnecessary. >>>>>>>> >>>>>>>> There are some more cleanups I have in mind here but I'd like to see a >>>>>>>> revised webrev before proceeding. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> s'marks >>>>>>>> >>>>>>>> On 1/25/14 8:57 PM, Tristan Yan wrote: >>>>>>>>> Hi Stuart >>>>>>>>> Thank you for your review and suggestion. >>>>>>>>> Yes, since this failure mode is very hard to be reproduced. I guess >>>>>>>>> it's >>>>>>>>> make sense to do some hack. And I also noticed in >>>>>>>>> ActivationLibrary.rmidRunning. It does try to look up >>>>>>>>> ActivationSystem but >>>>>>>>> doesn't check if it's null. So I add the logic to make sure we will >>>>>>>>> look up >>>>>>>>> the non-null ActivationSystem. Also I did some cleanup if you don't >>>>>>>>> mind. >>>>>>>>> Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which we can >>>>>>>>> have a >>>>>>>>> better waitFor control. >>>>>>>>> I appreciate you can review the code again. >>>>>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/ >>>>>>>>> Thank you >>>>>>>>> Tristan >>>>>>>>> >>>>>>>>> >>>>>>>>> On 01/25/2014 10:20 AM, Stuart Marks wrote: >>>>>>>>>> On 1/23/14 10:34 PM, Tristan Yan wrote: >>>>>>>>>>> Hi All >>>>>>>>>>> Could you review the bug fix for JDK-8032050. >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/ >>>>>>>>>>> >>>>>>>>>>> Description: >>>>>>>>>>> This rare happened failure caused because when RMID starts. It don't >>>>>>>>>>> guarantee >>>>>>>>>>> sun.rmi.server.Activation.startActivation finishes. >>>>>>>>>>> Fix by adding a iterative getSystem with a 5 seconds timeout. >>>>>>>>>> >>>>>>>>>> Hi Tristan, >>>>>>>>>> >>>>>>>>>> Adding a timing/retry loop into this test isn't the correct approach >>>>>>>>>> for >>>>>>>>>> fixing this test. >>>>>>>>>> >>>>>>>>>> The timing loop isn't necessary because there is already a timing >>>>>>>>>> loop in >>>>>>>>>> RMID.start() in the RMI test library. (There's another timing loop in >>>>>>>>>> ActivationLibrary.rmidRunning() which should probably be removed.) >>>>>>>>>> So the >>>>>>>>>> intent of this library call is that it start rmid and wait for it to >>>>>>>>>> become >>>>>>>>>> ready. That logic doesn't need to be added to the test. >>>>>>>>>> >>>>>>>>>> In the bug report JDK-8032050 you had mentioned that the >>>>>>>>>> NullPointerException was suspicious. You're right! I took a look and >>>>>>>>>> it >>>>>>>>>> seemed like it was related to JDK-8023541, and I added a note to this >>>>>>>>>> effect to the bug report. The problem here is that rmid can come up >>>>>>>>>> and >>>>>>>>>> transiently return null instead of the stub of the activation system. >>>>>>>>>> That's what JDK-8023541 covers. I think that rmid itself needs to be >>>>>>>>>> fixed, >>>>>>>>>> though modifying the timing loop in the RMI test library to wait for >>>>>>>>>> rmid >>>>>>>>>> to come up *and* for the lookup to return non-null is an easy way to >>>>>>>>>> fix >>>>>>>>>> the problem. (Or at least cover it up.) >>>>>>>>>> >>>>>>>>>> The next step in the analysis is to determine, given that >>>>>>>>>> ActivationLibrary.getSystem can sometimes return null, whether this >>>>>>>>>> has >>>>>>>>>> actually caused this test failure. This is pretty easy to determine; >>>>>>>>>> just >>>>>>>>>> hack in a line "system = null" in the right place and run the test. >>>>>>>>>> I've >>>>>>>>>> done this, and the test times out and the output log is pretty much >>>>>>>>>> identical to the one in the bug report. (I recommend you try this >>>>>>>>>> yourself.) So I think it's fairly safe to say that the problem in >>>>>>>>>> JDK-8023541 has caused the failure listed in JDK-8032050. >>>>>>>>>> >>>>>>>>>> I can see a couple ways to proceed here. One way is just to close >>>>>>>>>> this out >>>>>>>>>> as a duplicate of JDK-8023541 since that bug caused this failure. >>>>>>>>>> >>>>>>>>>> Another is that this test could use some cleaning up. This bug >>>>>>>>>> certainly >>>>>>>>>> covers a failure, but the messages emitted are confusing and in some >>>>>>>>>> cases >>>>>>>>>> completely wrong. For example, the "rmid has shutdown" message at >>>>>>>>>> line 180 >>>>>>>>>> is incorrect, because in this case rmid is still running and the >>>>>>>>>> wait() >>>>>>>>>> call has timed out. Most of the code here can be replaced with calls >>>>>>>>>> to >>>>>>>>>> various bits of the RMI test library. There are a bunch of other >>>>>>>>>> things in >>>>>>>>>> this test that could be cleaned up as well. >>>>>>>>>> >>>>>>>>>> It's up to you how you'd like to proceed. >>>>>>>>>> >>>>>>>>>> s'marks >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>