Hi Stuart Could you help to review this. Thank you Tristan On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan....@oracle.com> wrote:
> Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java for now. > I still did some clean up following your suggestion. > 1. I changed waitFor(long timeout) method, this method is going to use code > like Process.waitFor(timeout, unit). This can be backported to JDK7. Also > exitValue is kept as a return value. For making sure there is no Pipe leak, a > cleanup thread will start when timeout happens. > 2. Change in ShutdownGracefully is a little tricky. I think we should just > destroy JVM once exception is thrown. So I move the wait logic into try block > instead keep them in finally block. > Can you receive it again. > http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/ > Thank you > Tristan > > On 01/29/2014 03:16 PM, Stuart Marks wrote: >> Hi Tristan, >> >> I don't want to put the workaround into ActivationLibrary.rmidRunning() for >> a null return from the lookup, because this is only a workaround for an >> actual bug in rmid initialization. See the review I just posted for >> JDK-8023541. >> >> Adding JavaVM.waitFor(timeout) is something that would be useful in general, >> but it needs to be handled carefully. It uses the new >> Process.waitFor(timeout, unit) which is new in Java SE 8; this makes >> backporting to 7 more complicated. Not clear whether we'll do so, but I >> don't want to forclose the opportunity without discussion. It's also not >> clear how one can get the vm's exit status after JavaVM.waitFor() has >> returned true. With the Process API it's possible simply to call waitFor() >> or exitValue(). With JavaVM, a new API needs to be created, or the rule has >> to be established that one must call JavaVM.waitFor() to collect the exit >> status as well as to close the pipes from the subprocess. If >> JavaVM.waitFor(timeout, unit) is called without subsequently calling >> JavaVM.waitFor(), the pipes are leaked. >> >> In ShutdownGracefully.java, the finally-block needs to check to see if rmid >> is still running, and if it is, to shut it down. Simply calling >> waitFor(timeout, unit) isn't sufficient, because if the rmid process is >> still running, it will be left running. >> >> The straightforward approach would be to call >> ActivationLibrary.rmidRunning() to test if it's still running. Unfortunately >> this isn't quite right, because rmidRunning() has a timeout loop in it -- >> which should probably be removed. (I think there's a bug for this.) Another >> approach would be simply to call rmid.destroy(). This calls rmid's >> shutdown() method first, which is reasonable, but I'm not sure it kills the >> process if that fails. In any case, this already has a timeout loop waiting >> for the process to die, so ShutdownGracefully.java needn't use a new >> waitFor(timeout, unit) call. >> >> Removing the commented-out code that starts with "no longer needed" is good, >> and removing the ShutdownDetectThread is also good, since that's unnecessary. >> >> There are some more cleanups I have in mind here but I'd like to see a >> revised webrev before proceeding. >> >> Thanks, >> >> s'marks >> >> On 1/25/14 8:57 PM, Tristan Yan wrote: >>> Hi Stuart >>> Thank you for your review and suggestion. >>> Yes, since this failure mode is very hard to be reproduced. I guess it's >>> make sense to do some hack. And I also noticed in >>> ActivationLibrary.rmidRunning. It does try to look up ActivationSystem but >>> doesn't check if it's null. So I add the logic to make sure we will look up >>> the non-null ActivationSystem. Also I did some cleanup if you don't mind. >>> Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which we can have a >>> better waitFor control. >>> I appreciate you can review the code again. >>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/ >>> Thank you >>> Tristan >>> >>> >>> On 01/25/2014 10:20 AM, Stuart Marks wrote: >>>> On 1/23/14 10:34 PM, Tristan Yan wrote: >>>>> Hi All >>>>> Could you review the bug fix for JDK-8032050. >>>>> >>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/ >>>>> >>>>> Description: >>>>> This rare happened failure caused because when RMID starts. It don't >>>>> guarantee >>>>> sun.rmi.server.Activation.startActivation finishes. >>>>> Fix by adding a iterative getSystem with a 5 seconds timeout. >>>> >>>> Hi Tristan, >>>> >>>> Adding a timing/retry loop into this test isn't the correct approach for >>>> fixing this test. >>>> >>>> The timing loop isn't necessary because there is already a timing loop in >>>> RMID.start() in the RMI test library. (There's another timing loop in >>>> ActivationLibrary.rmidRunning() which should probably be removed.) So the >>>> intent of this library call is that it start rmid and wait for it to >>>> become ready. That logic doesn't need to be added to the test. >>>> >>>> In the bug report JDK-8032050 you had mentioned that the >>>> NullPointerException was suspicious. You're right! I took a look and it >>>> seemed like it was related to JDK-8023541, and I added a note to this >>>> effect to the bug report. The problem here is that rmid can come up and >>>> transiently return null instead of the stub of the activation system. >>>> That's what JDK-8023541 covers. I think that rmid itself needs to be >>>> fixed, though modifying the timing loop in the RMI test library to wait >>>> for rmid to come up *and* for the lookup to return non-null is an easy way >>>> to fix the problem. (Or at least cover it up.) >>>> >>>> The next step in the analysis is to determine, given that >>>> ActivationLibrary.getSystem can sometimes return null, whether this has >>>> actually caused this test failure. This is pretty easy to determine; just >>>> hack in a line "system = null" in the right place and run the test. I've >>>> done this, and the test times out and the output log is pretty much >>>> identical to the one in the bug report. (I recommend you try this >>>> yourself.) So I think it's fairly safe to say that the problem in >>>> JDK-8023541 has caused the failure listed in JDK-8032050. >>>> >>>> I can see a couple ways to proceed here. One way is just to close this out >>>> as a duplicate of JDK-8023541 since that bug caused this failure. >>>> >>>> Another is that this test could use some cleaning up. This bug certainly >>>> covers a failure, but the messages emitted are confusing and in some cases >>>> completely wrong. For example, the "rmid has shutdown" message at line 180 >>>> is incorrect, because in this case rmid is still running and the wait() >>>> call has timed out. Most of the code here can be replaced with calls to >>>> various bits of the RMI test library. There are a bunch of other things in >>>> this test that could be cleaned up as well. >>>> >>>> It's up to you how you'd like to proceed. >>>> >>>> s'marks >>> >> >