Hi Stuart
Could you help to review this.
Thank you
Tristan

On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan....@oracle.com> wrote:

> Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java for now.
> I still did some clean up following your suggestion.
> 1. I changed waitFor(long timeout) method, this method is going to use code 
> like Process.waitFor(timeout, unit). This can be backported to JDK7. Also 
> exitValue is kept as a return value. For making sure there is no Pipe leak, a 
> cleanup thread will start when timeout happens.
> 2. Change in ShutdownGracefully is a little tricky. I think we should just 
> destroy JVM once exception is thrown. So I move the wait logic into try block 
> instead keep them in finally block. 
> Can you receive it again.
> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/
> Thank you
> Tristan
> 
> On 01/29/2014 03:16 PM, Stuart Marks wrote:
>> Hi Tristan,
>> 
>> I don't want to put the workaround into ActivationLibrary.rmidRunning() for 
>> a null return from the lookup, because this is only a workaround for an 
>> actual bug in rmid initialization. See the review I just posted for 
>> JDK-8023541.
>> 
>> Adding JavaVM.waitFor(timeout) is something that would be useful in general, 
>> but it needs to be handled carefully. It uses the new 
>> Process.waitFor(timeout, unit) which is new in Java SE 8; this makes 
>> backporting to 7 more complicated. Not clear whether we'll do so, but I 
>> don't want to forclose the opportunity without discussion. It's also not 
>> clear how one can get the vm's exit status after JavaVM.waitFor() has 
>> returned true. With the Process API it's possible simply to call waitFor() 
>> or exitValue(). With JavaVM, a new API needs to be created, or the rule has 
>> to be established that one must call JavaVM.waitFor() to collect the exit 
>> status as well as to close the pipes from the subprocess. If 
>> JavaVM.waitFor(timeout, unit) is called without subsequently calling 
>> JavaVM.waitFor(), the pipes are leaked.
>> 
>> In ShutdownGracefully.java, the finally-block needs to check to see if rmid 
>> is still running, and if it is, to shut it down. Simply calling 
>> waitFor(timeout, unit) isn't sufficient, because if the rmid process is 
>> still running, it will be left running.
>> 
>> The straightforward approach would be to call 
>> ActivationLibrary.rmidRunning() to test if it's still running. Unfortunately 
>> this isn't quite right, because rmidRunning() has a timeout loop in it -- 
>> which should probably be removed. (I think there's a bug for this.) Another 
>> approach would be simply to call rmid.destroy(). This calls rmid's 
>> shutdown() method first, which is reasonable, but I'm not sure it kills the 
>> process if that fails. In any case, this already has a timeout loop waiting 
>> for the process to die, so ShutdownGracefully.java needn't use a new 
>> waitFor(timeout, unit) call.
>> 
>> Removing the commented-out code that starts with "no longer needed" is good, 
>> and removing the ShutdownDetectThread is also good, since that's unnecessary.
>> 
>> There are some more cleanups I have in mind here but I'd like to see a 
>> revised webrev before proceeding.
>> 
>> Thanks,
>> 
>> s'marks
>> 
>> On 1/25/14 8:57 PM, Tristan Yan wrote:
>>> Hi Stuart
>>> Thank you for your review and suggestion.
>>> Yes, since this failure mode is very hard to be reproduced. I guess it's 
>>> make sense  to do some hack. And I also noticed in 
>>> ActivationLibrary.rmidRunning. It does try to look up ActivationSystem but 
>>> doesn't check if it's null. So I add the logic to make sure we will look up 
>>> the non-null ActivationSystem. Also I did some cleanup if you don't mind. 
>>> Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which we can have a 
>>> better waitFor control.
>>> I appreciate you can review the code again.
>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
>>> Thank you
>>> Tristan
>>> 
>>> 
>>> On 01/25/2014 10:20 AM, Stuart Marks wrote:
>>>> On 1/23/14 10:34 PM, Tristan Yan wrote: 
>>>>> Hi All 
>>>>> Could you review the bug fix for JDK-8032050. 
>>>>> 
>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/ 
>>>>> 
>>>>> Description: 
>>>>> This rare happened failure caused because when RMID starts. It don't 
>>>>> guarantee 
>>>>> sun.rmi.server.Activation.startActivation finishes. 
>>>>> Fix by adding a iterative getSystem with a 5 seconds timeout. 
>>>> 
>>>> Hi Tristan, 
>>>> 
>>>> Adding a timing/retry loop into this test isn't the correct approach for 
>>>> fixing this test. 
>>>> 
>>>> The timing loop isn't necessary because there is already a timing loop in 
>>>> RMID.start() in the RMI test library. (There's another timing loop in 
>>>> ActivationLibrary.rmidRunning() which should probably be removed.) So the 
>>>> intent of this library call is that it start rmid and wait for it to 
>>>> become ready. That logic doesn't need to be added to the test. 
>>>> 
>>>> In the bug report JDK-8032050 you had mentioned that the 
>>>> NullPointerException was suspicious. You're right! I took a look and it 
>>>> seemed like it was related to JDK-8023541, and I added a note to this 
>>>> effect to the bug report. The problem here is that rmid can come up and 
>>>> transiently return null instead of the stub of the activation system. 
>>>> That's what JDK-8023541 covers. I think that rmid itself needs to be 
>>>> fixed, though modifying the timing loop in the RMI test library to wait 
>>>> for rmid to come up *and* for the lookup to return non-null is an easy way 
>>>> to fix the problem. (Or at least cover it up.) 
>>>> 
>>>> The next step in the analysis is to determine, given that 
>>>> ActivationLibrary.getSystem can sometimes return null, whether this has 
>>>> actually caused this test failure. This is pretty easy to determine; just 
>>>> hack in a line "system = null" in the right place and run the test. I've 
>>>> done this, and the test times out and the output log is pretty much 
>>>> identical to the one in the bug report. (I recommend you try this 
>>>> yourself.) So I think it's fairly safe to say that the problem in 
>>>> JDK-8023541 has caused the failure listed in JDK-8032050. 
>>>> 
>>>> I can see a couple ways to proceed here. One way is just to close this out 
>>>> as a duplicate of JDK-8023541 since that bug caused this failure. 
>>>> 
>>>> Another is that this test could use some cleaning up. This bug certainly 
>>>> covers a failure, but the messages emitted are confusing and in some cases 
>>>> completely wrong. For example, the "rmid has shutdown" message at line 180 
>>>> is incorrect, because in this case rmid is still running and the wait() 
>>>> call has timed out. Most of the code here can be replaced with calls to 
>>>> various bits of the RMI test library. There are a bunch of other things in 
>>>> this test that could be cleaned up as well. 
>>>> 
>>>> It's up to you how you'd like to proceed. 
>>>> 
>>>> s'marks 
>>> 
>> 
> 

Reply via email to