Thank you for your thorough mail. This is very educational. I took you advice 
and generated a new webrev for this.

http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.03/
I appreciate you can review this again.
Regards
Tristan


On Feb 11, 2014, at 8:32 AM, Stuart Marks <stuart.ma...@oracle.com> wrote:

> Hi Tristan,
> 
> Sorry about my recurring delays.
> 
> Several comments on these changes.
> 
> JavaVM.java --
> 
> The waitFor(timeout) method is mostly ok. The new thread started at line 208 
> and following seems unnecessary, though. This code is reached when a timeout 
> occurs, so throwing TimeoutException is the only thing necessary in this 
> case. Should the code to start the new thread be removed?
> 
> There should be a similar check for vm == null as in the waitFor() [no args] 
> method.
> 
> ShutdownGracefully.java --
> 
> The condition that's checked after calling rmid.waitFor(SHUTDOWN_TIMEOUT) is 
> incorrect. It's testing the exit status against zero. Offhand, when and if 
> rmid exits, it might exit with a nonzero exit status. If rmid has exited at 
> this point, then the test should succeed.
> 
> Instead of testing against zero, the code should catch TimeoutException, 
> which means that rmid is still running. It's probably reasonable to catch 
> TimeoutException, print a message, and then let the finally-block destroy the 
> rmid. Calling TestLibrary.bomb() from within the try-block is confusing, 
> since that method throws an exception, which is then caught by the 
> catch-block, when then calls TestLibrary.bomb() again.
> 
> We should also make sure to test the success case properly. If rmid.waitFor() 
> returns in a timely fashion without throwing TimeoutException, it doesn't 
> matter what the exit status is. (It might be worth printing it out.) At that 
> point we know that rmid *has* exited gracefully, so we need to set rmid to 
> null so that the finally-block doesn't attempt to destroy rmid redundantly. 
> Some additional messages about rmid having exited and the test passing are 
> also warranted for this case.
> 
> Some additional cleanup can be done here as well, over and above the changes 
> you've proposed. (This stuff is left over from earlier RMI test messes.) In 
> order to shut down an active object, the code here spawns a new thread that 
> sleeps for a while and then deactivates this object. This isn't necessary. 
> (It might have been necessary in the past.) It's sufficient simply to 
> unexport this object and then deactivate it, directly within the shutdown() 
> method. See
> 
> test/java/rmi/activation/ActivationSystem/unregisterGroup/UnregisterGroup.java
> 
> for an example of this. In addition, the run() method can be removed, and the 
> "implements Runnable" declaration can also be removed from the 
> ShutdownGracefully test class.
> 
> Finally, revisiting some code farther up in the test, the try-block at lines 
> 135-140 issues a registration request that the test expects to fail. If it 
> succeeds, the message at line 139 isn't very clear; it should say that the 
> registration request succeeded unexpectedly. This should cause the test to 
> fail. We still probably want to go through the waitFor(timeout) path and 
> eventual rmid cleanup, but a flag should be set here to ensure that the test 
> indeed fails if the registration succeeds unexpectedly, and the messages 
> should clearly indicate that this is going on.
> 
> A good way to test this last case is to change rmid's security manager to the 
> normal security manager java.lang.SecurityManager instead of 
> TestSecurityManager.
> 
> Thanks,
> 
> s'marks
> 
> 
> 
> 
> On 2/10/14 2:59 AM, Tristan Yan wrote:
>> Hi Stuart
>> Could you help to review this.
>> Thank you
>> Tristan
>> 
>> On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan....@oracle.com
>> <mailto:tristan....@oracle.com>> wrote:
>> 
>>> Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java for 
>>> now.
>>> I still did some clean up following your suggestion.
>>> 1. I changed waitFor(long timeout) method, this method is going to use code
>>> like Process.waitFor(timeout, unit). This can be backported to JDK7. Also
>>> exitValue is kept as a return value. For making sure there is no Pipe leak, 
>>> a
>>> cleanup thread will start when timeout happens.
>>> 2. Change in ShutdownGracefully is a little tricky. I think we should just
>>> destroy JVM once exception is thrown. So I move the wait logic into try 
>>> block
>>> instead keep them in finally block.
>>> Can you receive it again.
>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/
>>> Thank you
>>> Tristan
>>> 
>>> On 01/29/2014 03:16 PM, Stuart Marks wrote:
>>>> Hi Tristan,
>>>> 
>>>> I don't want to put the workaround into ActivationLibrary.rmidRunning() 
>>>> for a
>>>> null return from the lookup, because this is only a workaround for an 
>>>> actual
>>>> bug in rmid initialization. See the review I just posted for JDK-8023541.
>>>> 
>>>> Adding JavaVM.waitFor(timeout) is something that would be useful in 
>>>> general,
>>>> but it needs to be handled carefully. It uses the new
>>>> Process.waitFor(timeout, unit) which is new in Java SE 8; this makes
>>>> backporting to 7 more complicated. Not clear whether we'll do so, but I 
>>>> don't
>>>> want to forclose the opportunity without discussion. It's also not clear 
>>>> how
>>>> one can get the vm's exit status after JavaVM.waitFor() has returned true.
>>>> With the Process API it's possible simply to call waitFor() or exitValue().
>>>> With JavaVM, a new API needs to be created, or the rule has to be 
>>>> established
>>>> that one must call JavaVM.waitFor() to collect the exit status as well as 
>>>> to
>>>> close the pipes from the subprocess. If JavaVM.waitFor(timeout, unit) is
>>>> called without subsequently calling JavaVM.waitFor(), the pipes are leaked.
>>>> 
>>>> In ShutdownGracefully.java, the finally-block needs to check to see if rmid
>>>> is still running, and if it is, to shut it down. Simply calling
>>>> waitFor(timeout, unit) isn't sufficient, because if the rmid process is 
>>>> still
>>>> running, it will be left running.
>>>> 
>>>> The straightforward approach would be to call 
>>>> ActivationLibrary.rmidRunning()
>>>> to test if it's still running. Unfortunately this isn't quite right, 
>>>> because
>>>> rmidRunning() has a timeout loop in it -- which should probably be removed.
>>>> (I think there's a bug for this.) Another approach would be simply to call
>>>> rmid.destroy(). This calls rmid's shutdown() method first, which is
>>>> reasonable, but I'm not sure it kills the process if that fails. In any 
>>>> case,
>>>> this already has a timeout loop waiting for the process to die, so
>>>> ShutdownGracefully.java needn't use a new waitFor(timeout, unit) call.
>>>> 
>>>> Removing the commented-out code that starts with "no longer needed" is 
>>>> good,
>>>> and removing the ShutdownDetectThread is also good, since that's 
>>>> unnecessary.
>>>> 
>>>> There are some more cleanups I have in mind here but I'd like to see a
>>>> revised webrev before proceeding.
>>>> 
>>>> Thanks,
>>>> 
>>>> s'marks
>>>> 
>>>> On 1/25/14 8:57 PM, Tristan Yan wrote:
>>>>> Hi Stuart
>>>>> Thank you for your review and suggestion.
>>>>> Yes, since this failure mode is very hard to be reproduced. I guess it's
>>>>> make sense  to do some hack. And I also noticed in
>>>>> ActivationLibrary.rmidRunning. It does try to look up ActivationSystem but
>>>>> doesn't check if it's null. So I add the logic to make sure we will look 
>>>>> up
>>>>> the non-null ActivationSystem. Also I did some cleanup if you don't mind.
>>>>> Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which we can have a
>>>>> better waitFor control.
>>>>> I appreciate you can review the code again.
>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
>>>>> Thank you
>>>>> Tristan
>>>>> 
>>>>> 
>>>>> On 01/25/2014 10:20 AM, Stuart Marks wrote:
>>>>>> On 1/23/14 10:34 PM, Tristan Yan wrote:
>>>>>>> Hi All
>>>>>>> Could you review the bug fix for JDK-8032050.
>>>>>>> 
>>>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
>>>>>>> 
>>>>>>> Description:
>>>>>>> This rare happened failure caused because when RMID starts. It don't
>>>>>>> guarantee
>>>>>>> sun.rmi.server.Activation.startActivation finishes.
>>>>>>> Fix by adding a iterative getSystem with a 5 seconds timeout.
>>>>>> 
>>>>>> Hi Tristan,
>>>>>> 
>>>>>> Adding a timing/retry loop into this test isn't the correct approach for
>>>>>> fixing this test.
>>>>>> 
>>>>>> The timing loop isn't necessary because there is already a timing loop in
>>>>>> RMID.start() in the RMI test library. (There's another timing loop in
>>>>>> ActivationLibrary.rmidRunning() which should probably be removed.) So the
>>>>>> intent of this library call is that it start rmid and wait for it to 
>>>>>> become
>>>>>> ready. That logic doesn't need to be added to the test.
>>>>>> 
>>>>>> In the bug report JDK-8032050 you had mentioned that the
>>>>>> NullPointerException was suspicious. You're right! I took a look and it
>>>>>> seemed like it was related to JDK-8023541, and I added a note to this
>>>>>> effect to the bug report. The problem here is that rmid can come up and
>>>>>> transiently return null instead of the stub of the activation system.
>>>>>> That's what JDK-8023541 covers. I think that rmid itself needs to be 
>>>>>> fixed,
>>>>>> though modifying the timing loop in the RMI test library to wait for rmid
>>>>>> to come up *and* for the lookup to return non-null is an easy way to fix
>>>>>> the problem. (Or at least cover it up.)
>>>>>> 
>>>>>> The next step in the analysis is to determine, given that
>>>>>> ActivationLibrary.getSystem can sometimes return null, whether this has
>>>>>> actually caused this test failure. This is pretty easy to determine; just
>>>>>> hack in a line "system = null" in the right place and run the test. I've
>>>>>> done this, and the test times out and the output log is pretty much
>>>>>> identical to the one in the bug report. (I recommend you try this
>>>>>> yourself.) So I think it's fairly safe to say that the problem in
>>>>>> JDK-8023541 has caused the failure listed in JDK-8032050.
>>>>>> 
>>>>>> I can see a couple ways to proceed here. One way is just to close this 
>>>>>> out
>>>>>> as a duplicate of JDK-8023541 since that bug caused this failure.
>>>>>> 
>>>>>> Another is that this test could use some cleaning up. This bug certainly
>>>>>> covers a failure, but the messages emitted are confusing and in some 
>>>>>> cases
>>>>>> completely wrong. For example, the "rmid has shutdown" message at line 
>>>>>> 180
>>>>>> is incorrect, because in this case rmid is still running and the wait()
>>>>>> call has timed out. Most of the code here can be replaced with calls to
>>>>>> various bits of the RMI test library. There are a bunch of other things 
>>>>>> in
>>>>>> this test that could be cleaned up as well.
>>>>>> 
>>>>>> It's up to you how you'd like to proceed.
>>>>>> 
>>>>>> s'marks
>>>>> 
>>>> 
>>> 
>> 

Reply via email to