Hi Tristan,

First of all, these are two completely separate changes. They're sort-of related in that they both involve replacing sleep() loops and polling with different constructs, but other than that, they're completely independent. As you'll see from my review comments, there are different issues going on with each as well. It might be worthwhile to consider splitting these into separate subtasks, one for each of these changes, so that if one of the changes converges more quickly than the other (as seems likely), it can go in independently.

CheckAnnotations

This is a pretty complicated test to begin with, and the changes make it even more complicated. It adds two threads, a single Lock, two conditions, and stream subclasses that notify these conditions. It's not clear to me why the low-level constructs from j.u.concurrent.locks are necessary. At first glance the idea of having a notifying specialization of a ByteArrayOutputStream is reasonable. I'd expect the lock and condition to be inside the stream object, though, not external to it. In addition, the condition being notified is simply "something has been written". We don't actually know that all the output from the subprocess has been read. The old version waited around for a while and assumed that it had collected it all. This of course is racy; but having the notifying stream is still racy, if it happens that the output is written in multiple chunks.
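To illustrate the encapsulation point, here's a rough sketch (illustrative names only, not code from the webrev) of a stream that keeps its lock and condition internal instead of external. Note that even this can only tell you "at least N bytes have arrived", not "all the output has arrived", so the race I described above remains:

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a ByteArrayOutputStream that encapsulates its own lock and
// condition, signaling whenever data is written.
class NotifyingByteArrayOutputStream extends ByteArrayOutputStream {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition dataWritten = lock.newCondition();

    @Override
    public void write(byte[] b, int off, int len) {
        lock.lock();
        try {
            super.write(b, off, len);
            dataWritten.signalAll();
        } finally {
            lock.unlock();
        }
    }

    @Override
    public void write(int b) {
        lock.lock();
        try {
            super.write(b);
            dataWritten.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Block until at least minBytes have been written or the timeout expires.
    // This still can't distinguish "some output arrived" from "all output
    // arrived" if the subprocess writes in multiple chunks.
    boolean awaitBytes(int minBytes, long timeout, TimeUnit unit)
            throws InterruptedException {
        lock.lock();
        try {
            long nanos = unit.toNanos(timeout);
            while (size() < minBytes) {
                if (nanos <= 0L) return false;
                nanos = dataWritten.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}

public class StreamDemo {
    public static void main(String[] args) throws Exception {
        NotifyingByteArrayOutputStream out = new NotifyingByteArrayOutputStream();
        new Thread(() -> out.write("hello".getBytes(), 0, 5)).start();
        boolean got = out.awaitBytes(5, 5, TimeUnit.SECONDS);
        System.out.println(got && "hello".equals(out.toString()));
    }
}
```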

Note that if we want to get subprocess output as it arrives, there are two threads hidden inside of RMI's TestLibrary.JavaVM class that collect stdout and stderr from the subprocess. By default they simply write to whatever stream they are given. An alternative approach would be to plumb an interface to these threads and have the output sent directly to the test as it comes in. (I think one of the other RMI tests does this, though it does it in a particularly clumsy way that I wouldn't recommend emulating.)
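If we did plumb such an interface through, it might look something like the following sketch (the listener interface and class names are hypothetical; JavaVM has no such hook today). The collector threads would write into a forwarding stream that calls back into the test as each chunk arrives:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical callback the test would implement to receive subprocess
// output as it arrives.
interface OutputListener {
    void onChunk(String chunk);
}

// An OutputStream that forwards each write to the listener, as the
// collector threads inside TestLibrary.JavaVM might if they were given
// such a stream instead of plain System.out/System.err.
class ListenerOutputStream extends OutputStream {
    private final OutputListener listener;

    ListenerOutputStream(OutputListener listener) {
        this.listener = listener;
    }

    @Override
    public void write(int b) {
        listener.onChunk(String.valueOf((char) b));
    }

    @Override
    public void write(byte[] b, int off, int len) {
        listener.onChunk(new String(b, off, len));
    }
}

public class ListenerDemo {
    public static void main(String[] args) throws IOException {
        StringBuilder collected = new StringBuilder();
        OutputStream os = new ListenerOutputStream(collected::append);
        os.write("chunk".getBytes());
        System.out.println("chunk".equals(collected.toString()));
    }
}
```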

But stepping back a bit, the original test is over-complicated to begin with, and I'm reluctant to add all these additional mechanisms in this proposed change, because they add more complexity, and also because I can't quite convince myself that they're correct or that they even solve the problem that exists in the original test.

The original test probably needs to be reconceived entirely. The idea is to activate several objects in several different activation groups (which are child JVMs of the RMID process), and to make sure that the output from each is "annotated" with a prefix that indicates which activation group it came from. It seems to me that a much simpler way to approach the problem is to activate several objects and have them each emit a unique bit of output. Then, shut everything down, collect ALL the output, and check to make sure that each object's unique output is on a line that was annotated properly. This completely avoids threads, locks, conditions, timing loops, and race conditions.
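The post-hoc verification step could be as simple as the following sketch. The marker strings and annotation format here are purely illustrative; the real test would use whatever prefix rmid actually applies:

```java
// Sketch: after everything is shut down and ALL output collected, verify
// that each object's unique marker appears on a line carrying the expected
// group annotation. No threads, locks, or timing loops required.
public class AnnotationCheckDemo {
    static boolean checkAnnotated(String allOutput, String marker, String groupPrefix) {
        for (String line : allOutput.split("\n")) {
            if (line.contains(marker)) {
                return line.startsWith(groupPrefix);
            }
        }
        return false;  // marker never appeared at all
    }

    public static void main(String[] args) {
        // Illustrative collected output with per-group annotations.
        String output = "Group-1: marker-A\nGroup-2: marker-B\n";
        System.out.println(checkAnnotated(output, "marker-A", "Group-1:")
                && checkAnnotated(output, "marker-B", "Group-2:"));
    }
}
```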

If you think this alternative approach would be an effective way to proceed, I'd be happy to help you out with it.

ForceLogSnapshot

This is another fairly complicated test that is partially helped by the addition of CountDownLatches. Basically, it registers 50 activatable+restartable objects and 50 activatable (but not restartable) objects with the activation system. Then it restarts the activation system. The test wants to ensure that the restartable objects are all restarted, but that *none* of the non-restartable objects are restarted.

In the first case, having a CountDownLatch that starts at 50 and awaits for them all to be restarted makes perfect sense. The await() call has a timeout to make sure we don't wait excessively long for all 50 to be restarted, but the CountDownLatch allows the test to continue immediately as soon as all 50 have been restarted. For this case, it's also not necessary to have a retry loop in waitAllStarted(). We just wait once until the latch is decremented to zero, or until we time out. It might make sense to raise the timeout here, since there's a lot of activity; 30 or even 60 seconds might be reasonable. In the normal case the restarts will all occur quickly, so we'll only wait the full timeout if there's a failure.
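The shape I have in mind for the restartable case is roughly this (method names are illustrative, not taken from the webrev): each restarted object counts the latch down, and waitAllStarted() collapses to a single timed await:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RestartLatchDemo {
    static final int NUM_RESTARTABLE = 50;
    static final CountDownLatch allRestarted = new CountDownLatch(NUM_RESTARTABLE);

    // Called from each restartable object's restart hook (hypothetical name).
    static void objectRestarted() {
        allRestarted.countDown();
    }

    // Replaces the retry loop in waitAllStarted(): one timed wait that
    // returns immediately once all 50 restarts have occurred, and only
    // waits the full timeout on failure.
    static boolean waitAllStarted() throws InterruptedException {
        return allRestarted.await(60, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < NUM_RESTARTABLE; i++) {
            new Thread(RestartLatchDemo::objectRestarted).start();
        }
        System.out.println(waitAllStarted());
    }
}
```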

In the second case, we expect zero objects to be restarted, so having a CountDownLatch starting at 50 doesn't make sense. There is no notion of having the test proceed immediately as soon as "all" of some events have happened. One choice here is to wait around for a while and count how many of the non-restartable objects actually get restarted. An AtomicInteger would be sufficient for that. An alternative might be to have a CountDownLatch initialized to 1. If any one of the non-restartable objects is restarted, we know immediately that the test has failed and we can report that. We might not care if additional non-restartable objects are restarted after we've observed the first one having been restarted erroneously. (Then again, we might.) But if others do, decrementing the CountDownLatch further after it's reached zero does nothing, so that's OK.

Unfortunately, the successful case needs the test to wait around for a while, because it's trying to verify a negative, i.e., that no non-restartable objects were actually restarted. I'm not sure whether that's better done with a CountDownLatch(1) plus timeout, or a sleep call followed by a check of an AtomicInteger. I think the latter is a bit easier to understand, since the "latching" behavior in this case is the error, not the normal condition.
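The AtomicInteger variant would look roughly like this (names are illustrative; the real test would sleep for its full verification window rather than the short interval used here):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NoRestartDemo {
    static final AtomicInteger erroneousRestarts = new AtomicInteger();

    // Called if a non-restartable object is (wrongly) restarted.
    static void nonRestartableRestarted() {
        erroneousRestarts.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        // Wait out the verification window; in the real test this would be
        // long enough for rmid to have had a chance to restart anything it
        // was (wrongly) going to restart.
        TimeUnit.MILLISECONDS.sleep(100);

        // Success is the counter still being zero.
        System.out.println(erroneousRestarts.get() == 0);
    }
}
```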

Finally, these changes are intended to improve performance. It would be good to see some measurements to ensure that the performance has actually improved. The CountDownLatch stuff is an improvement over the original array of booleans, and it potentially makes the code cleaner, but I think its potential for performance improvement is limited.

On my system the test run takes about 80 seconds. (It actually does all of the above twice: once after shutting down rmid, and once after causing the activation group JVMs to exit ("crash").) The howManyRestarted() loop for restartable objects sleeps for ten seconds before polling the number of objects restarted. On average it might wait five seconds too long; given the two test scenarios, that's a savings of ten seconds. The howManyRestarted() loop for the objects that *aren't* supposed to be restarted waits for 20 seconds unconditionally. With two scenarios run, that's 40 seconds right there, fully half of the time of the test run. (Where does the other half of the time go?) That time is unavoidably spent waiting around to make sure the wrong thing (restarting a non-restartable object) doesn't happen.

I think some of the CountDownLatch stuff will improve the test code, but let's not make any assumptions about performance.

s'marks


On 12/18/13 7:37 AM, Tristan Yan wrote:
Hi Everyone

Please help to review the code change for bug JDK-8030057.


http://cr.openjdk.java.net/~tyan/JDK-8030057/webrev.00/
Description:

Performance improvement for two RMI tests

java/rmi/activation/Activatable/forceLogSnapshot

The method waitAllStarted uses a recursive sleep to poll until all 50 restartedObject flags become true; we can use the modern CountDownLatch to implement a blocking timed wait instead.

I also suggest shortening the waiting time to 5 seconds.

java/rmi/activation/Activatable/checkAnnotations

We can subclass ByteArrayOutputStream to support notification when data is written. We also use two threads to wait for the output string and error string to become non-null.

Thank you
Tristan
