Hi Tristan,

First of all, these are two completely separate changes. They're sort-of related in that they both involve replacing sleep() loops and polling with different constructs, but other than that, they're completely independent. As you'll see from my review comments, there are different issues going on with each as well. It might be worthwhile to consider splitting these into separate subtasks, one for each of these changes, so that if one of the changes converges more quickly than the other (as seems likely), it can go in independently.

CheckAnnotations

This is a pretty complicated test to begin with, and the changes make it even more complicated. It adds two threads, a single Lock, two conditions, and stream subclasses that notify these conditions. It's not clear to me why the low-level constructs from j.u.concurrent.locks are necessary. At first glance the idea of having a notifying specialization of a ByteArrayOutputStream is reasonable. I'd expect the lock and condition to be inside the stream object, though, not external to it. In addition, the condition being notified is simply "something has been written". We don't actually know that all the output from the subprocess has been read. The old version waited around for a while and assumed that it had collected it all. This of course is racy; but having the notifying stream is still racy, if it happens that the output is written in multiple chunks.
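To illustrate the encapsulation point, here's a rough sketch (illustrative names only, not code from the webrev) of a stream that keeps its lock and condition internal instead of external. Note that even this can only tell you "at least N bytes have arrived", not "all the output has arrived", so the race I described above remains:

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a ByteArrayOutputStream that encapsulates its own lock and
// condition, signaling whenever data is written.
class NotifyingByteArrayOutputStream extends ByteArrayOutputStream {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition dataWritten = lock.newCondition();

    @Override
    public void write(byte[] b, int off, int len) {
        lock.lock();
        try {
            super.write(b, off, len);
            dataWritten.signalAll();
        } finally {
            lock.unlock();
        }
    }

    @Override
    public void write(int b) {
        lock.lock();
        try {
            super.write(b);
            dataWritten.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Block until at least minBytes have been written or the timeout expires.
    // This still can't distinguish "some output arrived" from "all output
    // arrived" if the subprocess writes in multiple chunks.
    boolean awaitBytes(int minBytes, long timeout, TimeUnit unit)
            throws InterruptedException {
        lock.lock();
        try {
            long nanos = unit.toNanos(timeout);
            while (size() < minBytes) {
                if (nanos <= 0L) return false;
                nanos = dataWritten.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}

public class StreamDemo {
    public static void main(String[] args) throws Exception {
        NotifyingByteArrayOutputStream out = new NotifyingByteArrayOutputStream();
        new Thread(() -> out.write("hello".getBytes(), 0, 5)).start();
        boolean got = out.awaitBytes(5, 5, TimeUnit.SECONDS);
        System.out.println(got && "hello".equals(out.toString()));
    }
}
```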

Note that if we want to get subprocess output as it arrives, there are two threads hidden inside of RMI's TestLibrary.JavaVM class that collect stdout and stderr from the subprocess. By default they simply write to whatever stream they are given. An alternative approach would be to plumb an interface to these threads and have the output sent directly to the test as it comes in. (I think one of the other RMI tests does this, though it does it in a particularly clumsy way that I wouldn't recommend emulating.)
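If we did plumb such an interface through, it might look something like the following sketch (the listener interface and class names are hypothetical; JavaVM has no such hook today). The collector threads would write into a forwarding stream that calls back into the test as each chunk arrives:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical callback the test would implement to receive subprocess
// output as it arrives.
interface OutputListener {
    void onChunk(String chunk);
}

// An OutputStream that forwards each write to the listener, as the
// collector threads inside TestLibrary.JavaVM might if they were given
// such a stream instead of plain System.out/System.err.
class ListenerOutputStream extends OutputStream {
    private final OutputListener listener;

    ListenerOutputStream(OutputListener listener) {
        this.listener = listener;
    }

    @Override
    public void write(int b) {
        listener.onChunk(String.valueOf((char) b));
    }

    @Override
    public void write(byte[] b, int off, int len) {
        listener.onChunk(new String(b, off, len));
    }
}

public class ListenerDemo {
    public static void main(String[] args) throws IOException {
        StringBuilder collected = new StringBuilder();
        OutputStream os = new ListenerOutputStream(collected::append);
        os.write("chunk".getBytes());
        System.out.println("chunk".equals(collected.toString()));
    }
}
```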

But stepping back a bit, the original test is over-complicated to begin with, and I'm reluctant to add all these additional mechanisms in this proposed change, because they add more complexity, and also because I can't quite convince myself that they're correct or that they even solve the problem that exists in the original test.

The original test probably needs to be reconceived entirely. The idea is to activate several objects in several different activation groups (which are child JVMs of the RMID process), and to make sure that the output from each is "annotated" with a prefix that indicates which activation group it came from. It seems to me that a much simpler way to approach the problem is to activate several objects and have them each emit a unique bit of output. Then, shut everything down, collect ALL the output, and check to make sure that each object's unique output is on a line that was annotated properly. This completely avoids threads, locks, conditions, timing loops, and race conditions.
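The post-hoc verification step could be as simple as the following sketch. The marker strings and annotation format here are purely illustrative; the real test would use whatever prefix rmid actually applies:

```java
// Sketch: after everything is shut down and ALL output collected, verify
// that each object's unique marker appears on a line carrying the expected
// group annotation. No threads, locks, or timing loops required.
public class AnnotationCheckDemo {
    static boolean checkAnnotated(String allOutput, String marker, String groupPrefix) {
        for (String line : allOutput.split("\n")) {
            if (line.contains(marker)) {
                return line.startsWith(groupPrefix);
            }
        }
        return false;  // marker never appeared at all
    }

    public static void main(String[] args) {
        // Illustrative collected output with per-group annotations.
        String output = "Group-1: marker-A\nGroup-2: marker-B\n";
        System.out.println(checkAnnotated(output, "marker-A", "Group-1:")
                && checkAnnotated(output, "marker-B", "Group-2:"));
    }
}
```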

If you think this alternative approach would be an effective way to proceed, I'd be happy to help you out with it.

ForceLogSnapshot

This is another fairly complicated test that is partially helped by the addition of CountDownLatches. Basically, it registers 50 activatable+restartable objects and 50 activatable (but not restartable) objects with the activation system. Then it restarts the activation system. The test wants to ensure that the restartable objects are all restarted, but that *none* of the non-restartable objects are restarted.

In the first case, having a CountDownLatch that starts at 50 and awaits for them all to be restarted makes perfect sense. The await() call has a timeout to make sure we don't wait excessively long for all 50 to be restarted, but the CountDownLatch allows the test to continue immediately as soon as all 50 have been restarted. For this case, it's also not necessary to have a retry loop in waitAllStarted(). We just wait once until the latch is decremented to zero, or until we time out. It might make sense to raise the timeout here, since there's a lot of activity; 30 or even 60 seconds might be reasonable. In the normal case the restarts will all occur quickly, so we'll only wait the full timeout if there's a failure.
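The shape I have in mind for the restartable case is roughly this (method names are illustrative, not taken from the webrev): each restarted object counts the latch down, and waitAllStarted() collapses to a single timed await:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RestartLatchDemo {
    static final int NUM_RESTARTABLE = 50;
    static final CountDownLatch allRestarted = new CountDownLatch(NUM_RESTARTABLE);

    // Called from each restartable object's restart hook (hypothetical name).
    static void objectRestarted() {
        allRestarted.countDown();
    }

    // Replaces the retry loop in waitAllStarted(): one timed wait that
    // returns immediately once all 50 restarts have occurred, and only
    // waits the full timeout on failure.
    static boolean waitAllStarted() throws InterruptedException {
        return allRestarted.await(60, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < NUM_RESTARTABLE; i++) {
            new Thread(RestartLatchDemo::objectRestarted).start();
        }
        System.out.println(waitAllStarted());
    }
}
```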

In the second case, we expect zero objects to be restarted, so having a CountDownLatch starting at 50 doesn't make sense. There is no notion of having the test proceed immediately as soon as "all" of some events have happened. One choice here is to wait around for a while and count how many of the non-restartable objects actually get restarted. An AtomicInteger would be sufficient for that. An alternative might be to have a CountDownLatch initialized to 1. If any one of the non-restartable objects is restarted, we know immediately that the test has failed and we can report that. We might not care if additional non-restartable objects are restarted after we've observed the first one having been restarted erroneously. (Then again, we might.) But if others do, decrementing the CountDownLatch further after it's reached zero does nothing, so that's OK.

Unfortunately, the successful case needs the test to wait around for a while, because it's trying to verify a negative, i.e., that no non-restartable objects were actually restarted. I'm not sure whether that's better done with a CountDownLatch(1) plus timeout, or a sleep call followed by a check of an AtomicInteger. I think the latter is a bit easier to understand, since the "latching" behavior in this case is the error, not the normal condition.
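The AtomicInteger variant would look roughly like this (names are illustrative; the real test would sleep for its full verification window rather than the short interval used here):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NoRestartDemo {
    static final AtomicInteger erroneousRestarts = new AtomicInteger();

    // Called if a non-restartable object is (wrongly) restarted.
    static void nonRestartableRestarted() {
        erroneousRestarts.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        // Wait out the verification window; in the real test this would be
        // long enough for rmid to have had a chance to restart anything it
        // was (wrongly) going to restart.
        TimeUnit.MILLISECONDS.sleep(100);

        // Success is the counter still being zero.
        System.out.println(erroneousRestarts.get() == 0);
    }
}
```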

Finally, these changes are intended to improve performance. It would be good to see some measurements to ensure that the performance has actually improved. The CountDownLatch stuff is an improvement over the original array of booleans, and it potentially makes the code cleaner, but I think its potential for performance improvement is limited.

On my system the test run takes about 80 seconds. (It actually does all of the above twice: once after shutting down rmid, and once after causing the activation group JVMs to exit ("crash").) The howManyRestarted() loop for restartable objects sleeps for ten seconds before polling the number of objects restarted. On average it might wait five seconds too long; given the two test scenarios, that's a savings of ten seconds. The howManyRestarted() loop for the objects that *aren't* supposed to be restarted waits for 20 seconds unconditionally. With two scenarios run, that's 40 seconds right there, fully half of the time of the test run. (Where does the other half of the time go?) That time is unavoidably spent waiting around to make sure the wrong thing (restarting a non-restartable object) doesn't happen.

I think some of the CountDownLatch stuff will improve the test code, but let's not make any assumptions about performance.

s'marks


On 12/18/13 7:37 AM, Tristan Yan wrote:
Hi Everyone

Please help to review the code change for bug JDK-8030057.


http://cr.openjdk.java.net/~tyan/JDK-8030057/webrev.00/
Description:

Performance improvement for two RMI tests

java/rmi/activation/Activatable/forceLogSnapshot

The method waitAllStarted uses a recursive sleep to poll until all 50 restartedObject flags become true; we can use the modern CountDownLatch to implement a blocking timed wait instead.

I also suggest shortening the waiting time to 5 seconds.

java/rmi/activation/Activatable/checkAnnotations

We can subclass ByteArrayOutputStream to support notification when data is written. We also use two threads to wait for the output string and error string to become non-null.

Thank you
Tristan
