I thought it worthwhile to start a new thread with a clear subject line on my efforts to characterize startup performance.
I obtained for us a group license to JProfiler, and I proceeded to apply it to several small programs that created client or server endpoints. I could upload JProfiler snapshots or HTML call trees (those are really big) to someplace, like my people.apache.org home dir, or even Confluence, if someone wants to check up on me, and I could add the little programs that I have sitting in the systest subproject to svn. One caveat: I measured in our development environment, not against a snapshot, so the classloader is operating against a medium-long list of directories of classes instead of a similarly sized, or even much shorter, list of JAR files. This amplifies some effects.

The most conspicuous result is that Spring+Xerces sure eats up a lot of time. For either a client or a server endpoint, the cost of creating the default Bus via Spring is much larger than anything else we do, and more time goes into Xerces than into Spring itself. This leads me to contemplate a StAX bean loader class, which it should be possible to bolt onto Spring through their published API. Spring+Xerces is the only answer to the question, 'why does it take so long to initialize *a single* endpoint application?'

Then I moved on to tests that loop creating an endpoint without incurring any additional Spring+Xerces overhead. The next thing that turned up, not entirely surprisingly, was the JAXB RI. There's not much we can do about that, though I discovered that an optimization I added last night had the effect of lowering the cost of JAXB startup. I confess that I can't explain why, except to observe that JAXB did less classloading when I used a cache to speed up JAX-WS's location of wrapper classes. Since all the tests pass, I'm trusting that I didn't change any semantics.

Looking for time not (obviously) connected to JAXB, my next sad report concerns XmlSchema. Consider a simple JAX-WS+JAXB client endpoint with a WSDL. It reads and parses the WSDL, and builds the service model. Building the service model did not appear as a significant time sink. Parsing the WSDL file did ... because XmlSchema's namespace resolution mechanism turns out to be a significant hotspot. I haven't looked at their code in detail, since it's cumbersome to set up for the purpose from Sweden, and I think we all know that we have some frustration in seeing changes move through the XmlSchema process to a release. Caching the parsed form of a WSDL seems a plausible thing to do, except that I'm personally quite aware that the process of building the service model includes filling gaps that turn up in the schema. Also, in theory, the WSDL could *change* from one Endpoint creation to the next, could it not? We could make a conscious decision to ignore that possibility. Of course, XmlSchema itself has no cloning concept. We could move the 'gap-filling' code from the service factory, where it lives now, to the WSDL schema-getter so that it happens once and for all, and then treat the schema as read-only.

Finally, let me mention the one thing that I did something about. In the process of assembling an endpoint, the code makes repeated calls to getRequestWrapper and getResponseWrapper for JAX-WS. Each of these calls in turn does some expensive reflection and class-loading. I introduced a little cache in the JaxWsServiceConfiguration, and got the time back. This, of course, means that the simple case is now a trifle slower, to the tune of creating two HashMaps and making a few probes.
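
For anyone who wants the flavor of that change without reading the patch, here is a minimal sketch of the idea. It is not the actual JaxWsServiceConfiguration code: the class name and the fallback name-guessing below are made up for illustration, and the real lookup reads the JAX-WS annotations. I'm also assuming single-threaded use during endpoint construction; if that's wrong, these would need to be ConcurrentHashMaps.

    import java.lang.reflect.Method;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only: pay the reflective lookup once per method,
    // then answer repeated calls from a map.
    public class WrapperClassCache {

        private final Map<Method, Class<?>> requestWrappers =
            new HashMap<Method, Class<?>>();
        private final Map<Method, Class<?>> responseWrappers =
            new HashMap<Method, Class<?>>();

        public Class<?> getRequestWrapper(Method method) {
            if (!requestWrappers.containsKey(method)) {
                requestWrappers.put(method, loadWrapper(method, ""));
            }
            return requestWrappers.get(method);
        }

        public Class<?> getResponseWrapper(Method method) {
            if (!responseWrappers.containsKey(method)) {
                responseWrappers.put(method, loadWrapper(method, "Response"));
            }
            return responseWrappers.get(method);
        }

        // Stand-in for the expensive part: the real code consults the
        // @RequestWrapper/@ResponseWrapper annotations before computing a name.
        private Class<?> loadWrapper(Method method, String suffix) {
            String name = method.getDeclaringClass().getPackage().getName()
                + ".jaxws." + capitalize(method.getName()) + suffix;
            try {
                return Class.forName(name, false,
                    method.getDeclaringClass().getClassLoader());
            } catch (ClassNotFoundException e) {
                return null;  // caller falls back to other strategies
            }
        }

        private String capitalize(String s) {
            return Character.toUpperCase(s.charAt(0)) + s.substring(1);
        }
    }
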
However, since even a single endpoint creation ends up calling these methods more than once per operation, the extra cost is at worst, as far as I can tell, a wash. In a loop creating 100 Endpoints, it's a worthwhile improvement.

I think it would be a good thing if some others would join the discussion here about what, if anything, to do next. At one level, we could focus on making performance tests part of the standard (or an optional) build, to ensure that we don't introduce big performance regressions. Or we could aim at some of the issues discussed above. Or I could make more measurements.
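
On the performance-tests-in-the-build idea, even something as crude as the following would catch order-of-magnitude regressions in endpoint creation time. This is only a sketch of the shape of such a test: GreeterImpl stands in for whichever simple @WebService implementation class we'd borrow from the systests, and the port arithmetic is just there to avoid bind collisions.

    import javax.xml.ws.Endpoint;

    // Crude timing harness: publish and stop N endpoints, report wall-clock time.
    public class EndpointStartupTimer {

        private static final int ITERATIONS = 100;

        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < ITERATIONS; i++) {
                // Each pass pays the full endpoint-assembly cost; the
                // Spring/Xerces bus setup is paid once, on the first pass.
                Endpoint endpoint = Endpoint.publish(
                    "http://localhost:" + (9000 + i) + "/timing",
                    new GreeterImpl());  // placeholder @WebService bean
                endpoint.stop();
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(ITERATIONS + " endpoints in " + elapsed + " ms ("
                + (elapsed / ITERATIONS) + " ms each)");
        }
    }

The absolute numbers from something like this would be too noisy to chase small changes, but a simple threshold check on them would be enough to flag the big regressions.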
