Charles,

One of the primary reasons (right now) for keeping the DOM tree around is to work around some severe bugs in XmlSchema. The XmlSchema serializer in 1.3.2 loses a bunch of things, so the resulting schemas you'd get would not be correct. I think all the bugs have been fixed in XmlSchema, and I've been asking for a new release (see: http://mail-archives.apache.org/mod_mbox/ws-commons-dev/200802.mbox/<200802071000.14543.dkulp%40apache.org>), but so far, no luck. I'd appreciate it if you could also start bugging them. :-) If we can get a version that can actually round-trip schemas properly, I'm OK with dropping the DOM.
That said, I've also thought about creating a "SchemaManager", to go along with the current WSDLManager, to cache a lot of this. I just haven't gotten around to doing it. I'd definitely welcome any patches that would help us head in that direction. :-)

Dan

On Tuesday 12 February 2008, Charles O'Farrell wrote:
> G'day all,
>
> I have been given the task of generating WSDL from my company's large
> collection of application models, as well as handling the invocation of
> the corresponding services, which are already deployed. The number of
> possible services is in the hundreds, with a handful of large (2MB)
> shared schemas.
>
> When trying to run a small Jetty server with more than one of these
> generated WSDLs, I quickly ran out of memory (with the default setting,
> 64M I think). While it wouldn't be hard to bump up the memory
> allocation, I feared the final scenario of hundreds of WSDLs would be
> problematic even with large amounts of memory.
>
> To cut a long story short, this is what I found:
>
> 1. For each WSDL, every imported schema is loaded into memory,
>    regardless of whether it is shared with other WSDLs.
> 2. Every schema DOM tree is kept in memory after parsing.
>
> Given that the schema is parsed into the more useful XmlSchema object
> tree, I'm not sure what benefit is gained from also keeping it as DOM.
> I fixed the memory bloat with some minor changes in SchemaUtil, which I
> will explain briefly here. Note that reflection was unfortunately
> required to deal with the XmlSchema library.
>
> 1. Used a static map to populate the XmlSchemaCollection parameter
>    with any cached schemas before calling
>    schemaCol.read(schemaElem, systemId) in extractSchema().
>
> 2. Nulled out cached DOM elements in the following:
>
>    - extractSchema() -> xmlSchema.setElement() (well, actually I
>      stopped it being set)
>    - addSchema() -> schema.setElement() after targetNamespace is
>      retrieved
>    - At the end of getSchemas(), iterate over any new schemas, get
>      each one's NodeNamespaceContext, and call getDeclaredPrefixes()
>      before setting its node field to null.
>
> 3. Ignored schemaList from the constructor and instead just relied on
>    an internal set to avoid recursion. (I think this map is only
>    needed for WSDL2Java?)
>
> 4. Fixed WSDLQueryHandler, which no longer output the full WSDL
>    because of the missing schema node (I loaded it from the file
>    system instead of serialising the Definition object).
>
> I guess my biggest qualm in all this is that it was extremely
> difficult to subclass SchemaUtil and wire it in through Spring to make
> the required changes. In particular, I had to reproduce the following
> invocation chain to fix the problem:
>
>   JaxWsServiceFactoryBean -> buildServiceFromWSDL() ->
>   WSDLServiceFactory -> create() -> WSDLServiceBuilder ->
>   getSchemas() -> SchemaUtil
>
> Because neither SchemaUtil nor any of the other classes is a
> Spring-managed object, and because most of the methods/fields are
> private, I ended up literally copying and pasting each class.
>
> Forgive me if this all sounds like criticism, because I am very
> impressed and happy with CXF. This is just as much a record of my
> findings as anything else.
>
> Anyway, I'm not too worried about what happens now, but I am curious
> what you guys think of all this.
>
> Cheers,
>
> Charles O'Farrell

-- 
J. Daniel Kulp
Principal Engineer, IONA
[EMAIL PROTECTED]
http://www.dankulp.com/blog
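As a rough illustration of the caching idea discussed above (one parsed copy of each shared schema, reused by every WSDL that imports it, much like Charles's static map or the schema-caching manager proposed to sit alongside WSDLManager), here is a minimal Java sketch. It is not CXF or XmlSchema code: `SchemaCache`, `SchemaCacheDemo`, the loader function and the example.com system ID are all hypothetical, and the type parameter `S` stands in for whatever parsed schema object a real implementation would hold.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch of a shared schema cache: one parsed schema per
 * system ID, shared by every WSDL that imports it. S stands in for the
 * parsed schema type (e.g. an XmlSchema instance); this class does not
 * call any XmlSchema API itself.
 */
final class SchemaCache<S> {
    private final Map<String, S> bySystemId = new ConcurrentHashMap<>();
    private final Function<String, S> loader;

    SchemaCache(Function<String, S> loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    /** Returns the cached schema for systemId, parsing it only on first use. */
    S get(String systemId) {
        return bySystemId.computeIfAbsent(systemId, loader);
    }

    /** Number of distinct schemas currently held. */
    int size() {
        return bySystemId.size();
    }
}

public class SchemaCacheDemo {
    public static void main(String[] args) {
        AtomicInteger parses = new AtomicInteger();

        // Hypothetical stand-in loader: a real one would parse the schema here.
        SchemaCache<String> cache = new SchemaCache<>(id -> {
            parses.incrementAndGet();
            return "schema@" + id;
        });

        // Two WSDLs importing the same (hypothetical) shared schema.
        String fromWsdlA = cache.get("http://example.com/shared.xsd");
        String fromWsdlB = cache.get("http://example.com/shared.xsd");

        System.out.println(fromWsdlA == fromWsdlB); // true: same instance reused
        System.out.println(parses.get());           // 1: parsed only once
    }
}
```

Keying on the system ID mirrors the schemaCol.read(schemaElem, systemId) call mentioned above. One caveat with this design: ConcurrentHashMap.computeIfAbsent requires that the mapping function not update other mappings of the same map, so a real loader that resolves xsd:import references by calling back into the same cache would need a different locking strategy.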
