Re: Schema DOM memory problem

Daniel Kulp Tue, 12 Feb 2008 14:21:25 -0800

Charles,

One of the primary reasons (right now) for keeping the DOM tree around is 
to work around some severe bugs in XmlSchema.   The XmlSchema serializer 
in 1.3.2 loses a bunch of things, so the resulting schemas that you get 
would not be correct.    I think all the bugs have been fixed in 
XmlSchema and I've been asking for a new release.  See:
http://mail-archives.apache.org/mod_mbox/ws-commons-dev/200802.mbox/<200802071000.14543.dkulp%40apache.org>
but so far, no luck.   I'd appreciate it if you could also start bugging 
them.   :-)   If we can get a version that can actually round-trip 
schema properly, I'm OK with dropping the DOM.


That all said, I've also thought about creating a "SchemaManager" to go 
along with the current WSDLManager to cache a lot of this.    Just 
haven't gotten around to doing it.   I'd definitely welcome any patches 
that would help us head in that direction.   :-)
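
For illustration only, the caching idea could be sketched roughly as below. 
Nothing here is existing CXF or XmlSchema API: the class shape, the type 
parameter S (standing in for a parsed schema object), and the loader 
callback are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch of a schema cache shared across WSDLs.
 * S stands in for whatever parsed schema type is being cached;
 * the key is assumed to be the schema's system ID.
 */
final class SchemaManager<S> {
    private final Map<String, S> schemas = new ConcurrentHashMap<>();

    /**
     * Returns the cached schema for systemId, invoking loader only on a miss.
     * get/putIfAbsent is used rather than computeIfAbsent so that a loader
     * may itself call getSchema() for imported schemas without triggering
     * a recursive-update error. Under a race the loader may run twice,
     * but all callers end up with the same cached instance.
     */
    S getSchema(String systemId, Function<String, S> loader) {
        S cached = schemas.get(systemId);
        if (cached != null) {
            return cached;
        }
        S loaded = Objects.requireNonNull(loader.apply(systemId),
                "loader returned null for " + systemId);
        S prior = schemas.putIfAbsent(systemId, loaded);
        return prior != null ? prior : loaded;
    }

    /** Number of distinct schemas currently held. */
    int size() {
        return schemas.size();
    }
}
```

With something like this, two WSDLs importing the same large schema would 
share one parsed instance instead of each holding its own copy.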

Dan





On Tuesday 12 February 2008, Charles O'Farrell wrote:
> G'day all,
>
> I have been given the task of generating WSDL from my company's large
> collection of application models, as well as handling the invoking of
> corresponding services which are already deployed. The number of
> possible services numbers in the hundreds, with a handful of large
> (2MB) shared schemas.
>
> When trying to run a small Jetty server with more than one of these
> generated WSDLs I quickly ran out of memory (the default setting - 64M
> I think). While it wouldn't be hard to bump up the memory allocation,
> I feared the final scenario of hundreds of WSDLs would be problematic
> even for large amounts of memory.
>
> To cut a long story short this is what I found:
>
> 1. For each WSDL, every imported schema is loaded into memory,
> regardless of whether it is shared among other WSDLs.
> 2. Every Schema DOM tree is stored in memory after parsing.
>
> Given that the Schema is parsed to the more useful XmlSchema object
> tree, I'm not sure what benefits are gained from keeping it in DOM. I
> fixed the memory bloat by some minor changes in SchemaUtil, which I
> will explain briefly here. Note that reflection was unfortunately
> required in dealing with the XmlSchema library.
>
> 1. Used a static map to update the XmlSchemaCollection parameter with
> any cached Schemas before calling schemaCol.read(schemaElem,
> systemId); in extractSchema
>
> 2. Nulled out cached DOM elements in the following:
>
>    - extractSchema() -> xmlSchema.setElement() (well actually I
> stopped it being set)
>    - addSchema() -> schema.setElement() after targetNamespace is
>    retrieved
>    - At the end of getSchemas(), iterate over any new schemas: for
>    each, get its NodeNamespaceContext and call getDeclaredPrefixes()
>    before setting its node field to null.
>
> 3. Ignored schemaList from the constructor and instead just relied on
> an internal set to avoid recursion. (I think this map is only needed
> for WSDL2Java?)
> 4. Fixed WSDLQueryHandler to output full WSDL due to missing schema
> node (I loaded it from the file system instead of serialising the
> Definition object)
>
> I guess my biggest qualm in all this is that it was extremely
> difficult to subclass and spring SchemaUtil to make the required
> changes. In particular I had to reproduce the following invocation
> class chain to fix the problem.
>
> JaxWsServiceFactoryBean -> buildServiceFromWSDL() ->
> WSDLServiceFactory -> create() -> WSDLServiceBuilder -> getSchemas()
> -> SchemaUtil
>
> Because SchemaUtil isn't a sprung object, nor any of the other
> classes, and because most of the methods/fields are private I ended up
> literally copy+pasting each class.
>
> Forgive me if this all sounds like criticism, because I am very
> impressed and happy with CXF. This is just as much a documenting of my
> findings as anything else.
>
> Anyway. I'm not too worried about what happens now but I am curious
> what you guys think of all this.
>
> Cheers,
>
> Charles O'Farrell



-- 
J. Daniel Kulp
Principal Engineer, IONA
[EMAIL PROTECTED]
http://www.dankulp.com/blog
