Justin,

I have been able to solve the problem with the GeoServerImpl instances accumulating. I think this was a global problem, so that is good news. The bad news, however, is that it was only a minor improvement and it didn't solve the biggest memory leak, which is specific to app-schema. However, working on this made me understand what is going on there.

The biggest issue is in all the schemas that app-schema builds: custom schemas, but also GSML. Because these also import the static XSDs like GML, XS, OGC, etc., backward links from those static schemas keep them alive. App-schema keeps building new GSML and other schemas for each datastore, without the old ones ever being disposed of. The substitution groups in GML keep on growing, containing multiple references to the same elements, but from different instances of the same schema.

ReferencingDirectiveLeakPreventer and SubstitutionGroupLeakPreventer don't seem to be doing the trick for me. I didn't figure out why, because I concluded that I couldn't really use them anyway. Schemas should be removed from memory as soon as they are no longer needed, not only once someone figures out that there are duplicates (by then it really is too late). I wrote a method that properly removes all references to an existing schema from all other schemas. This seems to do the trick for the WFS schemas and stops GeoServerImpl and all the rest from being kept alive in memory.
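
To illustrate the idea of removing a schema's references from all other schemas, here is a minimal toy model. It does not use the real XSD classes: ToySchema, importSchema and dispose are hypothetical names invented for this sketch, and the actual method mentioned above may work quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a schema object; NOT the real XSD schema class. */
final class ToySchema {
    final String name;
    /** Forward links: schemas this schema imports. */
    final List<ToySchema> imports = new ArrayList<>();
    /** Backward links: schemas that import this one. These keep importers alive. */
    final List<ToySchema> referencedBy = new ArrayList<>();

    ToySchema(String name) {
        this.name = name;
    }

    /** Imports a (typically static, long-lived) schema and registers the back-link. */
    void importSchema(ToySchema shared) {
        imports.add(shared);
        shared.referencedBy.add(this);
    }

    /** Removes every back-link to this schema so it can be garbage collected. */
    void dispose() {
        for (ToySchema shared : imports) {
            shared.referencedBy.removeIf(s -> s == this);
        }
        imports.clear();
    }
}
```

In this model a static GML schema only stays free of stale back-links if every short-lived schema that imported it calls dispose() when it is no longer needed.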

The consequences for app-schema are big though. In the current setup, you specify your schemas for each datastore (for each mapping). So even in one single test, multiple instances of the GSML schema are alive. All of them add up to multiple references in the GML substitution groups for the same element. It makes no sense to link multiple instances of the GSML schema to one single GML schema in memory! Because of the backward links and substitution groups it becomes one big clutter.

My solution is to keep a registry that maps schema locations to schemas and reuses schemas that have already been built. That improves performance and memory use at the same time.
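
A minimal sketch of such a registry, assuming schemas are identified by their location string and built by some caller-supplied function. SchemaRegistry and its methods are hypothetical names for illustration, not existing app-schema classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry that keeps one built schema per schema location. */
final class SchemaRegistry<S> {
    private final Map<String, S> schemas = new HashMap<>();
    private final Function<String, S> builder;

    /** @param builder builds a schema from its location (assumed to be expensive) */
    SchemaRegistry(Function<String, S> builder) {
        this.builder = builder;
    }

    /** Returns the schema already built for this location, building it on first use. */
    synchronized S get(String location) {
        S schema = schemas.get(location);
        if (schema == null) {
            schema = builder.apply(location);
            schemas.put(location, schema);
        }
        return schema;
    }

    /** Forgets a schema, e.g. when the last datastore using it is disposed. */
    synchronized S remove(String location) {
        return schemas.remove(location);
    }

    synchronized int size() {
        return schemas.size();
    }
}
```

With this shape, two datastores that name the same schema location share one schema object, so the shared GML schema receives one set of back-links rather than one per datastore.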

In that case, the only way the XSD schemas in memory could still get cluttered is if, for some reason, someone uses the same schema from different file locations, or different versions of the same schema, in different datastores of the same app-schema instance. I don't see a way to avoid that.

Regards
Niels

On 22/03/11 22:33, Justin Deoliveira wrote:
Yup, I have put much time into figuring out and squashing these issues. Comments inline.

On Tue, Mar 22, 2011 at 4:02 AM, Niels <[email protected]> wrote:

    I have been trying to figure out a memory leak in app-schema (and
    I think, perhaps even in geoserver in general).

    The problem is that the app-schema unit tests run out of memory
    when they are run in a batch by maven, never when they are run by
    themselves. While maven is running the tests, data is accumulated
    on the heap and at some point it will run out and crash. This is a
    serious bug.

    I have figured out quite a bit about it, using Java Memory Analyzer.
    The data that is accumulating is mainly XSD schema information
    (XSDElementDeclarationImpl etc...).
    But no features or types are kept alive.

    The other thing I figured out is that the GeoServerImpl, Catalog,
    ResourcePool, etc. objects are not disposed of. For every test
    that has been run, these objects are kept on the heap, although
    the abstract test class does get rid of them. My first assumption
    was that these objects were somehow also keeping XSD information
    in memory, but I was wrong: it is the other way around.

    I have included a screenshot in the attachment that shows the
    path that keeps the GeoServerImpl alive, through the XSD classes.
    I have also added a second screenshot of another GeoServerImpl
    instance's path (in the same memory dump). If you look at the
    addresses you can also see where these paths are the same and
    where they split up.
    XSD schemas can be imported into each other, and that is what is
    going on here.

    Here is a summary of my findings:
    1. Almost all XSD classes are singletons, therefore static and
    kept alive.

Done intentionally. They have to be cached since they are so expensive to create.

    2. org.geoserver.wfs.xml.v1_1_0.WFS is an exception to this rule: it
    is instantiated and contains, indirectly, a link to the running
    GeoServerImpl.

Yes, this is an issue, one I very much want to kill. I did some work to fix this but app-schema soon became a blocker. It is kind of a separate issue, but long story short: when we build up a schema object we need to iterate over every type, since app-schema types have dependencies among them. If we can figure out how to process those dependencies rather than just build a schema from all types, we get the side effect of getting rid of the WFSConfiguration singleton... which is now a source of much pain.

    3. For every instance of GeoServerImpl, a new
    org.geoserver.wfs.xml.v1_1_0.WFS is created and *imported* in all
    the other (static) XSD classes (OGC, GML, etc).

It should just be GML, but yeah, part of the work is to fix this as well: reversing the import dependency so as not to modify the GML schema.

    4. These imports accumulate - For each test a new import is added.

Take a look at GML.buildSchema and you should see two adapters which attempt to manage this: ReferencingDirectiveLeakPreventer, which removes duplicate import statements so they do not accumulate, and SubstitutionGroupLeakPreventer, which prevents the same problem for the gml _Feature substitution group.


    But what I cannot find is where and how this
    org.geoserver.wfs.xml.v1_1_0.WFS is imported into the other XSD
    classes. Is there anyone who can point me in the right direction?

    Also, even if I do find the place where this happens - the
    question remains what to do.
    1. Just try to "undo" the import when closing down geoserver? In
    that case I definitely need to figure out how the import is
    happening in the first place.
    2. Just clear out all of the static XSD classes, and rebuild them
    each time.


Definitely (1), or be prepared to wait hours for your tests to finish :) If we can figure out the app-schema issue this will get a lot better. Again, the issue is that when we build an XSDSchema object for a complex feature type, we need some way to traverse the feature type dependency graph and pull any dependencies into the XSDSchema object, rather than just build a schema with all types. Also note that this issue severely limits the number of layers GeoServer can contain.

    Regards

-- *Niels Charlier*

    Software Engineer
    CSIRO Earth Science and Resource Engineering
    Phone: +61 8 6436 8914

    Australian Resources Research Centre
    26 Dick Perry Avenue, Kensington WA 6151

    
    _______________________________________________
    Geoserver-devel mailing list
    [email protected]
    <mailto:[email protected]>
    https://lists.sourceforge.net/lists/listinfo/geoserver-devel




--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.



_______________________________________________
Geotools-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel
