Hi all,

Warning: this is a long email, and parts of it get quite involved in the 
implementation details of the GeoServer WFS. But nonetheless, here I go.

Recently I have been putting effort into improving WFS performance. 
Unfortunately, the changes I have been making are quite substantial and 
won't be suitable for a stable branch. If you are curious about checking 
it out, I have a branch called "appschema_cache" in my GeoServer git 
repository:

http://github.com/jdeolive/geoserver/

As the branch name hints, the work started out as a way to simply cache 
application schemas as they are built. My original plan was to use a 
cache to work around the slow memory leak in DescribeFeatureType 
requests.

http://jira.codehaus.org/browse/GEOS-3534

To back up for a moment: when I refer to "application schemas" I do not 
mean appschema/complex features. What I mean is the schema that is built 
from the GeoServer feature types in the catalog. Such schemas are built 
when:

* responding to a DescribeFeatureType request
* encoding GML3 output (since it uses the encoder, which is "schema 
assisted")

The approach we currently take to building the schemas is the following: 
we take the core WFS schema and then modify it, adding new types and 
element declarations for all the feature types in the GeoServer catalog. 
As you can imagine, this is inefficient for a number of reasons:

a) It modifies the WFS schema (which is big) every time

b) It scans the entire catalog every time (which is expensive and 
hinders security)

c) It is seriously not thread safe

(c) requires a bit more explanation. The hacking of the WFS schema to add 
types and elements goes on in the WFSConfiguration class, which by design 
is not meant to be thread safe. However, to avoid rebuilding the schema 
for every single request, the configuration object is cached as a 
singleton. Yet that shared singleton still modifies its internal state 
on every request, in a non thread safe way.

So, that said, how do we go about fixing this? Well, the approach is to 
leave the WFS schema alone and simply import it instead of modifying it. 
Doing this has the benefit of allowing the WFS schema to be built once 
and cached for its lifetime.

And if you think about it, this makes sense when talking about 
application schemas. When developing an application schema you do so for 
your target namespace and import WFS. You don't copy WFS and modify it 
to add your types.

That is the first part. The second part is to only build a schema object 
for the feature types that are being requested. This alleviates the 
problem of having to scan the entire catalog when responding to a request.
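A rough sketch of this second part, again with hypothetical names: the 
catalog is stood in for by a plain Map from qualified feature type name 
to namespace URI, and only the requested names are looked up. Grouping 
the result by namespace is an assumption of this sketch, based on the 
fact that an XML schema has a single target namespace, so a request 
spanning several namespaces would need one schema per namespace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RequestedTypes {
    private RequestedTypes() {}

    // catalog: hypothetical index of qualified feature type name -> namespace URI.
    // Returns the requested types grouped by namespace (one group per schema to build).
    static Map<String, List<String>> groupByNamespace(Map<String, String> catalog, List<String> requested) {
        Map<String, List<String>> byNamespace = new LinkedHashMap<>();
        for (String name : requested) {
            // One direct lookup per requested type; the rest of the catalog is never visited.
            String ns = catalog.get(name);
            if (ns == null) {
                throw new IllegalArgumentException("Unknown feature type: " + name);
            }
            byNamespace.computeIfAbsent(ns, k -> new ArrayList<>()).add(name);
        }
        return byNamespace;
    }
}
```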

The third part is to move the building of the schema (from GeoServer 
feature types) out of a Configuration class and into an XSD class. This 
fixes the concurrency problem, since the XSD gets built once, while a new 
Configuration is instantiated each time one is needed, which is the way 
the parser and encoder are designed to work.
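A sketch of this third part, with WfsXsd and RequestConfiguration as 
hypothetical stand-ins for an XSD class and a Configuration class (they 
are not the real GeoServer classes). The schema is built lazily and 
exactly once using the initialization-on-demand holder idiom, which 
relies on the JVM guaranteeing that class initialization runs once and 
is safely published to all threads. Per-request state lives in a fresh 
configuration object that is never shared between threads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical XSD stand-in: the schema is built once, lazily, and never mutated.
final class WfsXsd {
    private WfsXsd() {}

    // Initialization-on-demand holder: Holder is initialized (once, thread safely)
    // the first time schema() is called.
    private static final class Holder {
        static final List<String> SCHEMA =
                List.of("FeatureCollection", "GetFeature", "DescribeFeatureType");
    }

    static List<String> schema() {
        return Holder.SCHEMA;
    }
}

// Hypothetical Configuration stand-in: cheap to create, holds per-request state,
// and is created per parse/encode instead of being cached as a singleton.
final class RequestConfiguration {
    private final List<String> schema = WfsXsd.schema();   // shared, immutable
    private final List<String> featureTypes = new ArrayList<>(); // private to this request

    void addFeatureType(String name) {
        featureTypes.add(name);
    }

    List<String> featureTypes() {
        return List.copyOf(featureTypes);
    }

    List<String> schema() {
        return schema;
    }
}
```

Mutating one configuration can no longer affect another request, because 
the only shared object is immutable.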

So, all that said, I have implemented the above improvements and have 
indeed seen gains, both in speed and in the elimination of the slow 
memory leak. I have just started doing official benchmarking, so I will 
have some comparative numbers for current trunk vs. the appschema_cache 
branch soon.

Also of note is that this work lays all the groundwork to finally bring 
home the optimized GML encoder (which I experimented with about a year 
ago) cleanly. This means GML3 output and GML2 schema assisted output for 
simple features will perform close to (within 5% of) the old 
transformer based GML2 encoder.

So all that is great, right? Well, there is a problem, and it comes up 
with the appschema (as in complex features) extension. As far as I can 
tell, the feature chaining functionality implemented in appschema relies 
on the fact that every time a schema is built for encoding, it includes 
every feature type in the GeoServer catalog. I am looking into ways to 
fix this but will have to recruit the help of Ben and the experts. And 
since this email is long enough, I will do so in a different thread :)

Thanks for reading if you did in fact get this far :)

-Justin

-- 
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

_______________________________________________
Geoserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geoserver-devel
