Hi all,

A while back I implemented some hacks on the xsd/xml encoder to
improve GML encoding performance. I finally got around to benchmarking.
The results are below; what I actually did is described afterward.

Test 1: 100,000 multi polygons
------------------------------

The polygons are fairly big with lots of points. Basically the 
topp:states layer duplicated ~ 2000 times.

First step was the baseline, using FeatureTransformer:

* GML2 Transformer: 540 M, 4.4 M/s, 124 s

- First number: the total amount of data encoded.
- Second number: the average encoding rate.
- Third number: the total encoding time.

Next step was using the encoder as is, no optimizations:

* GML2 Normal: 528 M, 2.4 M/s, 255 s

Hmmm... twice as slow.

And finally with the optimizations:

* GML2 Optimized: 528 M, 4.3 M/s, 126 s

Much better. Still a bit slower, but not by much.

The last test I did was GML 3 with the optimizations, which gave similar
results:

* GML3 Optimized: 518 M, 4.2 M/s, 126 s

Test 2: 500,000 line strings
----------------------------

The second test was encoding 500,000 line strings from tiger, so not 
many coordinates, just two-point line strings. And the numbers:

* GML2 Transformer: 466 M, 8.5 M/s, 56 s
* GML2 Normal: 365 M, 1.1 M/s, 345 s
* GML2 Optimized: 391 M, 6.2 M/s, 64 s
* GML3 Optimized: 379 M, 5.4 M/s, 72 s

Yikes, the non-optimized encoder is almost 7 times as slow. The 
optimized encoder is still slower, but again not by much.

So, all in all, good results with the optimizations. The two encoders are 
now comparable for GML. I also ran the optimizations through the wfs 
cite tests to ensure the GML being produced is still "correct".

What I did
----------

* A custom FeatureEncoderDelegate for feature collections

A while back I came up with an interface, EncoderDelegate. The original 
purpose of this interface was to allow other XML encoders to be embedded 
in the encoder. When the main encoding routine encounters one of these 
objects, it fully delegates all encoding to it, rather than continuing 
on with the stack based, schema assisted encoding.
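
For reference, the interface is tiny; from memory it amounts to
something like:

  import org.xml.sax.ContentHandler;

  /**
   * An object that takes over all SAX output for itself when the main
   * encoder encounters it, bypassing the schema assisted machinery.
   */
  public interface EncoderDelegate {

      void encode(ContentHandler output) throws Exception;
  }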

So my idea for optimization was to make one of these implementations for 
FeatureCollections. This would totally remove the walking up and down 
the encoding stack that the encoder does for each feature that is encoded.

The problem is that that walking up and down the stack is what looks up 
the bindings based on type, uses the correct binding to encode 
attributes, etc... So what I did was basically simulate this inside the 
encoder delegate. It grabs the feature type, figures out what bindings 
would be used to encode each attribute, and rolls them into a list. 
Then for each feature it grabs the binding directly from the list and 
encodes.
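
To make that concrete, here is a rough sketch of the idea. The Binding
stand-in and the class name here are hypothetical, not the actual patch:

  import java.util.ArrayList;
  import java.util.List;

  import org.opengis.feature.simple.SimpleFeature;
  import org.opengis.feature.simple.SimpleFeatureType;
  import org.opengis.feature.type.AttributeDescriptor;
  import org.xml.sax.ContentHandler;

  public class FeatureCollectionEncoderDelegate implements EncoderDelegate {

      private final Iterable<SimpleFeature> features;
      private final SimpleFeatureType featureType;

      public FeatureCollectionEncoderDelegate(Iterable<SimpleFeature> features,
              SimpleFeatureType featureType) {
          this.features = features;
          this.featureType = featureType;
      }

      public void encode(ContentHandler output) throws Exception {
          // figure out the binding for each attribute once, up front
          List<AttributeDescriptor> atts = featureType.getAttributeDescriptors();
          List<Binding> bindings = new ArrayList<Binding>();
          for (AttributeDescriptor att : atts) {
              bindings.add(lookupBinding(att));
          }

          // then reuse the list for every feature, no stack walking
          for (SimpleFeature feature : features) {
              for (int i = 0; i < atts.size(); i++) {
                  Object value = feature.getAttribute(atts.get(i).getLocalName());
                  bindings.get(i).encode(value, output);
              }
          }
      }

      // stand-in for the real binding machinery
      interface Binding {
          void encode(Object value, ContentHandler output) throws Exception;
      }

      Binding lookupBinding(AttributeDescriptor att) {
          // the real lookup is by attribute type against the binding
          // registry; this stub just writes the value as text
          return new Binding() {
              public void encode(Object value, ContentHandler output)
                      throws Exception {
                  String s = String.valueOf(value);
                  output.characters(s.toCharArray(), 0, s.length());
              }
          };
      }
  }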

* A custom EncoderDelegate for geometries

The above gave quite a speed up, but not exactly what I was hoping for. 
Initial benchmarks still came back about twice as slow. A bit of 
profiling pointed to the geometry encoding bindings. The above strategy 
of rolling the bindings into a list only works for simple content; 
geometries still go through the main encoding routine.

So the next step was to break out EncoderDelegates for geometries as 
well, and have them used directly. And it helped. After this the numbers 
were closer, with the optimized encoder coming back just a bit slower.
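
Again, roughly what such a delegate looks like, this one for GML2 line
strings (the element names are real GML2, the class itself is just a
sketch):

  import com.vividsolutions.jts.geom.Coordinate;
  import com.vividsolutions.jts.geom.LineString;
  import org.xml.sax.ContentHandler;
  import org.xml.sax.helpers.AttributesImpl;

  public class LineStringEncoderDelegate implements EncoderDelegate {

      static final String GML = "http://www.opengis.net/gml";

      private final LineString line;

      public LineStringEncoderDelegate(LineString line) {
          this.line = line;
      }

      public void encode(ContentHandler output) throws Exception {
          AttributesImpl atts = new AttributesImpl();
          output.startElement(GML, "LineString", "gml:LineString", atts);
          output.startElement(GML, "coordinates", "gml:coordinates", atts);

          // write the coordinates straight out, no binding lookups
          StringBuilder sb = new StringBuilder();
          for (Coordinate c : line.getCoordinates()) {
              sb.append(c.x).append(',').append(c.y).append(' ');
          }
          if (sb.length() > 0) {
              sb.setLength(sb.length() - 1); // drop trailing space
          }
          output.characters(sb.toString().toCharArray(), 0, sb.length());

          output.endElement(GML, "coordinates", "gml:coordinates");
          output.endElement(GML, "LineString", "gml:LineString");
      }
  }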

* Respecting number of decimals

Analyzing the above results I noticed that the optimized xsd encoder was 
delivering substantially more data than the transformer, which puzzled 
me, since based on my optimizations it should actually be producing 
less. After analyzing data from both, the answer was clear: the number 
of decimals being encoded.

GML from the xsd encoder was not respecting a limited number of decimals 
at all, which resulted in quite a bit more data being encoded than 
necessary.

Cutting off the decimals brought the amount of data coming back down 
considerably, and increased the total time a bit, giving final results 
that are quite close in the polygon case (lots of coordinates).
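
The fix amounts to formatting coordinates with a bounded number of
decimal places rather than Double.toString(). Something along these
lines (illustrative, not the actual patch code):

  import java.text.DecimalFormat;
  import java.text.DecimalFormatSymbols;
  import java.util.Locale;

  public class CoordFormat {
      public static void main(String[] args) {
          // at most 6 decimal places, Locale.US so the separator is '.'
          DecimalFormat fmt = new DecimalFormat("0.######",
                  DecimalFormatSymbols.getInstance(Locale.US));

          System.out.println(fmt.format(-73.98765432109)); // -73.987654
          System.out.println(fmt.format(40.5));            // 40.5
      }
  }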

Things to note
--------------

* This only works for simple feature data (sorry ben)
* These speeds are only for GML, not for general encoding
* The optimizations are engaged by explicitly setting a property, so if 
you don't ask for them you won't get them (rough sketch below)
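
Something like this to opt in (the QName here is a placeholder, the real
property name is in the patch):

  import javax.xml.namespace.QName;
  import org.geotools.gml2.GMLConfiguration;
  import org.geotools.xml.Encoder;

  // placeholder property; check the patch for the actual name
  QName OPTIMIZED_ENCODING =
          new QName("http://geotools.org", "optimizedEncoding");

  GMLConfiguration gml = new GMLConfiguration();
  gml.getProperties().add(OPTIMIZED_ENCODING); // opt in explicitly
  Encoder encoder = new Encoder(gml);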

I have a bit of clean up to do with the patches but I plan to commit soon.

-Justin

-- 
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

