Hi all,

A while back I implemented some hacks on the xsd/xml encoder to improve GML encoding performance, and I finally got around to benchmarking. Here are the results; what I actually did is described afterward.
Test 1: 100,000 multi-polygons
------------------------------

The polygons are fairly big with lots of points: basically the topp:states layer duplicated ~2000 times.

The first step was the baseline, using FeatureTransformer:

* GML2 Transformer: 540 M, 4.4 M/s, 124 s

- The first number is the total amount of data encoded.
- The second is the average encoding rate.
- The third is the total encoding time.

The next step was using the encoder as is, no optimizations:

* GML2 Normal: 528 M, 2.4 M/s, 255 s

Hmmm... twice as slow. And finally, with the optimizations:

* GML2 Optimized: 528 M, 4.3 M/s, 126 s

Much better. Still a bit slower, but not by much. The last test I did was GML3 with the optimizations, with similar results:

* GML3 Optimized: 518 M, 4.2 M/s, 126 s

Test 2: 500,000 line strings
----------------------------

The second test was encoding 500,000 line strings from TIGER, so not many coordinates: just two-point line strings. And the numbers:

* GML2 Transformer: 466 M, 8.5 M/s, 56 s
* GML2 Normal: 365 M, 1.1 M/s, 345 s
* GML2 Optimized: 391 M, 6.2 M/s, 64 s
* GML3 Optimized: 379 M, 5.4 M/s, 72 s

Yikes, the non-optimized encoder is almost 7 times as slow. The optimized encoder is still slower, but again not by much.

So, all in all, good results with the optimizations; the two encoders are now comparable for GML. I also ran the optimizations through the WFS CITE tests to ensure that with the optimizations the GML being produced is still "correct".

What I did
----------

* A custom FeatureEncoderDelegate for feature collections

A while back I came up with an interface, EncoderDelegate. The original purpose of this interface was to allow other XML encoders to be embedded in the encoder. When the main encoding routine encounters one of these objects, it fully delegates all encoding to it rather than continuing with the stack-based, schema-assisted encoding. So my idea for the optimization was to make one of these implementations for FeatureCollections.
This totally removes the walking up and down the encoding stack that the encoder does for each feature. The problem is that that walking up and down the stack is what looks up the bindings based on type, uses the correct binding to encode attributes, etc. So what I did was basically simulate this inside the encoder delegate: it grabs the feature type, figures out which bindings would be used to encode each attribute, and rolls them into a list. Then for each feature it looks up the binding directly and encodes.

* A custom EncoderDelegate for geometries

The above gave quite a speed-up, but not exactly what I was hoping for; initial benchmarks still came back about twice as slow. A bit of profiling pointed to the geometry encoding bindings. The above strategy of rolling the bindings into a list only works for simple content; geometries still went through the main encoding routine. So the next step was to break out EncoderDelegates for geometries as well, and have them used directly. And it helped: after this the numbers were closer, with the optimized encoder coming back just a bit slower.

* Respecting the number of decimals

Analyzing the above results, I noticed that the optimized xsd encoder was delivering substantially more data than the transformer, which puzzled me since, based on my optimizations, it should actually have been producing less. After comparing data from both, the answer was clear: the number of decimals being encoded. GML from the xsd encoder was not respecting a limited number of decimals at all, which resulted in quite a bit more data being encoded than necessary. Cutting off the decimals brought the amount of data coming back down, along with the total time, giving final results that are quite close in the polygon case (lots of coordinates).
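To illustrate the decimal point: a minimal sketch, using only the JDK (this is not the actual GeoTools code, just the idea), of formatting ordinates with a capped number of fraction digits instead of letting Double.toString emit every digit:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Sketch of the decimal-truncation idea: cap the number of fraction
// digits per ordinate rather than emitting Double.toString output.
public class CoordFormatter {

    private final DecimalFormat coordFormat;

    public CoordFormatter(int numDecimals) {
        // Force '.' as the decimal separator regardless of locale,
        // as GML coordinate strings require.
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT);
        coordFormat = new DecimalFormat("0", symbols);
        coordFormat.setMaximumFractionDigits(numDecimals);
    }

    public String format(double ordinate) {
        return coordFormat.format(ordinate);
    }

    public static void main(String[] args) {
        CoordFormatter f = new CoordFormatter(4);
        // Double.toString would emit the full 17-ish significant digits;
        // capping at 4 decimals cuts the payload substantially.
        System.out.println(f.format(-122.41941550000001)); // -122.4194
    }
}
```

With many coordinates per polygon, trimming each ordinate from ~17 characters to a handful is where most of the data reduction above comes from.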
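The "roll the bindings into a list" step described above can be sketched in plain Java. All names here are hypothetical stand-ins, not the GeoTools API; the point is resolving how each attribute gets encoded once per feature type, then reusing that list for every feature instead of walking the schema stack per feature:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy illustration (hypothetical names, not GeoTools classes) of
// pre-resolving one encoder per attribute and reusing the list.
public class PrecomputedEncoders {

    // Stand-in for an encoding "binding": turns an attribute value into text.
    interface Binding extends Function<Object, String> {}

    // Resolve a binding per attribute type once, up front.
    static List<Binding> resolveBindings(List<Class<?>> attributeTypes) {
        List<Binding> bindings = new ArrayList<>();
        for (Class<?> type : attributeTypes) {
            if (Number.class.isAssignableFrom(type)) {
                bindings.add(v -> String.valueOf(v));          // numbers as-is
            } else {
                bindings.add(v -> escape(String.valueOf(v)));  // text gets XML-escaped
            }
        }
        return bindings;
    }

    // Encode one feature (its attribute values) via direct index lookup,
    // with no per-feature type dispatch.
    static String encode(List<Binding> bindings, List<Object> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append(bindings.get(i).apply(values.get(i)));
        }
        return sb.toString();
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }
}
```

The real delegate does considerably more per attribute, but the shape is the same: the type-to-binding lookup moves out of the per-feature loop.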
Things to note
--------------

* This only works for simple feature data (sorry Ben).
* These speeds are only for GML, not for general encoding.
* The optimizations are engaged by explicitly setting a property, so if you don't ask for them you won't get them.

I have a bit of cleanup to do on the patches, but I plan to commit soon.

-Justin

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

_______________________________________________
Geotools-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel
