> This is an interesting topic and I didn't want to bring it up until I
> finally took the time to write down my robustness plan for geoserver,
> but yeah, I found the parser raising a parsing thread on each call,
> which is a dangerous thing. It is in general a good approach for simple
> applications but it does not scale; you'll end up degrading performance
> instead of gaining as the thread count increases (think of multiple
> geoserver requests or multiple udig layers). But the threading stuff
> should be a performance/scalability concern, quite unrelated to our
> current memory problems.
> Yet, nothing can be said until a profiler speaks.

Well said, but I admit streaming in this manner does not work so well.

>> 100 features taking up 606 MB seems out of hand though. Are you using
>> the xpath streaming method?

> No, the one that receives the QName of the element to parse, being the
> feature name.

>> That method is grossly inefficient, but I thought the class-based and
>> name-based streaming methods were better. Perhaps not, though.
>>
>> Anyways, I will check out the code you submitted. What do you think
>> about tying your simple parser to the binding stuff?

> Sounds like a good idea, but first we'd need a profiling session to
> know what's actually happening. It may not be worth the effort.

Agreed.

> But it seems like a good time to bring up something I've been thinking
> about for a while; I guess I already told you something about it.
> Let's see what you think:
> Ideally, I'd like to decouple the xml-xsd parsing and encoding from
> the underlying XML API, i.e. to create our own indirection layer in
> order to plug in a specific XML parsing/encoding technology. But this
> indirection layer should be closer to our problem domain than the
> low-level XML I/O. What I have in mind, besides being able to plug in
> different pull or SAX parser implementations, is to enable other sorts
> of encodings too, say, binary XML.
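As a side note on pluggable pull parsers: with the JDK's StAX API, a pull-based scan that keeps only the current element in memory looks roughly like the sketch below. This is just an illustrative, self-contained example — the element and attribute names are made up, and it is not the actual XmlSimpleFeatureParser:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullParseSketch {

    // Pulls feature members one at a time; only the current element is
    // ever held in memory, which is the whole point of the pull style.
    public static List<String> parseFeatureIds(String xml) throws Exception {
        List<String> ids = new ArrayList<String>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "featureMember".equals(r.getLocalName())) {
                ids.add(r.getAttributeValue(null, "fid"));
            }
        }
        r.close();
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fc><featureMember fid=\"f1\"/>"
                + "<featureMember fid=\"f2\"/></fc>";
        System.out.println(parseFeatureIds(xml)); // prints [f1, f2]
    }
}
```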
> For this it'd be nice if that indirection layer were conceived more as
> a serializing/scanning layer than around the textual nature of XML, so
> as to allow reading double[] arrays directly from a gml:coordList into
> a packed coordinate sequence and the like.
> That's a rough idea, but it'd certainly be nice to avoid so many
> levels of "parsing"; right now a coord sequence parse follows these
> steps:
> xml text -> double[] -> DirectPosition[] -> Coordinate[] (or something
> like that)
>
> I know it may be something for a long-term improvement plan; still,
> it'd be nice to know whether you think it's reasonable and whether you
> have some idea of how difficult it could be to decouple xml-xsd from
> (xerces?).

I think this sounds like a pretty good idea. As you said, it is no
simple task, but I think it would be useful. And indeed the encoding
part of the API is something I am not in love with... so it could use
some work regardless. We should set up a time to chat about this on
IRC and throw some ideas around. I should also probably brush up on
the bxml stuff.

> Cheers,
>
> Gabriel.
>
>> Gabriel Roldán wrote:
>>> Hi Justin,
>>>
>>> I've been working on getting the nsdi test server data rendering in
>>> udig by using the StreamingParser in the WFS DataStore, and got it
>>> working.
>>>
>>> Yet, it's leading to OOM errors on the first try. I have yet to run
>>> it inside a profiler, though in the meantime I did a quick (simple)
>>> feature parser based on XML pull to get something done while we
>>> tackle the StreamingParser problems.
>>>
>>> Also, to assess this problem I've created a couple of plain Java
>>> applications in the tests folder so they're easy to run, as I
>>> imagine you don't want to bother with setting up a udig trunk
>>> development environment.
>>>
>>> So, in the gt-wfs module, there are two test classes,
>>> XmlSimpleFeatureParserTest and StreamingParserFeatureReaderTest,
>>> that can be run as normal Java applications through their main
>>> methods.
>>> They exercise the parsing of the same GetFeature response from the
>>> nsdi server, and are tests for the two strategy objects used for
>>> that purpose.
>>>
>>> A sample output of running them as Java apps (not unit tests) is
>>> below, which shows more or less the memory problem. Still, in the
>>> long run I'll need, or rather would prefer, the StreamingParser to
>>> function properly, so we don't have yet another parsing technology
>>> around. If you have any clue about what could be happening, that'd
>>> be cool.
>>>
>>> Sample output from the bogo-benchmark apps:
>>>
>>> XmlPull feature parser:
>>> Fetched 100 features in 111654ms. (avg. 1116ms/feature) Mem. used:
>>> 7MB.
>>>
>>> StreamingParser:
>>> Fetched 100 features in 112481ms. (avg. 1124ms/feature) Mem. used:
>>> 606MB.
>>>
>>> (the request sets maxFeatures=100 or it becomes an endless wait,
>>> but the OOM happens with enough patience)
>>>
>>> Gabriel
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Geotools-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/geotools-devel
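Regarding the packed coordinate sequence idea above: the scan Gabriel describes — going straight from the coordList text to a packed double[] with no intermediate DirectPosition[]/Coordinate[] object graph — could be as simple as the following self-contained sketch (class and method names here are made up, and real gml:coordList handling would also need the cs/ts separator attributes):

```java
public class CoordListScanner {

    // Scans a whitespace-separated coordinate list straight into a
    // packed double[], skipping the usual
    // text -> DirectPosition[] -> Coordinate[] detour.
    public static double[] toPackedOrdinates(String coordList) {
        String[] tokens = coordList.trim().split("\\s+");
        double[] packed = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            packed[i] = Double.parseDouble(tokens[i]);
        }
        return packed;
    }

    public static void main(String[] args) {
        double[] seq = toPackedOrdinates("10.0 20.0 30.0 40.0");
        // prints [10.0, 20.0, 30.0, 40.0]
        System.out.println(java.util.Arrays.toString(seq));
    }
}
```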
--
Justin Deoliveira
The Open Planning Project
[EMAIL PROTECTED]
