Re: [Geotools-devel] StreamingParser outofmemory errors

Gabriel Roldán Wed, 13 Feb 2008 12:10:40 -0800

Hi Justin,

thanks for the prompt reply.


On Wednesday 13 February 2008 07:59:06 pm Justin Deoliveira wrote:
> Wow, those results are compelling. Indeed I thought before of basing the
> streaming parser on a pull parser... seemed more natural than the hack
> of creating a separate parsing thread.
This is an interesting topic, and I didn't want to bring it up until I had 
taken the time to write down my robustness plan for geoserver. But yes, I 
found the parser spawning a parsing thread on each call, which is a dangerous 
thing. It is generally a fine approach for simple applications, but it does 
not scale: as the thread count increases you end up degrading performance 
instead of gaining it (think of multiple geoserver requests or multiple udig 
layers). Still, the threading is a performance/scalability concern, quite 
unrelated to our current memory problems.
Yet, nothing can be said until a profiler speaks.
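
To make the pull idea concrete, here is a minimal sketch of scanning for 
elements by QName on the calling thread, with no extra parsing thread. It 
uses the JDK's StAX API (javax.xml.stream) only because it is self-contained; 
that choice is an assumption, it is not the xml pull library behind my simple 
parser, and the class name PullFeatureCounter is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical example class, not part of GeoTools. */
public class PullFeatureCounter {

    /**
     * Counts the elements named featureName by pulling events one at a time.
     * The caller's thread drives the parser, so no helper thread is needed
     * and only the current event is held in memory.
     */
    public static int countElements(String xml, QName featureName) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                int count = 0;
                while (reader.hasNext()) {
                    // next() advances to the following event and returns its type
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && featureName.equals(reader.getName())) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked only to keep the sketch short
            throw new IllegalStateException("Malformed XML", e);
        }
    }
}
```

A real feature reader would build a feature where this sketch increments a 
counter, and hand it to the caller before pulling the next event.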

>
> A 100 features taking up 606 MB seems out of hand though. Are you using
> the xpath streaming method? 
No, the one that receives the QName of the element to parse, being the feature 
name.
> That method is grossly inefficient, but I 
> thought the class based and name based streaming methods were better.
> Perhaps not though.
>
> Anyways, I will check out the code you submitted. What do you think
> about tying your simple parser to the binding stuff?
Sounds like a good idea, but first we'd need a profiling session to know 
what's actually happening. It may not be worth the effort.
But this seems like a good time to bring up something I've been thinking 
about for a while; I guess I already told you something about it. Let's see 
what you think:
Ideally, I'd like to decouple the xml-xsd parsing and encoding from the 
underlying xml api, that is, to create our own indirection layer into which a 
specific xml parsing/encoding technology can be plugged. But this indirection 
layer should be closer to our problem domain than to low-level xml I/O. What 
I have in mind, besides being able to plug in different pull or sax parser 
implementations, is to enable other sorts of encodings too, say, binary xml. 
For this it'd be nice if the indirection layer were designed as a 
serializing/scanning layer rather than around the textual nature of xml, so 
as to allow reading double[] arrays directly from a gml:coordList into a 
packed coordinate sequence and the like.
That's a rough idea, but it'd certainly be nice to avoid so many levels 
of "parsing". Right now, parsing a coord sequence follows these steps:
xml text -> double[] -> DirectPosition[] -> Coordinate[] (or something like 
that)
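
As a rough sketch of what I mean (every name here is hypothetical, nothing 
like this exists in xml-xsd today), the domain-level interface only promises 
packed ordinates, and a text-backed implementation scans them straight into 
a double[] with no intermediate position objects:

```java
import java.util.Arrays;

/** Hypothetical sketch of a scanning layer; not an existing GeoTools API. */
public class PackedCoordinateScan {

    /** Domain-level view: callers get packed ordinates and never see the encoding. */
    public interface OrdinateScanner {
        double[] readOrdinates();
    }

    /** Text-backed implementation for content such as "x,y x,y ...". */
    public static final class TextOrdinateScanner implements OrdinateScanner {
        private final CharSequence text;

        public TextOrdinateScanner(CharSequence text) {
            this.text = text;
        }

        @Override
        public double[] readOrdinates() {
            double[] buffer = new double[16];
            int size = 0;
            int i = 0;
            final int n = text.length();
            while (i < n) {
                // Skip commas and whitespace between numbers
                while (i < n && isSeparator(text.charAt(i))) {
                    i++;
                }
                int start = i;
                // Advance to the end of the current number token
                while (i < n && !isSeparator(text.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    if (size == buffer.length) {
                        buffer = Arrays.copyOf(buffer, size * 2); // grow on demand
                    }
                    buffer[size++] =
                            Double.parseDouble(text.subSequence(start, i).toString());
                }
            }
            return Arrays.copyOf(buffer, size); // trim to the ordinates actually read
        }

        private static boolean isSeparator(char c) {
            return c == ',' || Character.isWhitespace(c);
        }
    }
}
```

A binary xml implementation of the same interface could copy the doubles 
from its input without going through text at all, and the caller wrapping 
the array in a packed coordinate sequence would not notice the difference.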

I know it may be something for a long-term improvement plan, yet it'd be nice 
to know whether you think that's reasonable, and whether you have some idea 
of how difficult it could be to decouple xml-xsd from (xerces?).

Cheers,

Gabriel.
>
> Gabriel Roldán wrote:
> > Hi Justin,
> >
> > I've been working on getting the nsdi test server data rendering in udig
> > by using the StreamingParser in the WFS DataStore, and got it.
> >
> > Yet, it's leading to OOM errors on the first try. I have yet to run it
> > inside a profiler, though in the meantime I did a quick (simple) feature
> > parser based on xml pull to get something done while we tackle the
> > StreamingParser problems.
> >
> > Also, to assess this problem I've created a couple of normal java
> > applications in the tests folder so they're easy to run, as I imagine
> > you don't want to bother with setting up a udig trunk development
> > environment.
> >
> > So, in the gt-wfs module, there are two test classes,
> > XmlSimpleFeatureParserTest and StreamingParserFeatureReaderTest, that can
> > be run as normal java applications through their main methods. They
> > exercise the parsing of the same GetFeature response from the nsdi
> > server, and are tests for the two strategy objects used for that purpose.
> >
> > A sample output of running them as java apps (not unit tests) is below,
> > which shows more or less the memory problem. Yet, in the long run I'll
> > need, or rather would prefer, the StreamingParser to function properly,
> > so we don't have yet another parsing technology around. If you have any
> > clue about what can be happening that'd be cool.
> >
> > Sample output from the bogobenchmark apps:
> > XmlPull Feature parser:
> > Fetched 100 features  in 111654ms. (avg. 1116ms/feature) Mem. used: 7MB.
> >
> > StreamingParser
> > Fetched 100 features  in 112481ms. (avg. 1124ms/feature) Mem. used:
> > 606MB.
> >
> > (the request is setting maxFeatures=100 or it becomes an endless wait,
> > but the OOM happens with enough patience)
> >
> > Gabriel



