> This is an interesting topic and I didn't want to bring it up until I
> finally took the time to write down my robustness plan for geoserver,
> but yeah, I found the parser raising a parsing thread on each call,
> which is a dangerous thing. It is in general a good approach for simple
> applications but it does not scale; you'll end up degrading performance
> instead of gaining as the thread count increases (think of multiple
> geoserver requests or multiple udig layers). But the threading stuff
> should be a performance/scalability concern, quite unrelated to our
> current memory problems.
> Yet, nothing can be said until a profiler speaks.

Well said, but I admit streaming in this manner does not work so well.

>> 100 features taking up 606 MB seems out of hand though. Are you using
>> the xpath streaming method?

> No, the one that receives the QName of the element to parse, being the
> feature name.

>> That method is grossly inefficient, but I thought the class-based and
>> name-based streaming methods were better. Perhaps not, though.
>>
>> Anyways, I will check out the code you submitted. What do you think
>> about tying your simple parser to the binding stuff?

> Sounds like a good idea, but first we'd need a profiling session to
> know what's actually happening. It may not be worth the effort.

Agreed.

> But it seems like a good time to bring up something I've been thinking
> about for a while; I guess I already told you something about it.
> Let's see what you think:
> Ideally, I'd like to decouple the xml-xsd parsing and encoding from
> the underlying XML API, i.e. to create our own indirection layer in
> order to plug in a specific XML parsing/encoding technology. But this
> indirection layer should be closer to our problem domain than the
> low-level XML I/O. What I have in mind, besides being able to plug in
> different pull or SAX parser implementations, is to enable other sorts
> of encodings too, say, binary XML.
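As a side note on pluggable pull parsers: with the JDK's StAX API, a pull-based scan that keeps only the current element in memory looks roughly like the sketch below. This is just an illustrative, self-contained example — the element and attribute names are made up, and it is not the actual XmlSimpleFeatureParser:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullParseSketch {

    // Pulls feature members one at a time; only the current element is
    // ever held in memory, which is the whole point of the pull style.
    public static List<String> parseFeatureIds(String xml) throws Exception {
        List<String> ids = new ArrayList<String>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "featureMember".equals(r.getLocalName())) {
                ids.add(r.getAttributeValue(null, "fid"));
            }
        }
        r.close();
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fc><featureMember fid=\"f1\"/>"
                + "<featureMember fid=\"f2\"/></fc>";
        System.out.println(parseFeatureIds(xml)); // prints [f1, f2]
    }
}
```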
> For this it'd be nice if that indirection layer were conceived more as
> a serializing/scanning layer than around the textual nature of XML, so
> as to allow reading double[] arrays directly from a gml:coordList into
> a packed coordinate sequence and the like.
> That's a rough idea, but it'd certainly be nice to avoid so many
> levels of "parsing"; right now a coord sequence parse follows these
> steps:
> xml text -> double[] -> DirectPosition[] -> Coordinate[] (or something
> like that)
>
> I know it may be something for a long-term improvement plan; still,
> it'd be nice to know whether you think it's reasonable and whether you
> have some idea of how difficult it could be to decouple xml-xsd from
> (xerces?).

I think this sounds like a pretty good idea. As you said, it is no
simple task, but I think it would be useful. And indeed the encoding
part of the API is something I am not in love with... so it could use
some work regardless. We should set up a time to chat about this on
IRC and throw some ideas around. I should also probably brush up on
the bxml stuff.

> Cheers,
>
> Gabriel.
>
>> Gabriel Roldán wrote:
>>> Hi Justin,
>>>
>>> I've been working on getting the nsdi test server data rendering in
>>> udig by using the StreamingParser in the WFS DataStore, and got it
>>> working.
>>>
>>> Yet, it's leading to OOM errors on the first try. I have yet to run
>>> it inside a profiler, though in the meantime I did a quick (simple)
>>> feature parser based on XML pull to get something done while we
>>> tackle the StreamingParser problems.
>>>
>>> Also, to assess this problem I've created a couple of plain Java
>>> applications in the tests folder so they're easy to run, as I
>>> imagine you don't want to bother with setting up a udig trunk
>>> development environment.
>>>
>>> So, in the gt-wfs module, there are two test classes,
>>> XmlSimpleFeatureParserTest and StreamingParserFeatureReaderTest,
>>> that can be run as normal Java applications through their main
>>> methods.
>>> They exercise the parsing of the same GetFeature response from the
>>> nsdi server, and are tests for the two strategy objects used for
>>> that purpose.
>>>
>>> A sample output of running them as Java apps (not unit tests) is
>>> below, which shows more or less the memory problem. Still, in the
>>> long run I'll need, or rather would prefer, the StreamingParser to
>>> function properly, so we don't have yet another parsing technology
>>> around. If you have any clue about what could be happening, that'd
>>> be cool.
>>>
>>> Sample output from the bogo-benchmark apps:
>>>
>>> XmlPull feature parser:
>>> Fetched 100 features in 111654ms. (avg. 1116ms/feature) Mem. used:
>>> 7MB.
>>>
>>> StreamingParser:
>>> Fetched 100 features in 112481ms. (avg. 1124ms/feature) Mem. used:
>>> 606MB.
>>>
>>> (the request sets maxFeatures=100 or it becomes an endless wait,
>>> but the OOM happens with enough patience)
>>>
>>> Gabriel
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Geotools-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/geotools-devel
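Regarding the packed coordinate sequence idea above: the scan Gabriel describes — going straight from the coordList text to a packed double[] with no intermediate DirectPosition[]/Coordinate[] object graph — could be as simple as the following self-contained sketch (class and method names here are made up, and real gml:coordList handling would also need the cs/ts separator attributes):

```java
public class CoordListScanner {

    // Scans a whitespace-separated coordinate list straight into a
    // packed double[], skipping the usual
    // text -> DirectPosition[] -> Coordinate[] detour.
    public static double[] toPackedOrdinates(String coordList) {
        String[] tokens = coordList.trim().split("\\s+");
        double[] packed = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            packed[i] = Double.parseDouble(tokens[i]);
        }
        return packed;
    }

    public static void main(String[] args) {
        double[] seq = toPackedOrdinates("10.0 20.0 30.0 40.0");
        // prints [10.0, 20.0, 30.0, 40.0]
        System.out.println(java.util.Arrays.toString(seq));
    }
}
```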
--
Justin Deoliveira
The Open Planning Project
[EMAIL PROTECTED]
