Wow, that's quite a difference.  I wonder what they're doing differently?

It might be that C reading of binary data is much faster than Java (because of the wierd endian issues in Shapefiles, there's a lot of computation needed just to read the file).

And I wonder if ogrinfo is actually assembling the geometries, or if it's just reading headers or counting geometries?

On 4/21/2012 5:26 AM, Rahkonen Jukka wrote:
Hi,

I asked how GDAL/OGR performs from the gdal-dev list. Discussion follows.

From: Rahkonen Jukka<Jukka.Rahkonen<at>  mmmtike.fi>
Subject: Re: Performance of reading large polygons with 
holes<http://news.gmane.org/find-root.php?message_id=%3c84446DEF76453C439E9E97E438E13A6314B135%40suutari.haapa.mmm.fi%3e>
Newsgroups: 
gmane.comp.gis.gdal.devel<http://news.gmane.org/gmane.comp.gis.gdal.devel>
Date: 2012-04-21 12:24:06 GMT ( ago)

Even Rouault wrote:

Le samedi 21 avril 2012 12:41:49, Jukka Rahkonen a écrit :
Hi,

Martin Davis started this thread on OpenJUMP-dev list
http://thread.gmane.org/gmane.comp.gis.jump.devel/11119
Perhaps someone would like to test how OGR performs with such data?
Link to dataset is mentioned in the discussion.
I suppose that dataset you mention is :
http://maps.cmparks.net/geoserver/metroparks/ows?service=WFS&version=1.0.0&request=GetFeature&;>typeName=metroparks:tpi_1&outputFormat=SHAPE-
ZIP ?

Exactly.

If so, on my box, the whole shapefile (~ 390 MB for the .shp / ~1 million
features) can be read in about 4 seconds with a simplified version of ogrinfo
that just loops on all the features. The algorithm that assembles the rings of
polygon shapes into polygons that conform to Simple Features has been pretty
improved during the last years. It is in the
OGRGeometryFactory::organizePolygons() method ( see
http://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogrgeometryfactory.cpp#L1015
) . For shapefiles, there's a shortcut that save a lot of computations w.r.t.
the general algorithm, because the shapefile specification states that outer
rings should be described clockwise , whereas inner rings should be described
counter-clockwise.
Thus it is 4 seconds vs. 32 seconds measured by Martin. It is a considerable 
difference but perhaps Martin is
not doing exactly the same thing.  Anyway, speed of OGR seems to be excellent.

-Jukka Rahkonen-


________________________________
Lähettäjä: Michaël Michaud [michael.mich...@free.fr]
Lähetetty: 21. huhtikuuta 2012 12:19
Vastaanottaja: jump-pilot-devel@lists.sourceforge.net
Aihe: Re: [JPP-Devel] Performance issue with large polygons with holes in 
ShapefileReader

Hi,

Sorry for the bad example :
In the illustration, the medium shell does not pass the PiP test,
so it is never considered as a possible candidate.

After many useless tries to find a case where this test did not work,
I tried to understand why it works.

Finally, that's how I explain it :
if one consider the geometry as valid  :
If a hole h is included in 2 shells s1 and s2,
==>  s1 and s2 are overlaping in h
==>  as the multigeometry is valid ==>  s1 (resp s2) shoul be entirely in a 
hole of s2 (resp s1)
==>  s1 envelope (resp s2 envelope) is included in s2 envelope (resp s1 
envelope)
==>  h is in a polygon which in turn is in the hole of another polygon,
       it must be associated to the inner one (the one with the smallest 
envelope)

Sorry for the noise, especially if the result was obvious for you, guys,
it was certainly not for me ;-)

Michaël

About the #2, I think the test currently done in OpenJUMP is wrong
In case where several shell's ebvelope contain hole's envelope, if shells are
included in each other, the hole is associated to the smallest :
if (shellEnvelope.contains(holeEnvelope)&&
     (minShellEnv == null || minShellEnv.contains(shellEnvelope))&&
     (cga.isPointInRing(testPt,coordList ))) {
     minShellEnv = shellEnvelope;
}

I think it will not work in this case :

[cid:part1.01040602.00090309@free.fr]

I'll measure the penalty if the second test is removed
and eventually, I'll try to implement Martin's #2 tip.

Michaël






Le 21/04/2012 06:03, Martin Davis a écrit :
The file I sent is in Ohio, I think.  It was sent to me by Stephen Mather, who 
I met in Washington DC at FOSS4GNA last week.  It's pretty excessive - not the 
normal kind of shapefile, I think.  But still, it's nice to optimize the corner 
cases, especially if they take 10x longer to run.

I did realize that the use case for GeoTools (in GeoServer at least) is a bit 
different to OJ.  GeoServer doesn't cache data in memory, so it's going to be 
reading data many times, so every second is precious.  And there's no 
expectation it will fix bad data - it just has to provide something that 
mirrors what was read.  OJ on the other hand might be expected to try to fix 
some of the bad data, and users might be willing to spend more time on load to 
get that functionality.

Anyway, fix #3 is good in any case, and it seems like a good thing that you 
added fix #1 too.  Fix #2 was a bit more speculative, and it only applies in 
very specific circumstances, so it's probably not as critical.  (BTW, the idea 
was not to assume to which shell holes were assigned to, but to check if only 
one shell could actually contain a given hole, by virtue of the envelope 
intersection test).

On 4/20/2012 11:02 AM, Michaël Michaud wrote:
Martin,

Thanks for the file, very impressive (did not guess where it was - but only 
searched in Canada).

You convinced me.
We had two previous discussions on the list about preserving weird shapefile as 
much as possible,
but none of them was about holes lying out of their shell.

So let's go where geotools already is.
I had a quick look on your suggestion #2, but could not see a way to improve.
For more than one shell case, we don't know which hole is associated with which 
shell before the test
(except if we assume that the description of a shell is immediately followed by 
the description of its holes)
Don't know if we should assume this.

For "more than one shell" case, I just changed the order of tests to do a bit 
more envelope tests and a bit less PiP tests but I could not measure the difference on 
your file.

Michaël


Very thorough investigation, Michael.

The shapefile I was using is here:


http://maps.cmparks.net/geoserver/metroparks/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=metroparks:tpi_1&outputFormat=SHAPE-ZIP

         (       http://tinyurl.com/c9k26gp      )

It is pretty gnarly - over 1M polygons, and has the following stats for the 
largest (vertices/holes):

18527 1112
58597 5526
31087 1598
24152 1318
17989 1386
20280 1098
20815 1336
44871 1204
61496 3024
28370 1120
46058 2156
21970 1546
91379 3034
121028 8012
81172 4544
28544 1272
20524 1074
55556 1992
2282758 169278
1166731 94382
241350 14250
142028 5922
139035 5690
33610 1344
83112 3134
38464 1048

GeoTools has had optimization #1 for quite a while, I think, if that's any 
reassurance.

On 4/20/2012 6:29 AM, Michaël Michaud wrote:
Hi,

Here are some test cases done with a 200 Mb shapefile (thanks Jukka)
containing : 221770 polygons
worst case : 287273 pts and 5484 holes

Dataset : Current OJ code / 1st Martin's optimization / 3rd Martin's 
optimization / 1st+3rd
whole shapefile : 41 s / 24 s / 24 s / 11 s
worst polygon : 4.6 s / 0.16 s / 3.3 s / 0.13 s

==>
1st optimization has a big impact on worst case and a medium impact on a whole 
file with mixed cases
3rd optimization has a small impact on worst case and a medium impact on a 
whole file with mixed cases

I'll commit the 3rd optimization anyway has it has no side effect.
For the 1st one, i'll try to check if it may have negative impact on some 
shapefile (if I can find or produce such shapefiles)





------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2



_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net<mailto:Jump-pilot-devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel





------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2



_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net<mailto:Jump-pilot-devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel




No virus found in this message.
Checked by AVG - www.avg.com<http://www.avg.com>
Version: 2012.0.1913 / Virus Database: 2411/4948 - Release Date: 04/20/12



------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2



_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net<mailto:Jump-pilot-devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel





------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2



_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net<mailto:Jump-pilot-devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel





------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2


_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel


-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2012.0.1913 / Virus Database: 2411/4950 - Release Date: 04/21/12
------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2
_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel

Reply via email to