Hi Martin, Larry,
Thanks for the precision,
Implementing #3 I was wondering why not implementing the following
short circuit in Geometry equals method
public boolean
equals(Object o) {
if (this == o) return true;
etc.
}
Can you see any drawback ?
Seems that this way, many algorithms could automatically benefit
your #3 trick.
About the #2, I think the test currently done in OpenJUMP is wrong
In case where several shell's ebvelope contain hole's envelope, if
shells are
included in each other, the hole is associated to the smallest :
if (shellEnvelope.contains(holeEnvelope)
&&
(minShellEnv == null || minShellEnv.contains(shellEnvelope))
&&
(cga.isPointInRing(testPt,coordList ))) {
minShellEnv = shellEnvelope;
}
I think it will not work in this case :

I'll measure the penalty if the second test is removed
and eventually, I'll try to implement Martin's #2 tip.
Michaël
Le 21/04/2012 06:03, Martin Davis a écrit :
The file I sent is in Ohio, I think. It was sent to me by Stephen
Mather, who I met in Washington DC at FOSS4GNA last week. It's
pretty excessive - not the normal kind of shapefile, I think. But
still, it's nice to optimize the corner cases, especially if they
take 10x longer to run.
I did realize that the use case for GeoTools (in GeoServer at
least) is a bit different to OJ. GeoServer doesn't cache data in
memory, so it's going to be reading data many times, so every
second is precious. And there's no expectation it will fix bad
data - it just has to provide something that mirrors what was
read. OJ on the other hand might be expected to try to fix some
of the bad data, and users might be willing to spend more time on
load to get that functionality.
Anyway, fix #3 is good in any case, and it seems like a good thing
that you added fix #1 too. Fix #2 was a bit more speculative, and
it only applies in very specific circumstances, so it's probably
not as critical. (BTW, the idea was not to assume to which shell
holes were assigned to, but to check if only one shell could
actually contain a given hole, by virtue of the envelope
intersection test).
On 4/20/2012 11:02 AM, Michaël Michaud wrote:
Martin,
Thanks for the file, very impressive (did not guess where it was
- but only searched in Canada).
You convinced me.
We had two previous discussions on the list about preserving
weird shapefile as much as possible,
but none of them was about holes lying out of their shell.
So let's go where geotools already is.
I had a quick look on your suggestion #2, but could not see a
way to improve.
For more than one shell case, we don't know which hole is
associated with which shell before the test
(except if we assume that the description of a shell is
immediately followed by the description of its holes)
Don't know if we should assume this.
For "more than one shell" case, I just changed the order of
tests to do a bit more envelope tests and a bit less PiP tests
but I could not measure the difference on your file.
Michaël
Very thorough investigation, Michael.
The shapefile I was using is here:
http://maps.cmparks.net/geoserver/metroparks/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=metroparks:tpi_1&outputFormat=SHAPE-ZIP
( http://tinyurl.com/c9k26gp )
It is pretty gnarly - over 1M polygons, and has the following
stats for the largest (vertices/holes):
18527 1112
58597 5526
31087 1598
24152 1318
17989 1386
20280 1098
20815 1336
44871 1204
61496 3024
28370 1120
46058 2156
21970 1546
91379 3034
121028 8012
81172 4544
28544 1272
20524 1074
55556 1992
2282758 169278
1166731 94382
241350 14250
142028 5922
139035 5690
33610 1344
83112 3134
38464 1048
GeoTools has had optimization #1 for quite a while, I think,
if that's any reassurance.
On 4/20/2012 6:29 AM, Michaël Michaud wrote:
Hi,
Here are some test cases done with a 200 Mb shapefile
(thanks Jukka)
containing : 221770 polygons
worst case : 287273 pts and 5484 holes
Dataset : Current OJ code / 1st Martin's optimization / 3rd
Martin's optimization / 1st+3rd
whole shapefile : 41 s / 24 s / 24 s / 11 s
worst polygon : 4.6 s / 0.16 s / 3.3 s / 0.13 s
==>
1st optimization has a big impact on worst case and a medium
impact on a whole file with mixed cases
3rd optimization has a small impact on worst case and a
medium impact on a whole file with mixed cases
I'll commit the 3rd optimization anyway has it has no side
effect.
For the 1st one, i'll try to check if it may have negative
impact on some shapefile (if I can find or produce such
shapefiles)
------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2
_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2
_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
No virus
found in this message.
Checked by AVG - www.avg.com
Version: 2012.0.1913 / Virus Database: 2411/4948 - Release
Date: 04/20/12
------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2
_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
|