Re: Data sets
The shapefile size for the large resolutions was a bit ridiculous; I
generated shapefiles for the query and target geometries up to 360k
points.  It's ~28MB compressed, but ~11GB uncompressed.
http://cactus.bennight.com/shapefiles.zip

I have a more manageable set at
http://cactus.bennight.com/shapefiles_small.zip.  This includes the single
query geometry up to 360k points (~5MB) and the 1000-feature collections
up to ~2,800 points (~44MB uncompressed).

I also went ahead and rendered graphics of what they look like.

Here's the query geometry:
http://cactus.bennight.com/query-feature.png

Here are the target geometries:
http://cactus.bennight.com/target-feature.png

And here are both superimposed:
http://cactus.bennight.com/query-and-target-feature.png

I'll update and add them to the repo after the tests finish running.


Re: Synthetic vs. natural
Completely reasonable.  Variably simplified coastlines would probably be a
good real-world analogue (with a randomly generated query geometry, asking
the question "what countries does this query window overlap?").
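For concreteness, here's a minimal sketch of the kind of randomly generated
query window I mean (assuming the 2015-era com.vividsolutions.jts packages;
RandomQueryWindow and its parameters are hypothetical names, not anything
in the benchmark repo):

    import java.util.Random;

    import com.vividsolutions.jts.geom.Envelope;
    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.geom.GeometryFactory;

    // Hypothetical helper: build a random BBOX-style query window inside a
    // world extent, for testing against real coastline polygons.
    public class RandomQueryWindow {
        private static final Random RND = new Random();

        public static Geometry randomWindow(Envelope world, double width,
                double height) {
            double x = world.getMinX()
                    + RND.nextDouble() * (world.getWidth() - width);
            double y = world.getMinY()
                    + RND.nextDouble() * (world.getHeight() - height);
            // toGeometry turns the envelope into a 5-point rectangular polygon
            return new GeometryFactory()
                    .toGeometry(new Envelope(x, x + width, y, y + height));
        }
    }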


Re: highly data-dependent
That makes sense.  What I'm hoping to come up with is a set of heuristics
I can use to determine which strategy to use (with the caveat that I may
have to collect stats/profile a data set, and it won't necessarily be
correct for every comparison; I'm just hoping to answer the "best default
setting" question).  A sketch of the kind of rule I have in mind is below.
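Just as a sketch of the shape such a rule might take (the 10,000-point
cutoff is only the rough crossover from my runs, and IntersectsStrategy /
intersectsHeuristic are hypothetical names):

    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.geom.prep.PreparedGeometryFactory;

    public class IntersectsStrategy {
        // Rough crossover from my benchmarks -- not a tuned constant.
        private static final int PREPARE_THRESHOLD = 10000;

        // Prepare whichever geometry is big enough to benefit; intersects is
        // symmetric, so the test can be flipped to prepare the target instead.
        public static boolean intersectsHeuristic(Geometry query, Geometry target) {
            if (query.getNumPoints() >= PREPARE_THRESHOLD) {
                return PreparedGeometryFactory.prepare(query).intersects(target);
            }
            if (target.getNumPoints() >= PREPARE_THRESHOLD) {
                return PreparedGeometryFactory.prepare(target).intersects(query);
            }
            return query.intersects(target); // raw predicate for small inputs
        }
    }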


Re: terminology (intersects/intersection)
Fixing that now - just semantic sloppiness on my part.
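For anyone following along, the distinction in code (a minimal sketch
against the stock JTS API; PreparedGeometry accelerates the boolean
predicate only, so the fixed benchmark compares predicate to predicate):

    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.geom.prep.PreparedGeometry;
    import com.vividsolutions.jts.geom.prep.PreparedGeometryFactory;

    public class PredicateVsOverlay {
        static void compare(Geometry query, Geometry target) {
            // intersects: boolean predicate -- "do they share any space?"
            boolean rawHit = query.intersects(target);

            // intersection: overlay operation that computes the shared
            // geometry -- a different and much more expensive question
            Geometry overlap = query.intersection(target);

            // PreparedGeometry has intersects() but no intersection(), so
            // the apples-to-apples comparison is predicate vs. predicate
            PreparedGeometry prepared = PreparedGeometryFactory.prepare(query);
            boolean preparedHit = prepared.intersects(target);
        }
    }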


Re: Increasing vertex count by densifying
I think, taken in reverse, this might be a good analogue for expected
simplification gains?  (Granted, there is also a decreasing amount of
irregularity in a real simplification, but unless intersects optimizes for
collinear points I expect the cost is the same.)  A rough sketch of that
reverse comparison is below.
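Roughly what I mean, as a sketch using the stock Densifier and
DouglasPeuckerSimplifier (DensifyAsSimplifyProxy is a hypothetical name;
since densifying only inserts collinear points, simplifying should strip
exactly those back out):

    import com.vividsolutions.jts.densify.Densifier;
    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.simplify.DouglasPeuckerSimplifier;

    public class DensifyAsSimplifyProxy {
        // Time the predicate on a densified copy vs. the simplified version
        // of that copy; the gap roughly bounds the gain from simplification.
        static void compare(Geometry query, Geometry target, double segLength) {
            Geometry dense = Densifier.densify(target, segLength);
            // a tiny tolerance is enough to strip purely collinear vertices
            Geometry simple = DouglasPeuckerSimplifier.simplify(dense, 1e-9);

            long t0 = System.nanoTime();
            query.intersects(dense);
            long t1 = System.nanoTime();
            query.intersects(simple);
            long t2 = System.nanoTime();

            System.out.printf("dense: %d pts in %d ns; simplified: %d pts in %d ns%n",
                    dense.getNumPoints(), t1 - t0, simple.getNumPoints(), t2 - t1);
        }
    }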


Re: profiling
I'll run this down later this week and post the results back.


On Sun, Jun 14, 2015 at 6:42 PM, Martin Davis <[email protected]> wrote:

> And a couple more thoughts:
>
> - The performance is highly data-dependent, so it would be nice to see an
> image of the test datasets to aid in understanding.  If this were provided
> it would be easy to see that the query geometry is a regular polygon (I
> think?).  As mentioned, this doesn't really take advantage of monotone
> chains.
> - It's pretty hard to synthesize data that approximates real-world data,
> so you might want to try working with real datasets.
> - I think relative performance is going to be highly data-dependent.  For
> example, a highly convex query geometry against a lot of small target
> geometries might show PG is much faster than Raw.
>
> On Sun, Jun 14, 2015 at 3:11 PM, Joe Amenta <[email protected]> wrote:
>
>> Martin's "intersects != intersection" point is actually extremely
>> relevant here... the latest version on GitHub does an apples-to-oranges
>> comparison.
>>
>> The initial version before the "fixed methodology" commit compared
>> PreparedGeometry.intersects(Geometry) with Geometry.*intersects*(Geometry),
>> whereas the latest version compares PreparedGeometry.intersects(Geometry)
>> with Geometry.*intersection*(Geometry).
>>
>> Just wanted to point this out...
>>
>> On Sun, Jun 14, 2015 at 5:51 PM, Martin Davis <[email protected]> wrote:
>>
>>> Interesting and very thorough analysis!  It's definitely a surprise that
>>> PreparedGeometry drops below raw Geometry for large query sizes.  It's
>>> maybe a bit less of a surprise that small PGs don't provide much
>>> improvement.
>>>
>>> It's going to take me a while to fully digest this, but a few initial
>>> comments:
>>>
>>> - In JTS the intersects and intersection operations are quite different
>>> (very much so in semantics, and currently somewhat less so in
>>> implementation).  PreparedGeometry currently provides only the intersects
>>> operation.  Support for intersection might be forthcoming at some point,
>>> but it's not there now.  You might want to adjust your terminology to
>>> reflect this, to avoid confusion.
>>>
>>> - Increasing vertex count by densifying lines *might* be skewing the
>>> results somewhat, since it might defeat the use of monotone chains to
>>> improve indexing.  Not sure how easy this is to mitigate - it's hard to
>>> generate large random polygons!  Nevertheless, real-world datasets might
>>> show different performance characteristics (hopefully better!).
>>>
>>> - To figure out what's going on it will probably be necessary to profile
>>> the execution for large PGs. Typically in geometric algorithms performance
>>> problems are due to an O(n^2) operation lurking somewhere in the code.
>>> These sometimes don't show up until very large input sizes.
>>>
>>> Thanks for the post - if this provides better guidance on using PG vs
>>> Raw, this will be very helpful.
>>>
>>>
>>>
>>> On Sun, Jun 14, 2015 at 1:38 PM, Chris Bennight <[email protected]>
>>> wrote:
>>>
>>>>
>>>> I put together a quick project to try to understand some of the
>>>> trade-offs for prepared vs. non-prepared geometries (when to use a prepared
>>>> geometry, when not to, etc.).
>>>>
>>>> The readme has a summary of the tests and results, with graphics
>>>> https://github.com/chrisbennight/intersection-test
>>>>
>>>> Probably the best summary is this chart:
>>>>
>>>> https://raw.githubusercontent.com/chrisbennight/intersection-test/master/src/main/resources/difference%20-%20prepared%20vs%20non%20prepared%20-%20chart.png
>>>>
>>>> which shows the difference of (prepared geometry intersection time -
>>>> regular geometry intersection time) as a function of the number of points
>>>> in each geometry.
>>>>
>>>> My big takeaway is that once the target geometry gets to around 10k or
>>>> more points, using a regular geometry intersection may be faster than
>>>> using a prepared geometry intersection.  (Not saying 10k is a magic
>>>> number, just using that as a rough estimate of when I might need to start
>>>> caring.)
>>>>
>>>> I think a typical use case of a query window (think a typical WMS,
>>>> WFS, etc. BBOX query) with 5 points matching against complex geometries in
>>>> the database might very easily see worse performance by creating a new
>>>> PreparedGeometry instance of that 5-point geometry.
>>>>
>>>> Which is completely fair - just trying to understand the trade-offs and
>>>> heuristics for using it.  Other strategies - such as, in the case above,
>>>> preparing the target geometry and flipping the intersection test (since in
>>>> testing, preparing the geometry was orders of magnitude faster than the
>>>> intersection) - are also things I'm interested in.
>>>>
>>>> Probably my big request is a little review from the experts - have I
>>>> missed something obvious, done something horribly wrong, or is this a
>>>> reasonable outcome?  I'd also be curious if there are any guidelines for
>>>> how/when people use a prepared geometry.  It looks like PostGIS might
>>>> employ some heuristics about what to prepare and when?
>>>>
>>>> Anyway, just looking for a quick vector check as well as any
>>>> thoughts/guidance.  Overall I'd like to come up with a heuristic for when
>>>> to use a prepared geometry against arbitrary sets of polygons.