At 07:28 AM 3/12/02 -0500, Jacques Paris wrote:
>In comparing 2 regions, one must consider two kinds of data related to shape
>and to position. <snip--rest of message below>
This problem is encountered in other GISes, too (e.g.,
ArcView). Everything Paris says is correct, useful, and insightful (as
always!), but the problem is still not that difficult. One can glean a lot
of information even from polygon centers, areas, and perimeters.
The idea is to spend most of your time computing easy-to-calculate
properties. If a polygon's MBR is the fastest thing to obtain, go for
it. Otherwise look at the MBR center, the polygon area, the centroid, even
the vertex count or the coordinates of the first few vertices--it depends
on how the GIS implements its polygons. Use these as an initial,
potentially redundant, key to check for matches. After all, Mr. Paris'
objections notwithstanding, if two polygons have two different centroids,
then the polygons are different, and you can move on. If you are in a
situation where you don't expect very many duplicates, and you rarely have
polygons that overlap, then using centroids can be an efficient discriminator.
In such a fashion you can winnow down the number of direct
polygon-to-polygon comparisons you have to make. Ultimately, you can try
subtracting one polygon from another and computing the area--it's a good
method--but when the underlying coordinates are represented by INTs (as in
MapInfo), you will run into the problem that quite extensive polygon
slivers can have areas that are computed as zero. (This seems to crop up
even using floating-point coordinates, too, probably because there are bugs
in almost every computational geometry program extant.) If, however, the
polygons are well and truly identical in every respect, and the subtraction
algorithm is a robust and correct one, then the resulting shape will be
null and have a null MBR, so you can test for that.
In summary, here's the algorithm for finding duplicates in any collection C
of objects (such as polygons) that are expensive to compare individually
but for which inexpensive keys can be computed:
1. Create an associative array A that will assign collections of
objects to keys.
2. For each P in C:
2a. Compute a "cheap" key K based on P.
2b. Add P to A[K].
End
3. For each key K in A[]:
3a. Until A[K] is empty Do:
3b. Let P be any element in A[K]
3c. Compare P to the remaining elements of A[K], if
any, using an exact method, and remove every object in A[K] that does not
match P.
3d. Output P.
End
End
Notes
-------
If memory is severely limited compared to the number of objects to process,
make two passes through the dataset. In the first, simply count the number
of occurrences of each key that shows up. Then retain in the associative
array only those keys for which the count is two or greater. The second
pass proceeds much like the algorithm above, except (a) polygons with no
matching key in A[] are output directly and (b) at step 2b, the expensive
comparisons 3a-3d are performed immediately, to ensure that duplicates are
never put into A. In a post-processing step, all objects collected in A in
the second pass are output.
If you don't have associative arrays, then your keys will have to be whole
numbers (or maybe you can press your database into service: table summaries
accomplish similar things; or even, implement a home-grown associative
array with a hash table or something like it--it's easy to do). There are
lots of ways to convert things like centroids, etc., into whole numbers:
use your imagination or research "hashing" in your favorite algorithms
textbook.
If your polygon subtraction algorithm is unreliable, avoid using it in step
3c by first comparing everything else you can think of: area, perimeter,
centroid, number of vertices, coordinates of MBR, etc. If all else fails,
hash the sequence of vertex coordinates into the largest integer you can
handle and play the odds: they're highly in your favor.
Cheers,
Bill Huber
www.quantdec.com
>Two regions may have identical shapes and not be
>coincidental in space. The shape elements are many: number of parts, number
>of nodes, area, perimeter. Positional elements are few in nature,
>essentially x,y coordinates of corresponding nodes.
>
>Notice I do not include the centroid because it is not a reliable element
>having a unique "value" for a region; it is not computed but assigned and
>can be changed within limits. Even if one would "reset" all centroids before
>hand (converttopline + converttoregion), there is no guarantee that the
>assigned centroid would be different if there are some slight differences in
>an outlying polygon (a polygon not containing the centroid), because it
>could be constrained by the limits of the "central" polygon.
>
>It would be much easier if there could be only one criterion that would do
>the trick. When I explored that question two years ago, I tried to use a
>simple all encompassing formula to compare two regions oa and ob and I found
>that area(erase(oa,ob),"sq mi")=0 could be a good solution. However the
>comparison to 0 was not efficient because the mathematics of the operator
>had a tendency then to "leave some accidental crumbs". I am now thinking
>that the solution could be area(erase(oa,ob),"sq mi") =
>area(erase(ob,oa),"sq mi") but isn't that just hoping that the "crumbs" will
>be different?
>
>Jacques Paris
_______________________________________________________________________
List hosting provided by Directions Magazine | www.directionsmag.com |
To unsubscribe, send e-mail to [EMAIL PROTECTED] and
put "unsubscribe MapInfo-L" in the message body.