Shapely

pascal leroux Thu, 10 Jul 2008 07:53:37 -0700

2008/7/9 Sean Gillies <[EMAIL PROTECTED]>:

> pascal leroux wrote:
> > Hi Sean
> >
> >
> >
> > 2008/7/8 Sean Gillies <[EMAIL PROTECTED]>:
> >
> >> Hi Pascal,
> >>
> >> A Python solution shouldn't be 100 times slower than C, especially since
> >> both OGR and Shapely use the same GEOS library to compute relationships
> >> between geometries. Is it possible that your other solution uses
> >> "intersects" instead of "not disjoint" (could be faster), or is more
> >> approximate?
> >>
> >
> >
> > No, my C code uses GEOSDisjoint.
> > And, when the function returns false/0, I call GEOSIntersection and
> write,
> > in WKT format, the result (to display and check the calculated
> intersections
> > vs my input shapefile). So the C program does more things than the Python
> > version.
> >
> >
> >
> >> The Python script above also uses a great deal of memory since it never
> >> frees memory allocated by OGR's GetNextFeature(), and then copies the
> >> feature to a Shapely geom (and a GEOS geom). If your computer's free
> >> memory is low, performance can be poor. Is this a possibility?
> >>
> >
> > I have no idea. Do you mean that the "process" that frees memory (when
> the
> > reference counter of an object equals 0) could be slow and explain the
> low
> > speed ? Or a more general problem (operating system level) ?
> >
> > I added an explicit "del objet" statement in my script after the
> > list.append() but the results are the same.
> >
> >
> >> I recommend that you first spatially index your data for this kind of
> >> application -- maybe a quadtree (.qix) for your shapefile, or use an
> >> Rtree alongside your list of Shapely geoms. You can thereby efficiently
> >> select features that approximately intersect with your feature of
> >> interest (using the bounding box of "ref"), and then test only that
> >> smaller set for exact intersection. Often the cost of indexing is much
> >> smaller than the cost of performing unnecessary exact intersections.
> >>
> >
> > Sure ! It is not a smart way to process such big shapefiles. Even if I
> code
> > it in C, I'll not look for intersections without using spatial indexes.
> The
> > purpose of my script was only to test Python/Shapely speed (interpreted
> vs
> > compiled, a lot of software layers, ...) with a big (but real) shapefile.
> >
> > I'll make other tests using the OGRLayer::SetSpatialFilter[Rect] method
> and
> > I don't give up using Python and Shapely (not yet !)
> >
> >
> >> Using xrange() instead of range() when your lists are large is a good
> >> idea, but I don't think that's the issue here.
> >>
> >> Cheers,
> >> Sean
> >>
> >>
> >
> > cheers
> >
> > Pascal
> >
> > ps: my script ran till the end : about 270 minutes
> >
>
> Considering that you're carrying out 2.8 billion disjoint operations
> (with 75,000 features), that's almost 174,000 per second. If you use an
> index, the entire process might take only a few seconds.
>
> Would you be willing to share your C code? I'd like to look into the
> performance difference more closely.
>
> Sean
>



No problem : in attached file, I put a tar file with my code (see README
file for details).

Another thing that could be interesting for you : I used another shapefile.
It contains less objects (3100) that correspond to elevation contour lines
on the same area. The objects envelopes intersect each other very often (in
my first shapefile (buildings and houses), they didn't and, searching
envelopes intersections is the first thing that is done in the "disjoint"
Geometry method).

With the second shapefile, my Python script ran in 4 minutes, my C program
in 3 minutes 40 seconds (IF I don't calculate the exact intersections ...
otherwise, 14 minutes)

Hope this helps.

Pascal

intergeos.tgz
Description: GNU Zip compressed data

_______________________________________________
Community mailing list
[email protected]
http://lists.gispython.org/mailman/listinfo/community

Re: [Community] Run time comparisons between C/GEOS and Python/Shapely

Reply via email to