pascal leroux wrote:
> Hi Sean
>
> 2008/7/8 Sean Gillies <[EMAIL PROTECTED]>:
>> Hi Pascal,
>>
>> A Python solution shouldn't be 100 times slower than C, especially since
>> both OGR and Shapely use the same GEOS library to compute relationships
>> between geometries. Is it possible that your other solution uses
>> "intersects" instead of "not disjoint" (could be faster), or is more
>> approximate?
>
> No, my C code uses GEOSDisjoint.
> And, when the function returns false/0, I call GEOSIntersection and write
> the result in WKT format (to display and check the calculated intersections
> against my input shapefile). So the C program does more than the Python
> version.
>
>> The Python script above also uses a great deal of memory, since it never
>> frees memory allocated by OGR's GetNextFeature(), and then copies the
>> feature to a Shapely geom (and a GEOS geom). If your computer's free
>> memory is low, performance can be poor. Is this a possibility?
>
> I have no idea. Do you mean that the process that frees memory (when the
> reference count of an object reaches 0) could be slow and explain the low
> speed? Or a more general problem at the operating-system level?
>
> I added an explicit "del objet" statement in my script after the
> list.append(), but the results are the same.
>
>> I recommend that you first spatially index your data for this kind of
>> application -- maybe a quadtree (.qix) for your shapefile, or an
>> Rtree alongside your list of Shapely geoms. You can thereby efficiently
>> select features that approximately intersect your feature of
>> interest (using the bounding box of "ref"), and then test only that
>> smaller set for exact intersection. Often the cost of indexing is much
>> smaller than the cost of performing unnecessary exact intersections.
>
> Sure! It is not a smart way to process such big shapefiles. Even if I coded
> it in C, I would not look for intersections without using spatial indexes.
> The purpose of my script was only to test Python/Shapely speed (interpreted
> vs. compiled, many software layers, ...) with a big (but real) shapefile.
>
> I'll run more tests using the OGRLayer::SetSpatialFilter[Rect] method, and
> I'm not giving up on Python and Shapely (not yet!).
>
>> Using xrange() instead of range() when your lists are large is a good
>> idea, but I don't think that's the issue here.
>>
>> Cheers,
>> Sean
>
> cheers
>
> Pascal
>
> ps: my script ran to the end: about 270 minutes
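To make the prefilter-then-exact-test pattern from the thread concrete, here is a minimal, dependency-free Python sketch. To stay runnable without GDAL/Shapely/rtree installed, geometries are reduced to plain (minx, miny, maxx, maxy) boxes and a uniform grid stands in for the R-tree; `GridIndex`, `bbox_overlaps`, and `intersecting` are names invented for this illustration, not part of any of those libraries. With the real packages the shape of the code is the same: query the index with the bounding box of "ref", then run the exact GEOS predicate ("intersects" or "not disjoint") only on the returned candidates.

```python
from collections import defaultdict


def bbox_overlaps(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


class GridIndex:
    """Minimal spatial index: buckets bounding boxes into uniform grid cells."""

    def __init__(self, cell=10.0):
        self.cell = cell
        self.cells = defaultdict(list)

    def _covered(self, box):
        # All grid cells touched by the box.
        minx, miny, maxx, maxy = box
        c = self.cell
        for i in range(int(minx // c), int(maxx // c) + 1):
            for j in range(int(miny // c), int(maxy // c) + 1):
                yield (i, j)

    def insert(self, fid, box):
        for key in self._covered(box):
            self.cells[key].append((fid, box))

    def candidates(self, box):
        # Cheap approximate query: everything sharing a cell with the box.
        seen = {}
        for key in self._covered(box):
            for fid, b in self.cells[key]:
                seen[fid] = b
        return seen


def intersecting(index, box):
    # Index lookup first; the exact test runs only on the few candidates,
    # not on all 75,000 features.
    return sorted(fid for fid, b in index.candidates(box).items()
                  if bbox_overlaps(box, b))


idx = GridIndex(cell=10.0)
idx.insert(1, (0, 0, 5, 5))
idx.insert(2, (50, 50, 60, 60))
idx.insert(3, (3, 3, 8, 8))
print(intersecting(idx, (4, 4, 6, 6)))  # -> [1, 3]
```

OGR's SetSpatialFilterRect, mentioned above, plays a similar role on the reading side: it restricts the features returned by the layer to those whose extents overlap a given rectangle, so only those candidates ever reach the exact intersection test.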
Considering that you're carrying out 2.8 billion disjoint operations (with
75,000 features), that's almost 174,000 per second. If you use an index, the
entire process might take only a few seconds.

Would you be willing to share your C code? I'd like to look into the
performance difference more closely.

Sean

_______________________________________________
Community mailing list
[email protected]
http://lists.gispython.org/mailman/listinfo/community
