pascal leroux wrote:
> Hi Sean
>
> 2008/7/8 Sean Gillies <[EMAIL PROTECTED]>:
>> Hi Pascal,
>>
>> A Python solution shouldn't be 100 times slower than C, especially since
>> both OGR and Shapely use the same GEOS library to compute relationships
>> between geometries. Is it possible that your other solution uses
>> "intersects" instead of "not disjoint" (could be faster), or is more
>> approximate?
>
> No, my C code uses GEOSDisjoint.
> And, when the function returns false/0, I call GEOSIntersection and write
> the result in WKT format (to display and check the calculated intersections
> against my input shapefile). So the C program does more than the Python
> version.
>
>> The Python script above also uses a great deal of memory, since it never
>> frees memory allocated by OGR's GetNextFeature(), and then copies the
>> feature to a Shapely geom (and a GEOS geom). If your computer's free
>> memory is low, performance can be poor. Is this a possibility?
>
> I have no idea. Do you mean that the process that frees memory (when the
> reference count of an object reaches 0) could be slow and explain the low
> speed? Or a more general problem at the operating-system level?
>
> I added an explicit "del objet" statement in my script after the
> list.append(), but the results are the same.
>
>> I recommend that you first spatially index your data for this kind of
>> application -- maybe a quadtree (.qix) for your shapefile, or an
>> Rtree alongside your list of Shapely geoms. You can thereby efficiently
>> select features that approximately intersect your feature of
>> interest (using the bounding box of "ref"), and then test only that
>> smaller set for exact intersection. Often the cost of indexing is much
>> smaller than the cost of performing unnecessary exact intersections.
>
> Sure! It is not a smart way to process such big shapefiles. Even if I coded
> it in C, I would not look for intersections without using spatial indexes.
> The purpose of my script was only to test Python/Shapely speed (interpreted
> vs. compiled, many software layers, ...) with a big (but real) shapefile.
>
> I'll run more tests using the OGRLayer::SetSpatialFilter[Rect] method, and
> I'm not giving up on Python and Shapely (not yet!).
>
>> Using xrange() instead of range() when your lists are large is a good
>> idea, but I don't think that's the issue here.
>>
>> Cheers,
>> Sean
>
> cheers
>
> Pascal
>
> ps: my script ran to the end: about 270 minutes
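To make the prefilter-then-exact-test pattern from the thread concrete, here is a minimal, dependency-free Python sketch. To stay runnable without GDAL/Shapely/rtree installed, geometries are reduced to plain (minx, miny, maxx, maxy) boxes and a uniform grid stands in for the R-tree; `GridIndex`, `bbox_overlaps`, and `intersecting` are names invented for this illustration, not part of any of those libraries. With the real packages the shape of the code is the same: query the index with the bounding box of "ref", then run the exact GEOS predicate ("intersects" or "not disjoint") only on the returned candidates.

```python
from collections import defaultdict


def bbox_overlaps(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


class GridIndex:
    """Minimal spatial index: buckets bounding boxes into uniform grid cells."""

    def __init__(self, cell=10.0):
        self.cell = cell
        self.cells = defaultdict(list)

    def _covered(self, box):
        # All grid cells touched by the box.
        minx, miny, maxx, maxy = box
        c = self.cell
        for i in range(int(minx // c), int(maxx // c) + 1):
            for j in range(int(miny // c), int(maxy // c) + 1):
                yield (i, j)

    def insert(self, fid, box):
        for key in self._covered(box):
            self.cells[key].append((fid, box))

    def candidates(self, box):
        # Cheap approximate query: everything sharing a cell with the box.
        seen = {}
        for key in self._covered(box):
            for fid, b in self.cells[key]:
                seen[fid] = b
        return seen


def intersecting(index, box):
    # Index lookup first; the exact test runs only on the few candidates,
    # not on all 75,000 features.
    return sorted(fid for fid, b in index.candidates(box).items()
                  if bbox_overlaps(box, b))


idx = GridIndex(cell=10.0)
idx.insert(1, (0, 0, 5, 5))
idx.insert(2, (50, 50, 60, 60))
idx.insert(3, (3, 3, 8, 8))
print(intersecting(idx, (4, 4, 6, 6)))  # -> [1, 3]
```

OGR's SetSpatialFilterRect, mentioned above, plays a similar role on the reading side: it restricts the features returned by the layer to those whose extents overlap a given rectangle, so only those candidates ever reach the exact intersection test.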
Considering that you're carrying out 2.8 billion disjoint operations (with
75,000 features), that's almost 174,000 per second. If you use an index, the
entire process might take only a few seconds.

Would you be willing to share your C code? I'd like to look into the
performance difference more closely.

Sean

_______________________________________________
Community mailing list
[email protected]
http://lists.gispython.org/mailman/listinfo/community
