pascal leroux wrote:
> Hi all
>
> I have to develop an application that will build full topology from
> shapefiles as input data.
> I'm used to coding in C but I've begun using Python for a couple of weeks.
> I wanted to know if Python could work fast enough when the input data are
> rather big.
>
> I made a simple test to find intersections between polygons, with a big
> shapefile as input data (75,000 polygons, buildings and houses, located on a
> 25 x 25km2 area)
>
> Both programs have been run on the same iMac (Mac OS X, Leopard).
>
> On one hand, an executable, written in C, that calls directly the libgeos_c
> library. It ended after 4 minutes : 2100 intersections (points, linestrings
> and polygons).
>
> On the other hand, few Python lines (see below) that use ogr api to read
> shapefile and Shapely to find intersections. I stopped it after 70 minutes,
> at iteration 8100 (20 % done) : about 6 hours estimated for the whole
> shapefile ! I expected smaller differences.
>
> What do you think about these huge differences ? Is it mad to use such big
> files with Python/Shapely ?
> Or did I make a beginner's mistake in my code ? Is it a problem with the
> MacOS version of Shapely ? of Python (2.5.1) ?
>
> Thanks
>
> Pascal
>
> #=======================================
> import os,sys,osgeo.ogr
> from shapely.wkb import loads
> from shapely.geometry import Polygon
>
> source = osgeo.ogr.Open(sys.argv[1])
> layer = source.GetLayer()
> objet = layer.GetNextFeature()
> liste = []
> while objet:
> liste.append(loads(objet.GetGeometryRef().ExportToWkb()))
> objet = layer.GetNextFeature()
> print len(liste)
>
>
> for i in range(len(liste)-1):
> ref = liste[i]
> if i % 100 == 0:
> print str(i)
> os.system("date")
> for j in range(i+1,len(liste)):
> sec = liste[j]
> if ref.disjoint(sec) == False:
> print "intersection entre " + str(i) + ' et ' + str(j)
>
>
Hi Pascal,
A Python solution shouldn't be 100 times slower than C, especially since
both OGR and Shapely use the same GEOS library to compute relationships
between geometries. Is it possible that your other solution uses
"intersects" instead of "not disjoint" (could be faster), or is more
approximate?
The Python script above also uses a great deal of memory since it never
frees memory allocated by OGR's GetNextFeature(), and then copies the
feature to a Shapely geom (and a GEOS geom). If your computer's free
memory is low, performance can be poor. Is this a possibility?
I recommend that you first spatially index your data for this kind of
application -- maybe a quadtree (.qix) for your shapefile, or use an
Rtree alongside your list of Shapely geoms. You can thereby efficiently
select features that approximately intersect with your feature of
interest (using the bounding box of "ref"), and then test only that
smaller set for exact intersection. Often the cost of indexing is much
smaller than the cost of performing unnecessary exact intersections.
Using xrange() instead of range() when your lists are large is a good
idea, but I don't think that's the issue here.
Cheers,
Sean
_______________________________________________
Community mailing list
[email protected]
http://lists.gispython.org/mailman/listinfo/community