Hi Pascal, thanks for tracking this down. Would you be willing to download r1225 from the Shapely 1.0 branch? I've got a fix for the various binary op/topology descriptors that keeps the reference counts down to 2.
http://svn.gispython.org/svn/Shapely/branches/1.0 On Apr 5, 2009, at 7:22 AM, Pascal Leroux wrote: > Hi Sean, > > I think I've found what's going wrong with the references. > > (the line numbers below correspond to the official release 1.0.11) > > I added in shapely/geometry/base.py, after line 266 (in __del__ > method of the BaseGeometry) > print "__del__ %x" % id(self) > to see when a geometry instance is actually freed > > > >>> from sys import getrefcount as grc > >>> from shapely.geometry import Point > >>> p=Point((0,0)) > >>> grc(p) > 2 > > Python documentation says that's ok to find +1 for the result that is > evaluated inside the function/method. So reference count is 1 > > >>> "%x" % id(p) > 'b7e72e8c' > >>> del p > __del__ b7e72e8c > > As soon as the reference count equals 0, __del__ is called > > >>> p=Point((0,0)) > >>> "%x" % id(p) > 'b7e72e8c' > >>> p.intersection > <shapely.topology.BinaryTopologicalOp object at 0xb7d34cac> > >>> grc(p) > 3 > >>> id(p)-id(p.intersection.context) > 0L > > In the "__get__" method of class BinaryTopologicalOp (line 25, > topology.c), > self.context that was None, now equals my point p > There's one more reference and the last statement above shows that > the new > reference is p.intersection.context > > >>> res = p.intersection(Point((9,9))) > __del__ 94e428c > >>> grc(p) > 4 > >>> id(p)-id(p.intersection.context) > 0L > > A new "call" to "__get__" decrements AND re-increments : reference > count doesn't change > > >>> id(p)-id(res.__p__) > 0L > >>> del p.intersection.context > >>> grc(p) > 3 > > When the function geom_factory is called, a new reference to the > point is done > "ob.__p__ = parent" (base.py, line 45) with "parent" that is my > initial "p" > > >>> del res.__p__ > >>> grc(p) > 2 > >>> del p > __del__ b7e72e8c > >>> > > When I explictely call "del res.__p__", the "p" reference count is > OK (2-1) > __del__ is now called after the last "del p" > > > I think that this problem occurs for all the geometries (they are > all (?) derived from BaseGeometry). > And that explains the issue I posted last week and those of yesterday. > > Do you think it's correct ? > > Pascal > > > > > 2009/4/4 Pascal Leroux <[email protected]> > Hi Sean, > > 2009/4/4 Sean Gillies <[email protected]> > > Hi Pascal, > > This cries out for a cascaded union, I think. Maybe we should have a > 1.0.12 release. Still, the leaks trouble me. Statements like > > P = P.union(p) > > have caused me problems in the past. Probably a matter of held > references? Any interest in digging into it? > > Of course I'm interested in finding what's going wrong. But I've > using Python (and Shapely) for less than a year (my first mail to > the mailing list was posted in last July) and I think my knowledge > in Python is not big enough, especially for Python/C interface, to > fix the issue. > > But I'm trying to get into your code and learn how Shapely works. I > would be so sad to have to work again with C/libgeos. Once again, > Shapely saves me time (thanks !) and I've taught my colleagues > (Institut Géographique National, France) to use Python and Shapely . > > If I find something, I tell you. > > Pascal > > > > > Sean > > On Apr 3, 2009, at 3:02 AM, Pascal Leroux wrote: > > > Even with very simple geometries : > > > > >>> P = Polygon(((0,0),(0,9),(9,9),(9,0))) > > >>> p = Polygon(((1,1),(1,5),(5,5),(5,1))) > > >>> count = 1 > > >>> while 1: > > ... if count % 10000 == 0: print count > > ... P = P.union(p) > > ... count += 1 > > > > > > after 25 minutes : > > > > > > 7020000 > > 7030000 > > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12) > > *** error: can't allocate region > > *** set a breakpoint in malloc_error_break to debug > > ... > > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12) > > *** error: can't allocate region > > *** set a breakpoint in malloc_error_break to debug > > MemoryError > > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12) > > *** error: can't allocate region > > *** set a breakpoint in malloc_error_break to debug > > Traceback (most recent call last): > > File "<stdin>", line 3, in <module> > > File "/System/Library/Frameworks/Python.framework/Versions/2.5/ > lib/ > > python2.5/site-packages/Shapely-1.0.11-py2.5.egg/shapely/ > > topology.py", line 34, in __call__ > > if not self.context.is_valid: > > MemoryError > > >>> > > > > > > This code doesn't make sense but it highlights the issue I talked > > about last week. > > I used a very big polygon in my last email but, here, I cannot use > > simpler ones. > > > > My final purpose is to build a full topology with several shapefiles > > located on a 1°x1° area and I often work with more than 200,000 > > features at a time. Their geometries, even if they are not as > > complicated as the polygon I used last week, are not as simple as > > those I used above. > > > > 7,000,000 calls to union with the two simple polygons crashed the > > script because of a lack of memory. That corresponds (in the better > > case), with 200,000 features, to an average of 35 calls per initial > > feature. Obviously, it's not enough to build a full topology. > > > > I'm trying to split my area into several parts, to build topology > > for each part and then merge the parts. It will be more complicated. > > Will it be enough ? Will I have to come back to C/libgeos ? aaargh ! > > > > Pascal > > > > > > > > _______________________________________________ > > Community mailing list > > [email protected] > > http://lists.gispython.org/mailman/listinfo/community > > _______________________________________________ > Community mailing list > [email protected] > http://lists.gispython.org/mailman/listinfo/community > > > _______________________________________________ > Community mailing list > [email protected] > http://lists.gispython.org/mailman/listinfo/community _______________________________________________ Community mailing list [email protected] http://lists.gispython.org/mailman/listinfo/community
