Hi Sean, I think I've found what's going wrong with the references.
(the line numbers below correspond to the official release 1.0.11) I added in shapely/geometry/base.py, after line 266 (in __del__ method of the BaseGeometry) print "__del__ %x" % id(self) to see when a geometry instance is actually freed >>> from sys import getrefcount as grc >>> from shapely.geometry import Point >>> p=Point((0,0)) >>> grc(p) 2 Python documentation says that's ok to find +1 for the result that is evaluated inside the function/method. So reference count is 1 >>> "%x" % id(p) 'b7e72e8c' >>> del p __del__ b7e72e8c As soon as the reference count equals 0, __del__ is called >>> p=Point((0,0)) >>> "%x" % id(p) 'b7e72e8c' >>> p.intersection <shapely.topology.BinaryTopologicalOp object at 0xb7d34cac> >>> grc(p) 3 >>> id(p)-id(p.intersection.context) 0L In the "__get__" method of class BinaryTopologicalOp (line 25, topology.c), self.context that was None, now equals my point p There's one more reference and the last statement above shows that the new reference is p.intersection.context >>> res = p.intersection(Point((9,9))) __del__ 94e428c >>> grc(p) 4 >>> id(p)-id(p.intersection.context) 0L A new "call" to "__get__" decrements AND re-increments : reference count doesn't change >>> id(p)-id(res.__p__) 0L >>> del p.intersection.context >>> grc(p) 3 When the function geom_factory is called, a new reference to the point is done "ob.__p__ = parent" (base.py, line 45) with "parent" that is my initial "p" >>> del res.__p__ >>> grc(p) 2 >>> del p __del__ b7e72e8c >>> When I explictely call "del res.__p__", the "p" reference count is OK (2-1) __del__ is now called after the last "del p" I think that this problem occurs for all the geometries (they are all (?) derived from BaseGeometry). And that explains the issue I posted last week and those of yesterday. Do you think it's correct ? Pascal 2009/4/4 Pascal Leroux <[email protected]> > Hi Sean, > > 2009/4/4 Sean Gillies <[email protected]> > >> Hi Pascal, >> >> This cries out for a cascaded union, I think. Maybe we should have a >> 1.0.12 release. Still, the leaks trouble me. Statements like >> >> P = P.union(p) >> >> have caused me problems in the past. Probably a matter of held >> references? Any interest in digging into it? >> > > Of course I'm interested in finding what's going wrong. But I've using > Python (and Shapely) for less than a year (my first mail to the mailing list > was posted in last July) and I think my knowledge in Python is not big > enough, especially for Python/C interface, to fix the issue. > > But I'm trying to get into your code and learn how Shapely works. I would > be so sad to have to work again with C/libgeos. Once again, Shapely saves me > time (thanks !) and I've taught my colleagues (Institut Géographique > National, France) to use Python and Shapely . > > If I find something, I tell you. > > Pascal > > > > >> >> Sean >> >> On Apr 3, 2009, at 3:02 AM, Pascal Leroux wrote: >> >> > Even with very simple geometries : >> > >> > >>> P = Polygon(((0,0),(0,9),(9,9),(9,0))) >> > >>> p = Polygon(((1,1),(1,5),(5,5),(5,1))) >> > >>> count = 1 >> > >>> while 1: >> > ... if count % 10000 == 0: print count >> > ... P = P.union(p) >> > ... count += 1 >> > >> > >> > after 25 minutes : >> > >> > >> > 7020000 >> > 7030000 >> > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12) >> > *** error: can't allocate region >> > *** set a breakpoint in malloc_error_break to debug >> > ... >> > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12) >> > *** error: can't allocate region >> > *** set a breakpoint in malloc_error_break to debug >> > MemoryError >> > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12) >> > *** error: can't allocate region >> > *** set a breakpoint in malloc_error_break to debug >> > Traceback (most recent call last): >> > File "<stdin>", line 3, in <module> >> > File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/ >> > python2.5/site-packages/Shapely-1.0.11-py2.5.egg/shapely/ >> > topology.py", line 34, in __call__ >> > if not self.context.is_valid: >> > MemoryError >> > >>> >> > >> > >> > This code doesn't make sense but it highlights the issue I talked >> > about last week. >> > I used a very big polygon in my last email but, here, I cannot use >> > simpler ones. >> > >> > My final purpose is to build a full topology with several shapefiles >> > located on a 1°x1° area and I often work with more than 200,000 >> > features at a time. Their geometries, even if they are not as >> > complicated as the polygon I used last week, are not as simple as >> > those I used above. >> > >> > 7,000,000 calls to union with the two simple polygons crashed the >> > script because of a lack of memory. That corresponds (in the better >> > case), with 200,000 features, to an average of 35 calls per initial >> > feature. Obviously, it's not enough to build a full topology. >> > >> > I'm trying to split my area into several parts, to build topology >> > for each part and then merge the parts. It will be more complicated. >> > Will it be enough ? Will I have to come back to C/libgeos ? aaargh ! >> > >> > Pascal >> > >> > >> > >> > _______________________________________________ >> > Community mailing list >> > [email protected] >> > http://lists.gispython.org/mailman/listinfo/community >> >> _______________________________________________ >> Community mailing list >> [email protected] >> http://lists.gispython.org/mailman/listinfo/community >> > >
_______________________________________________ Community mailing list [email protected] http://lists.gispython.org/mailman/listinfo/community
