Hi Sean,

I think I've found what's going wrong with the references.

(the line numbers below correspond to the official release 1.0.11)

I added in shapely/geometry/base.py, after line 266 (in __del__ method of
the BaseGeometry)
print "__del__ %x" % id(self)
to see when a geometry instance is actually freed


>>> from sys import getrefcount as grc
>>> from shapely.geometry import Point
>>> p=Point((0,0))
>>> grc(p)
2

Python documentation says that's ok to find +1 for the result that is
evaluated inside the function/method. So reference count is 1

>>> "%x" % id(p)
'b7e72e8c'
>>> del p
__del__ b7e72e8c

As soon as the reference count equals 0, __del__ is called

>>> p=Point((0,0))
>>> "%x" % id(p)
'b7e72e8c'
>>> p.intersection
<shapely.topology.BinaryTopologicalOp object at 0xb7d34cac>
>>> grc(p)
3
>>> id(p)-id(p.intersection.context)
0L

In the "__get__" method of class BinaryTopologicalOp (line 25, topology.c),
self.context that was None, now equals my point p
There's one more reference and the last statement above shows that the new
reference is p.intersection.context

>>> res = p.intersection(Point((9,9)))
__del__ 94e428c
>>> grc(p)
4
>>> id(p)-id(p.intersection.context)
0L

A new "call" to "__get__" decrements AND re-increments : reference count
doesn't change

>>> id(p)-id(res.__p__)
0L
>>> del p.intersection.context
>>> grc(p)
3

When the function geom_factory is called, a new reference to the point is
done
"ob.__p__ = parent" (base.py, line 45) with "parent" that is my initial "p"

>>> del res.__p__
>>> grc(p)
2
>>> del p
__del__ b7e72e8c
>>>

When I explictely call "del res.__p__", the "p" reference count is OK (2-1)
__del__ is now called after the last "del p"


I think that this problem occurs for all the geometries (they are all (?)
derived from BaseGeometry).
And that explains the issue I posted last week and those of yesterday.

Do you think it's correct ?

Pascal




2009/4/4 Pascal Leroux <[email protected]>

> Hi Sean,
>
> 2009/4/4 Sean Gillies <[email protected]>
>
>> Hi Pascal,
>>
>> This cries out for a cascaded union, I think. Maybe we should have a
>> 1.0.12 release. Still, the leaks trouble me. Statements like
>>
>>   P = P.union(p)
>>
>> have caused me problems in the past. Probably a matter of held
>> references? Any interest in digging into it?
>>
>
> Of course I'm interested in finding what's going wrong. But I've using
> Python (and Shapely) for less than a year (my first mail to the mailing list
> was posted in last July) and I think my knowledge in Python is not big
> enough, especially for Python/C interface, to fix the issue.
>
> But I'm trying to get into your code and learn how Shapely works. I would
> be so sad to have to work again with C/libgeos. Once again, Shapely saves me
> time (thanks !) and I've taught my colleagues (Institut Géographique
> National, France) to use Python and Shapely .
>
> If I find something, I tell you.
>
> Pascal
>
>
>
>
>>
>> Sean
>>
>> On Apr 3, 2009, at 3:02 AM, Pascal Leroux wrote:
>>
>> > Even with very simple geometries :
>> >
>> > >>> P = Polygon(((0,0),(0,9),(9,9),(9,0)))
>> > >>> p = Polygon(((1,1),(1,5),(5,5),(5,1)))
>> > >>> count = 1
>> > >>> while 1:
>> > ...     if count % 10000 == 0: print count
>> > ...     P = P.union(p)
>> > ...     count += 1
>> >
>> >
>> > after 25 minutes :
>> >
>> >
>> > 7020000
>> > 7030000
>> > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12)
>> > *** error: can't allocate region
>> > *** set a breakpoint in malloc_error_break to debug
>> > ...
>> > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12)
>> > *** error: can't allocate region
>> > *** set a breakpoint in malloc_error_break to debug
>> > MemoryError
>> > Python(4969) malloc: *** mmap(size=2097152) failed (error code=12)
>> > *** error: can't allocate region
>> > *** set a breakpoint in malloc_error_break to debug
>> > Traceback (most recent call last):
>> >   File "<stdin>", line 3, in <module>
>> >   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/
>> > python2.5/site-packages/Shapely-1.0.11-py2.5.egg/shapely/
>> > topology.py", line 34, in __call__
>> >     if not self.context.is_valid:
>> > MemoryError
>> > >>>
>> >
>> >
>> > This code doesn't make sense but it highlights the issue I talked
>> > about last week.
>> > I used a very big polygon in my last email but, here, I cannot use
>> > simpler ones.
>> >
>> > My final purpose is to build a full topology with several shapefiles
>> > located on a 1°x1° area and I often work with more than 200,000
>> > features at a time. Their geometries, even if they are not as
>> > complicated as the polygon I used last week, are not as simple as
>> > those I used above.
>> >
>> > 7,000,000 calls to union with the two simple polygons crashed the
>> > script because of a lack of memory. That corresponds (in the better
>> > case), with 200,000 features, to an average of 35 calls per initial
>> > feature. Obviously, it's not enough to build a full topology.
>> >
>> > I'm trying to split my area into several parts, to build topology
>> > for each part and then merge the parts. It will be more complicated.
>> > Will it be enough ? Will I have to come back to C/libgeos ? aaargh !
>> >
>> > Pascal
>> >
>> >
>> >
>> > _______________________________________________
>> > Community mailing list
>> > [email protected]
>> > http://lists.gispython.org/mailman/listinfo/community
>>
>> _______________________________________________
>> Community mailing list
>> [email protected]
>> http://lists.gispython.org/mailman/listinfo/community
>>
>
>
_______________________________________________
Community mailing list
[email protected]
http://lists.gispython.org/mailman/listinfo/community

Reply via email to