Hi all,

I ran into an interesting question while getting Sage to build and
pass tests against the current cython-devel tip (side note: it does on
both my laptop and sage.math!), and I thought I'd get some opinions on
the "right" way to fix it.

Consider the following bit of code:

    def __repr__(self):
        cdef char* ss = fmpz_poly_to_string(self.poly)  # C-allocated string
        s = ss      # copies the C string into a Python-managed string
        free(ss)    # safe: s now owns its own copy
        return s

You don't need to know what an fmpz_poly is -- the relevant fact is
that this is a call into some C library which returns a string
representation as a char *. The code then does what you'd expect: it
gets the value from the C library and returns it. The key is that it
relies on untyped variables implicitly defaulting to Python objects --
the "s = ss" line causes the underlying char * to be copied into a
variable whose memory is managed by Python. I think this is considered
the "Cythonic" way of doing things -- at least, there are entries in
the FAQ recommending exactly this pattern.
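
To see the coercion in isolation (the names here are made up for
illustration):

    cdef char* greeting = "hello"  # points at C-managed memory
    s = greeting   # s is untyped, so it defaults to a Python object:
                   # the bytes are copied into a new Python string
                   # whose memory the interpreter manages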

Now enter type inference. It looks at this block and says, "hey, s is
only ever assigned a char * -- let's call it a char *, too." Of
course, this is a disaster -- it changes the semantics of the
all-important "s = ss" line. That assignment now just copies the
pointer, free(ss) releases the buffer, and the coercion to a Python
string doesn't happen until the return statement -- so the return
value is (a Python copy of) whatever junk is left in the freed memory.
This is easy enough to fix -- we can be more explicit about our
intentions and declare s to be an object, which works great. However,
this is likely to break at least some user code in the wild --
especially since we've been recommending this as the "right" way to do
things.

I can see at least a few options:

1) Break the code above, and tell people to explicitly declare such
variables as objects.

2) Decide that if a variable gets returned from a function which is
either def'd or declared to return a Python object, and the variable
has no explicit type declaration, then we only infer a type which is a
subtype of Python object. (Right now, we almost never infer anything
more specific than Python object anyway.) There's a rough sketch of
this rule after the list.

3) Something else in between, e.g. making some decisions based on the
type that gets inferred. Or, better yet, doing some control-flow/alias
analysis and shadowing the variable with a Python object only once you
need to. (Clearly this is a pipe dream right now.)
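
To make (2) a little more concrete, here's a rough sketch of the rule
in plain Python -- the names are made up for illustration and aren't
the actual compiler internals (or what my patch looks like):

    # Hypothetical sketch of rule (2) -- none of these names come from
    # the actual compiler or from my patch.
    def infer_type(declared_type, assigned_types, escapes_as_py_object):
        if declared_type is not None:
            return declared_type  # explicit declarations always win
        if escapes_as_py_object:
            # Returned from a def'd function (or one declared to return
            # a Python object): only ever infer a Python object type.
            return "object"
        # Otherwise, ordinary inference: use the assigned type if all
        # assignments agree on one.
        types = set(assigned_types)
        return types.pop() if len(types) == 1 else "object"

    # The __repr__ above: s is only assigned a char *, but it escapes
    # through a def'd function, so we refuse to infer char *.
    print(infer_type(None, ["char *"], True))   # -> object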

I'm personally leaning towards (2), in the hope of one day getting
closer to (3). I've got a patch that implements (2), which I'm happy
to push if people agree on that approach. I think the tradeoff between
(1) and (2) is fairly standard: (1) will generate (potentially) faster
code, but (2) will be much friendlier for someone migrating Python
code.

Thoughts?
-cc