Stefan Behnel, 26.02.2012 09:50:
> when I took a look at object.h and saw that the Py_DECREF() macro *always*
> calls into it. Another surprise.
>
> I had understood in previous discussions that the refcount emulation in
> cpyext only counts C references, which I consider a suitable design. (I
> guess something as common as Py_None uses the obvious optimisation of
> always having a ref-count > 1, right? At least when not debugging...)
>
> So I changed the macros to use an appropriate C-level implementation:
>
> """
> #define Py_INCREF(ob) ((((PyObject *)(ob))->ob_refcnt > 0) ? \
> (void)(((PyObject *)(ob))->ob_refcnt++) : Py_IncRef((PyObject *)(ob)))
>
> #define Py_DECREF(ob) ((((PyObject *)(ob))->ob_refcnt > 1) ? \
> (void)(((PyObject *)(ob))->ob_refcnt--) : Py_DecRef((PyObject *)(ob)))
>
> #define Py_XINCREF(op) do { if ((op) == NULL) ; else Py_INCREF(op); \
> } while (0)
>
> #define Py_XDECREF(op) do { if ((op) == NULL) ; else Py_DECREF(op); \
> } while (0)
> """
>
> to tell the C compiler that it doesn't actually need to call into PyPy in
> most cases (note that I didn't use any branch prediction macros, but that
> shouldn't change all that much anyway). This shaved off a couple of cycles
> from my iteration benchmark, but much less than I would have liked. My
> intuition tells me that this is because almost all objects that appear in
> the benchmark are actually short-lived in C space so that pretty much every
> Py_DECREF() on them kills them straight away and thus calls into
> Py_DecRef() anyway. To be verified with a better test.
Ok, here's a stupid micro-benchmark for ref-counting:

def bench(x):
    cdef int i
    for i in xrange(10000):
        a = x
        b = x
        c = x
        d = x
        e = x
        f = x
        g = x

This leads to the obvious C code. :) (And yes, this will eventually stop
being a real benchmark in Cython...)
When always calling into Py_IncRef() and Py_DecRef(), I get this:

$ pypy -m timeit -s 'from refcountbench import bench' 'bench(10)'
1000 loops, best of 3: 683 usec per loop

With the macros above, I get this:

$ pypy -m timeit -s 'from refcountbench import bench' 'bench(10)'
1000 loops, best of 3: 385 usec per loop

So that's about 1.8 times faster (683 vs. 385 usec per loop), just because
the C compiler can handle most of the ref-counting inline once there is more
than one C reference to an object. The gain will obviously be much smaller
for real-world code, but I think this makes it clear enough that it's worth
putting some effort into avoiding needless calls back and forth across the
C/PyPy border.
Stefan
_______________________________________________
pypy-dev mailing list
http://mail.python.org/mailman/listinfo/pypy-dev