On Sun, Jun 14, 2009 at 3:51 AM, Charles R Harris<charlesr.har...@gmail.com> wrote: > > > On Sat, Jun 13, 2009 at 12:35 PM, David Cournapeau <courn...@gmail.com> > wrote: >> >> On Sun, Jun 14, 2009 at 3:22 AM, Charles R >> Harris<charlesr.har...@gmail.com> wrote: >> >> > 1) Since reference counting is such a pain, you should document that the >> > constructor returns a new reference and that the PyArrayIterObject does >> > not >> > need to have its reference count incremented before the call and that >> > the >> > reference count is unchanged on failure. >> >> OK. >> >> > 2) Why are _update_coord_iter(c) and _inc_set_ptr(c) macros? Why are >> > they >> > defined inside functions? If left as macros, they should be in CAPS, but >> > why >> > not just write them out? >> >> They are macro because they are reused in the 2d specialized functions >> (I will add 3d too) > > IIRC, inline doesn't recurse, so there is some advantage to having these as > macros. But I really dislike seeing macros defined inside of functions, > especially when they aren't exclusive to that function.
Well, they are kind of exclusive to this function - and the special 2d case; they are not supposed to be used by themselves (which is why they are undefined right away). But I changed this anyway to ALL CAPS and defined outside. > IOW, use macros judiciously. It may not be obvious, because it looks really simple, but the code is heavily optimized. I spent several hours to find an implementation which works as it does now. The macro is used for a reason :) > > That's what I wanted to hear. But in c++ is generally best for simplicity > and debugging to start out not using inlines, then add them is benchmarks > show a decent advantage. And in those cases it is best if the inlines are > just a few lines long. That's how I did it :) - and the inlines are short I think. To be more concrete about the numbers, the new correlate which uses the iterator gives me (first shape for x, second for y, compute correlate(x, y), time of one iteration on average for timeit): # Using inline + unrolling (unrolling not implemented for 3d arrays) 100x100 with 20x20 0.0521752119064 300x300 with 10x10 0.0856973171234 9x9x9 with 8x8x8 0.0261607170105 # No explicit inline or unrolling 100x100 with 20x20 0.0859595060349 300x300 with 10x10 0.154387807846 9x9x9 with 8x8x8 0.0328009128571 This is on RHEL5, with a relatively old compiler (gcc 4.1). That's the kind of code where optimization flags/compiler/os matters a lot (differences are over 100 % on a same machine with different combinations of OS/Compilers). As a comparison, the inline+unrolling gives same speed as the old code for correlate, which assumed contiguity and manipulated indexes directly. cheers, David _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion