On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote:
>> On 04/10/2012 03:00 PM, Nathaniel Smith wrote:
>>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant <tra...@continuum.io> wrote:
>>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>>>>
>>>>>> ...isn't this an operation that will be performed once per compiled function? Is the overhead of the easy, robust method (calling ctypes.cast) actually measurable as compared to, you know, running an optimizing compiler?
>>>>>>
>>>>>> Yes, there can be significant overhead. The compiler is run once and creates the function. This function is then potentially used many, many times. Also, it is entirely conceivable that the "build" step happens at a separate "compilation" time, and Numba actually loads a pre-compiled version of the function from disk which it then uses at run-time.
>>>>>>
>>>>>> I have been playing with a version of this using scipy.integrate, and unfortunately the overhead of ctypes.cast is rather significant --- to the point of making the code path that uses these function pointers useless, when without the ctypes.cast overhead the speed-up is 3-5x.
>>>>>
>>>>> Ah, I was assuming that you'd do the cast once outside of the inner loop (at the same time you did type compatibility checking and so forth).
>>>>>
>>>>>> In general, I think NumPy will need its own simple function-pointer object to use when handing over raw function pointers between Python and C. SciPy can then re-use this object, which also has a useful C-API for things like signature checking. From what I have seen, ctypes is nice but very slow and without a compelling C-API.
>>>>>
>>>>> Sounds reasonable to me. Probably nicer than violating ctypes's abstraction boundary, and with no real downsides.
>>>>>
>>>>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>>>>
>>>>>> void *func_ptr;
>>>>>> char *signature;  /* something like 'dd->d' to indicate a function that takes two doubles and returns a double */
>>>>>
>>>>> This looks like it's setting us up for trouble later. We already have a robust mechanism for describing types -- dtypes. We should use that instead of inventing Yet Another baby type system. We'll need to convert between this representation and dtypes anyway if you want to use these pointers for ufunc loops... and if we just use dtypes from the start, we'll avoid having to break the API the first time someone wants to pass a struct or array or something.
>>>>
>>>> For some of the things we'd like to do with Cython down the line, something very fast like what Travis describes is exactly what we need; specifically, if you have Cython code like
>>>>
>>>> cdef double f(func):
>>>>     return func(3.4)
>>>>
>>>> that may NOT be called in a loop.
>>>>
>>>> But I do agree that this sounds overkill for NumPy+numba at the moment; certainly for scipy.integrate, where you can amortize over N function samples. But Travis perhaps has a use case I didn't think of.
>>>
>>> It sounds sort of like you're disagreeing with me but I can't tell about what, so maybe I was unclear :-).
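For concreteness, a minimal sketch of the kind of C-level object Travis describes above might look like the following; the type and field names here are invented for illustration, not an existing NumPy API:

    #include <Python.h>

    /* Hypothetical function-pointer "capsule": a raw C function pointer
       plus a signature string describing its C-level calling convention. */
    typedef struct {
        PyObject_HEAD
        void *func_ptr;          /* raw C function pointer                   */
        const char *signature;   /* e.g. "dd->d": (double, double) -> double */
    } cfuncptr_object;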
>>>
>>> All I was saying was that a list-of-dtype-objects was probably a better way to write down a function signature than some ad-hoc string language. In both cases you'd do some type-compatibility-checking up front and then use C calling afterwards, and I don't see why type-checking would be faster or slower for one representation than the other. (Certainly one wouldn't have to support all possible dtypes
>
> Rereading this, perhaps this is the statement you seek: Yes, doing a simple strcmp is much, much faster than jumping all around in memory to check the equality of two lists of dtypes. If it is a string less than 8 bytes in length with the comparison string known at compile-time (the Cython case), then the comparison is only a couple of CPU instructions, as you can check 64 bits at a time.
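In C, the check Dag has in mind boils down to something like the sketch below; the hard-coded "dd->d" stands in for the compile-time-known signature, and the function name is invented:

    #include <string.h>

    /* Signature check against a compile-time-known string.  For a short
       string like "dd->d" the compiler can reduce this strcmp to a couple
       of instructions; with a fixed 8-byte signature field it is
       effectively a single 64-bit compare. */
    static int
    signature_matches(const char *sig)
    {
        return strcmp(sig, "dd->d") == 0;
    }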
Right, that's what I wasn't getting until you mentioned strcmp :-).

That said, the core numpy dtypes are singletons. For this purpose, the signature could be stored as a C array of PyArray_Descr*, but even if we store it in a Python tuple/list, we'd still end up with a contiguous array of PyArray_Descr*'s. (I'm assuming that we would guarantee that it was always-and-only a real PyTupleObject* here.) So for the function we're talking about, the check would compile down to doing the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte strcmp. That's admittedly worse, but I think the difference between these two comparisons is unlikely to be measurable, considering that they're followed immediately by a cache miss when we actually jump to the function pointer.

-- Nathaniel
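For comparison, a sketch of the same check with the signature stored as an array of PyArray_Descr* singletons, which is the tuple-of-dtypes representation boiled down to C; the function name is invented, and it assumes import_array() has been called:

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Same check, but against an array of dtype singletons: three pointer
       comparisons, i.e. the "3*pointersize-byte strcmp" mentioned above. */
    static int
    signature_matches_descr(PyArray_Descr *const sig[3])
    {
        PyArray_Descr *d = PyArray_DescrFromType(NPY_DOUBLE);  /* singleton */
        int ok = (sig[0] == d && sig[1] == d && sig[2] == d);
        Py_DECREF(d);
        return ok;
    }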