On Sat, Jan 5, 2013 at 10:10 PM, David Cournapeau <courn...@gmail.com> wrote:

> Thanks for the entertaining explanation.
Procrastination is a hell of a drug.

> I don't think 0-dim array being slow is such a big drawback. I would
> be really surprised if there was no way to make them faster, and
> having unspecified, nearly duplicated type handling code in multiple
> places is likely one reason why nobody took time to really make them
> faster.

I agree!

> Regarding ufunc combination caching, couldn't we do the caching on
> demand ? I am not sure how you arrived at a 600 bytes per ufunc, but
> in many real world use cases, I would suspect only a few combinations
> would be used.

600 bytes is for an implementation that just kept a table like

  chosen_ufunc_offset = np.empty((24, 24), dtype=np.uint8)  # 576 bytes

and looked up the proper ufunc loop by doing

  ufunc_loops[chosen_ufunc_offset[left_arg_typenum, right_arg_typenum]]

I suspect we could pre-fill such tables extremely quickly (basically
just fill in all the exact matches, and then do a flood-fill along the
can_cast graph), or we could fill them on demand. (Also, even 24 * 24
is currently an over-estimate, since some of those 24 types are
parametrized, and currently ufuncs can't handle parametrized types --
but hopefully that will get fixed at some point.)

The numpy main namespace and scipy.special together contain only
35 + 47 = 82 ufuncs that take 2 arguments [1], so loading those two
modules using this scheme would add a total of *50 kilobytes* to
numpy's memory overhead...

(Interestingly, scipy.special does include 48 three-argument ufuncs,
15 four-argument ufuncs, and 6 five-argument ufuncs, which obviously
cannot use a table lookup scheme. Maybe we can add a check for
symmetry -- if all the loops are defined on matching types, like
"dd->d" and "ff->f", then really it's a one-dimensional lookup
problem: find a common type for the inputs ("d" or "f") and then find
the best loop for that one type.)

Another option, like you suggest (?), would be to keep a little
fixed-size LRU cache for each ufunc and do lookups in it by linear
search.
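For concreteness, here's a rough pure-Python sketch of the table scheme above -- the names (loop_sigs, choose_loop, etc.) are invented for illustration, the demo only pre-fills a handful of dtypes rather than all 24 typenums, and of course the real lookup would live in numpy's C core:

```python
import numpy as np

NTYPES = 24    # builtin typenums; an over-estimate, as noted above
NO_LOOP = 255  # uint8 sentinel meaning "no matching loop"

# A toy binary ufunc with two inner loops, "ff->f" and "dd->d".
loop_sigs = ["ff->f", "dd->d"]
loop_inputs = [(np.dtype(np.float32), np.dtype(np.float32)),
               (np.dtype(np.float64), np.dtype(np.float64))]

# The per-ufunc 24 x 24 = 576-byte table.
chosen_ufunc_offset = np.full((NTYPES, NTYPES), NO_LOOP, dtype=np.uint8)

# Pre-fill: exact matches first, then flood outward along the
# can_cast graph (here just for a few concrete demo dtypes).
for idx, (l, r) in enumerate(loop_inputs):
    chosen_ufunc_offset[l.num, r.num] = idx
demo_dtypes = [np.dtype(t) for t in
               (np.int16, np.int32, np.float16, np.float32, np.float64)]
for dl in demo_dtypes:
    for dr in demo_dtypes:
        if chosen_ufunc_offset[dl.num, dr.num] != NO_LOOP:
            continue
        # First loop whose inputs both operands can safely cast to wins.
        for idx, (l, r) in enumerate(loop_inputs):
            if np.can_cast(dl, l) and np.can_cast(dr, r):
                chosen_ufunc_offset[dl.num, dr.num] = idx
                break

def choose_loop(a, b):
    """Pick the inner loop for two operands via the table lookup."""
    idx = chosen_ufunc_offset[np.asarray(a).dtype.num,
                              np.asarray(b).dtype.num]
    return None if idx == NO_LOOP else loop_sigs[idx]
```

So e.g. an (int16, int16) pair, which has no exact match, gets routed through the can_cast flood-fill to the "ff->f" loop, while (int32, float64) lands on "dd->d" -- and every subsequent lookup is just two array indexing operations.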
It's hard to know how many different types get used in real-world
programs, though, and I'd worry about performance falling off a cliff
as soon as someone tweaked their inner loop so that it used 6
different (type1, type2) combinations instead of 5 or whatever (maybe
in one place they do int * float and in another float * int, etc.).

Anyway, the point is: yes, this particular thing is eminently fixable.

> Scalar arrays are one of the most esoteric features of numpy, and a
> fairly complex one in terms of implementation. Getting rid of it
> would be a net plus on that side. Of course, there is the issue of
> backward compatibility, whose extent is hard to assess.

You mean array scalars, not scalar arrays, right? [2]

-n

[1] len([v for v in np.__dict__.values()
         if isinstance(v, np.ufunc) and v.nin == 2])

[2] The fact that this sentence means something is certainly evidence
for... something.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion