On Tue, Jun 13, 2006 at 09:56:37AM -0700, Tim Hochberg wrote:
>
> I've finally got around to looking at numexpr again. Specifically, I'm
> looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing
> the two versions. Let me go through his list of enhancements and comment
> (my comments are dedented):
>
>     - Addition of a boolean type. This allows better array copying times
>       for large arrays (lightweight computations are typically bounded by
>       memory bandwidth).
>
> Adding this to numexpr looks like a no-brainer. Booleans behave
> differently from integers, so in addition to being more memory
> efficient, this enables boolean &, |, ~, etc. to work properly.
>
>     - Enhanced performance for strided and unaligned data, especially for
>       lightweight computations (e.g. 'a>10'). With this and the addition of
>       the boolean type, we can get up to 2x better times than previous
>       versions. Also, most of the supported computations go faster than
>       with numpy or numarray, even the simplest ones.
>
> Francesc, if you're out there, can you briefly describe what this
> support consists of? It's been long enough since I was messing with this
> that it's going to take me a while to untangle NumExpr_run, where I
> expect it's lurking, so any hints would be appreciated.
>
>     - Addition of ~, & and | operators (a la numarray.where)
>
> Sounds good.
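To illustrate the point about booleans behaving differently from integers, here is a minimal plain-Python sketch. It is not numexpr code; the variable names are made up for the example, and Python lists stand in for arrays.

```python
# Integer semantics: ~ flips every bit, so ~1 is -2 and ~0 is -1.
int_mask = [1, 0, 1]
int_not = [~v for v in int_mask]

# Boolean semantics: negation stays within {False, True}.
bool_mask = [True, False, True]
bool_not = [not v for v in bool_mask]

# Reading the integer result back as a mask goes wrong:
# every element of int_not is non-zero, hence truthy.
int_not_as_mask = [bool(v) for v in int_not]
```

With integer storage the "negated" mask selects every element, which is why a separate boolean type is needed for ~ to mean logical negation.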
All the above is checked in already :-)

>     - Support for both numpy and numarray (use the flag --force-numarray
>       in setup.py).
>
> At first glance this looks like it doesn't make things too messy, so I'm
> in favor of incorporating this.

... although I had ripped this all out. I'd rather have a
numpy-compatible numarray layer (at the C level, this means defining
macros like PyArray_DATA) than different code for each.

>     - Added a new benchmark for testing boolean expressions and
>       strided/unaligned arrays: boolean_timing.py
>
> Benchmarks are always good.

Haven't checked that in yet.

>
>     Things that I want to address in the future:
>
>     - Add tests on strided and unaligned data (currently only tested
>       manually)
>
> Yep! Tests are good.
>
>     - Add types for int16, int64 (in 32-bit platforms), float32,
>       complex64 (simple prec.)
>
> I have some specific ideas about how this should be accomplished.
> Basically, I don't think we want to support every type in the same way,
> since this is going to make the case statement blow up to an enormous
> size. This may slow things down and at a minimum it will make things
> less comprehensible.

I've been thinking about how to generate the virtual machine
programmatically; specifically, I've been looking at vmgen from gforth
again. I've got other half-formed ideas too (separate scalar machine for
reductions?) that I'm working on. But yes, the # of types does make
things harder to redo :-)

> My thinking is that we only add casts for the extra
> types and do the computations at high precision. Thus adding two int16
> numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF, and then
> an OP_CAST_fF. The details are left as an exercise to the reader ;-).
> So, adding int16, float32, complex64 should only require the addition of
> 6 casting opcodes plus appropriate modifications to the compiler.

My thinking too.
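The cast-then-compute scheme quoted above can be sketched roughly as follows. This is a toy interpreter in plain Python, not numexpr's actual virtual machine: the opcode names come from the email, but `compile_narrow_add`, `run`, the register names, and the list-based "arrays" are all hypothetical, and int16 is assumed as the narrow type with a Python int as the wide one.

```python
def to_int16(value):
    """Wrap a Python int into the int16 range, as a C cast would."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000

# Opcode table: only the casts know about the narrow type.
OPCODES = {
    "OP_CAST_Ff": lambda x: int(x),      # narrow -> wide
    "OP_ADD_FFF": lambda x, y: x + y,    # arithmetic in the wide type only
    "OP_CAST_fF": to_int16,              # wide -> narrow
}

def compile_narrow_add(a, b, out):
    """Emit two up-casts, one wide add, and one down-cast."""
    return [
        ("OP_CAST_Ff", "t0", a),
        ("OP_CAST_Ff", "t1", b),
        ("OP_ADD_FFF", "t2", "t0", "t1"),
        ("OP_CAST_fF", out, "t2"),
    ]

def run(program, registers):
    """Execute each instruction element-wise over the register lists."""
    for opcode, dest, *sources in program:
        func = OPCODES[opcode]
        columns = [registers[name] for name in sources]
        registers[dest] = [func(*args) for args in zip(*columns)]
    return registers

program = compile_narrow_add("a", "b", "out")
registers = run(program, {"a": [1, 2, 30000], "b": [10, 20, 10000]})
result = registers["out"]
```

The design point is that a new narrow type costs two cast opcodes rather than a full set of arithmetic opcodes; the last element (30000 + 10000) wraps on the way back down to int16.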
> For large arrays, this should have most of the benefits of giving each
> type its own opcode, since the memory bandwidth is still small, while
> keeping the interpreter relatively simple.
>
> Unfortunately, int64 doesn't fit under this scheme; is it used enough to
> matter? I hate to pile a whole pile of new opcodes on for something that's
> rarely used.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|[EMAIL PROTECTED]
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion