Eric, Travis,

Thanks for the words of encouragement :) I'm all in favor of having maskedarray ported to C, but I won't be able to do it myself anytime soon, and I would have to learn C beforehand. Francesc's suggestion of using Pyrex sounds nice; I'll try and see what I can do with that.

> Moving the implementation to the C-level would be awesome. In particular,
> __getitem__ and __setitem__ are incredibly slow with masked arrays compared
> to ndarrays, so using those inside python loops is basically a really bad
> idea currently. You always have to work with the _data and _mask attributes
> directly if you are concerned about performance.

Well, yeah, that's expected: __getitem__ tests whether the mask is defined (not nomask) before trying to access the item. If you're using it in a loop, you run that test on every iteration, which is a bad idea. It's indeed far better to run the test once beforehand and process _data and _mask separately.

A fix would be to force the mask to an array of booleans all the time, but that would slow things down elsewhere, as a lot of functions are artificially accelerated with the nomask trick. A C implementation may render that trick obsolete...

Another possibility would be to force the mask to a bool array and keep an extra flag on top, like hasmask. hasmask would be False by default, and set to True only if the mask contains at least one True. That'd require a mask.any() in __array_finalize__, which might still slow things down.

> Also, there is a "bug" in Pierre's current implementation I spoke with him
> about, but currently have no solution for. numpy.add.accumulate doesn't
> work on arrays from the new maskedarray implementation, but does with the
> old one.

The fact that it works with 'old' masked arrays doesn't count: they're not real ndarrays. They use the __array__ method to communicate with the rest of numpy, which we shouldn't need.

> The problem seems to arise when you over-ride __getitem__ in an
> ndarray sub-class. See the code below for a demonstration:

I'm not sure that's actually the source of the problem. ufuncs use the __array_wrap__ method to communicate with subclasses, and the ufunc methods seem to bypass that. In the meantime, the methods of the MA.ufuncs work as expected.

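Going back to the __getitem__ point, here is a minimal sketch of what I mean by running the nomask test once and working on _data and _mask directly. It is written against numpy.ma, on the assumption that it exposes the same _data, _mask and nomask names; the two function names are made up for the illustration.

```python
import numpy as np
import numpy.ma as ma


def masked_sum_slow(x):
    """Sum the unmasked entries by indexing the masked array item by item."""
    total = 0.0
    for i in range(x.size):
        item = x[i]  # __getitem__ looks at the mask again on every call
        if item is not ma.masked:
            total += item
    return total


def masked_sum_fast(x):
    """Sum the unmasked entries, testing for nomask only once."""
    data = x._data  # underlying plain ndarray
    mask = x._mask  # either nomask or a boolean ndarray
    if mask is ma.nomask:
        # No mask was ever defined: every entry is valid.
        return float(data.sum())
    return float(data[~mask].sum())


x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[0, 1, 0, 0])
y = ma.array([1.0, 2.0, 3.0])  # no mask given, so y._mask is nomask

print(masked_sum_slow(x), masked_sum_fast(x), masked_sum_fast(y))
```

Both functions should agree on the result; the second one simply avoids going through __getitem__ inside the loop.
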
Could somebody give me a simple explanation of the behaviour of the ufunc methods on the Python side? I'm obviously missing something here...
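For anyone who wants to poke at this, a small probe along the following lines might help. It is only a sketch: Probe is a hypothetical ndarray subclass that records each call to __array_wrap__, so that a plain ufunc call can be compared with a ufunc method such as accumulate. What gets recorded for the method may well depend on the numpy version, so I'm not claiming any particular output for it.

```python
import numpy as np


class Probe(np.ndarray):
    """Hypothetical ndarray subclass that logs calls to __array_wrap__."""

    calls = []  # shared log, cleared before each experiment

    def __array_wrap__(self, obj, context=None, return_scalar=False):
        # The extra keyword arguments are there so the signature is accepted
        # by both older and newer numpy versions.
        Probe.calls.append("wrap")
        return np.asarray(obj).view(type(self))


a = np.arange(5).view(Probe)

# Plain ufunc call.
Probe.calls.clear()
np.add(a, a)
calls_plain = list(Probe.calls)

# ufunc method.
Probe.calls.clear()
np.add.accumulate(a)
calls_accumulate = list(Probe.calls)

print("np.add(a, a)         ->", calls_plain)
print("np.add.accumulate(a) ->", calls_accumulate)
```

If the second list comes out empty while the first does not, the method is bypassing __array_wrap__ on that version.
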
