On Tue, Mar 6, 2012 at 9:14 PM, Ralf Gommers <ralf.gomm...@googlemail.com> wrote:
> On Tue, Mar 6, 2012 at 9:25 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> On Sat, Mar 3, 2012 at 8:30 PM, Travis Oliphant <tra...@continuum.io>
>> wrote:
>> > Hi all,
>>
>> Hi Travis,
>>
>> Thanks for bringing this back up.
>>
>> Have you looked at the summary from the last thread?
>> https://github.com/njsmith/numpy/wiki/NA-discussion-status
>
> Re-reading that summary and the main documents and threads linked from it, I
> could find either examples of statistical software that treats missing and
> ignored data explicitly separately, or links to relevant literature. Those
> would probably help the discussion a lot.

(I think you mean "couldn't find"?)

I'm not aware of any software that supports the IGNORED concept at all, whether in combination with missing data or not. np.ma is probably the closest example. I think we'd be breaking new ground there. This is also probably why it is less clear how it should work :-).

IIUC, the basic reason that people want IGNORED in the core is that it provides convenience and syntactic sugar for efficient "in place" operation on subsets of large arrays. So there are actually two parts there -- the efficient operation, and the convenience/syntactic sugar. The key feature for efficient operation is the where= feature, which is not controversial at all. So, there's an argument that for now we should focus on where=, give people some time to work with it, and then use that experience to decide what kind of convenience/sugar would be useful, if any. But, that's just my own idea; I definitely can't claim any consensus on it.

>> In project management terms, I see three options:
>> 1) Put a big warning label on the functionality and leave it for now
>> ("If this option is given, np.asarray returns a masked array. NOTE: IN
>> THE NEXT RELEASE, IT MAY INSTEAD RETURN A BAG OF RABID, HUNGRY
>> WEASELS. NO GUARANTEES.")
>
> I've opened http://projects.scipy.org/numpy/ticket/2072 for that.

Cool, thanks.

> Assuming
> we stick with this option, I'd appreciate it if you could check in the first
> beta that comes out whether or not the warnings are obvious enough and in
> all the right places. There probably won't be weasels though:)

Of course. I've added myself to the CC list. (Err, if the beta won't be for a bit, though, then please remind me if you remember? I'm juggling a lot of balls right now.)

>> 2) Move the code back out of mainline and into a branch until
>> there's consensus.
>> 3) Hold up the release until this is all sorted.
>>
>> I come from the project-management school that says you should always
>> have a releasable mainline, keep unready code in branches, and never
>> hold up the release for features, so (2) seems obvious to me.
>
> While it may sound obvious, I hope you've understood why in practice it's
> not at all obvious and why you got such strong reactions to your proposal of
> taking out all that code. If not, just look at what happened with the
> numpy-refactor work.

Of course, and that's why I'm not pressing the point. These trade-offs might be worth talking about at some point -- there are reasons that basically all the major FOSS projects have moved towards time-based releases :-) -- but that'd be a huge discussion at a time when we already have more than enough of those on our plate...

>> But I seem to be very much in the minority on that[1], so oh well :-). I
>> don't have any objection to (1), personally. (3) seems like a bad
>> idea. Just my 2 pence.
>
>
> Agreed that (3) is a bad idea. +1 for (1).
>
> Ralf
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

Cheers,
-- Nathaniel
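
The distinction drawn above between the efficient where= operation and the convenience of a masked/IGNORED array type is easiest to see side by side. The following is a minimal sketch, assuming a NumPy release whose ufuncs accept the `out=` and `where=` keyword arguments; the array contents and the even-element mask are made up purely for illustration. It contrasts an in-place `where=` update with an `np.ma` masked array, which the message names as the closest existing analogue of IGNORED.

```python
import numpy as np

# Illustrative data (assumption): a float array and a boolean mask
# selecting the subset we want to operate on.
a = np.arange(10, dtype=float)
mask = a % 2 == 0

# where= restricts the ufunc to positions where mask is True. Because
# out=a, the update happens in place and the unselected elements keep
# their existing values (without out=, those slots would be uninitialized).
np.add(a, 100.0, out=a, where=mask)
# a -> [100, 1, 102, 3, 104, 5, 106, 7, 108, 9]

# np.ma instead attaches the mask to the array itself, so reductions
# skip the masked ("ignored") element automatically.
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
total = m.sum()
# total -> 4.0 (1.0 + 3.0; the masked 2.0 is skipped)
```

With `where=`, the mask lives outside the array and is passed explicitly to each operation; with `np.ma`, the mask travels with the data, which is closer to the convenience/syntactic-sugar side of the discussion.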