Ok, I think it's time to step back and reformulate the problem by completely ignoring the implementation.
Here we have two "generic" concepts (i.e., ones that also apply to R), plus an extra concept that is exclusive to numpy:

* Assigning np.NA to an array cannot be undone except through explicit assignment (i.e., assigning a new arbitrary value, or saving a copy of the original array before assigning np.NA).

* np.NA values propagate by default, unless ufuncs are given the "skipna = True" argument (or the other way around; it doesn't really matter for this discussion). To avoid passing the argument to each ufunc, we either have some per-array variable holding the default "skipna" value (undesirable), or we make a trivial ndarray subclass that sets the "skipna" argument on all ufuncs through the "_ufunc_wrapper_" mechanism.

Now, numpy has the concept of views, which adds one more item to the list of concepts:

* With views, two arrays can share the same physical data, so that assignments to any of them will be seen by the others (including NA values).

The creation of a view is explicitly requested by the user, so its behaviour should not be perceived as odd (after all, you asked for a view). The good thing is that with views you can avoid costly array copies, as long as you're careful when writing into them.

Now, you can add a new concept: local/temporary/transient missing data. We can take an existing array and create a view with the new argument "transientna = True". Here, both the view and the "transientna = True" are explicitly requested by the user, so it is assumed that she already knows what this is all about. The difference from a regular view is that you have also explicitly asked for local/temporary/transient NA values.

* Assigning np.NA to an array view with "transientna = True" will *not* be seen by any of the other views (nor by the "original" array), but everything else will still work "as usual".

After all, this is what *you* asked for when using the "transientna = True" argument.
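To make the concepts above concrete, here is a rough sketch using only what numpy already ships. Neither np.NA nor "transientna" exists in released numpy, so both are replaced by stand-ins that are my own assumptions: NaN with np.sum/np.nansum plays the role of NA propagation versus "skipna = True", and a numpy.ma masked-array view of the base array plays the role of a "transientna = True" view (shared data, private mask). It only illustrates the semantics being discussed, not a proposed implementation.

```python
import numpy as np

# Stand-in for NA propagation vs. skipna=True: NaN propagates through a
# plain reduction and is skipped by the nan-aware one.
a = np.array([1.0, np.nan, 3.0])
propagated = np.sum(a)    # NaN: the missing value propagates
skipped = np.nansum(a)    # 4.0: the missing value is skipped

# Regular view: both arrays share the same physical data, so a write
# through the view is seen by the base array.
base = np.arange(6, dtype=float)
view = base[::2]
view[0] = 100.0

# Stand-in for a "transientna = True" view (assumption: emulated with
# numpy.ma). The masked array shares its data with `base` but keeps
# its own mask.
transient = base.view(np.ma.MaskedArray)

# Marking an element as missing only touches the private mask;
# `base` and `view` do not see it.
transient[1] = np.ma.masked

# Anything else still works "as usual": a normal assignment writes
# through to the shared data.
transient[2] = 7.0

print(base)             # element 1 still holds its value, element 2 is 7.0
print(transient)        # element 1 shows up as masked
print(transient.sum())  # the masked element is skipped
```

Note that with numpy.ma the mask is the part that is local, which is exactly the "transient" behaviour described above; whether a real "transientna" would be built this way is a separate question.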
To conclude, let me say that others *must not* have to care about whether the arrays they're working with have transient NA values. This way, I can create a view with transient NAs, set some uninteresting data to NA, and pass it to a routine written by someone else that sets to NA those elements that, for example, are beyond a certain threshold from the mean of the elements. This would be equivalent to storing a copy of the original array before passing it to this third-party function, except that "transientna", just like views, provides a handy shortcut to avoid copies.

My main point here is that views and local/temporary/transient NAs are all *explicitly* requested, so their behaviour should not come across as unexpected.

Is there agreement on this?

Lluis

-- 
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
