We need to decide what to do with the NA masking code currently in master, vis-a-vis the 1.7 release. While this code is great at what it is, we don't actually have consensus yet that it's the best way to give our users what they want/need -- or even an appropriate way. So we need to figure out how to release 1.7 without committing ourselves to supporting this design in the future.

Background: what does the code currently in master do?
--------------------------------------------

It adds 3 pointers at the end of the PyArrayObject struct (which is better known as the numpy.ndarray object). These new struct members, and some accessors for them, are exposed as part of the public API. There are also a few additions to the Python-level API (mask= argument to np.array, skipna= argument to ufuncs, etc.)

What does this mean for compatibility?
------------------------------------------------

The change in the ndarray struct is not as problematic as it might seem, compatibility-wise, since Python objects are almost always referred to by pointers. Since the initial part of the struct will continue to have the same memory layout, existing source and binary code that works with PyArrayObject *pointers* will continue to work unchanged.

One place where the actual struct size matters is for any C-level ndarray subclasses, which will have their memory layout change, and thus will need to be recompiled. (Python-level ndarray subclasses will have their memory layout change as well -- e.g., they will have different __dictoffset__ values -- but it's unlikely that any existing Python code depends on such details.)

What if we want to change our minds later?
-------------------------------------------------------

For the same reasons as given above, any new code which avoids referencing the new struct fields referring to masks, or using the new masking APIs, will continue to work even if the masking is later removed.

Any new code which *does* refer to the new masking APIs, or references the fields directly, will break if masking is later removed. Specifically, source will fail to compile, and existing binaries will silently access memory that is past the end of the PyArrayObject struct, which will have unpredictable consequences. (Most likely segfaults, but no guarantees.) This applies even to code which simply tries to check whether a mask is present.
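
To make the layout argument above concrete, here is a minimal C sketch. It does not use the real NumPy headers: ArrayV1, ArrayV2, SubclassV1, SubclassV2 and the mask_* field names are hypothetical stand-ins I made up for illustration, and the real PyArrayObject has many more fields. The only point is that appending members leaves the offsets of the existing fields alone, while anything that embeds the struct by value (as a C-level subclass does) sees its own fields move.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pre-mask array struct.
   This is NOT the real PyArrayObject definition. */
typedef struct {
    char *data;
    int nd;
    int flags;
} ArrayV1;

/* Same leading fields, plus three pointers appended at the end,
   mimicking the shape of the change in master. The mask_* names
   are invented for this sketch. */
typedef struct {
    char *data;
    int nd;
    int flags;
    void *mask_a;
    void *mask_b;
    void *mask_c;
} ArrayV2;

/* A C-level subclass embeds the base struct by value, so its own
   fields start wherever the base struct ends. */
typedef struct { ArrayV1 base; long extra; } SubclassV1;
typedef struct { ArrayV2 base; long extra; } SubclassV2;

/* Returns 1 if every field shared by both versions sits at the
   same offset, which is why code that only touches the old fields
   through a pointer does not need to change. */
int prefix_layout_unchanged(void) {
    return offsetof(ArrayV1, data) == offsetof(ArrayV2, data)
        && offsetof(ArrayV1, nd) == offsetof(ArrayV2, nd)
        && offsetof(ArrayV1, flags) == offsetof(ArrayV2, flags);
}

/* Returns 1 if the subclass's own field moved when the base struct
   grew, which is why such subclasses must be recompiled. */
int subclass_field_moved(void) {
    return offsetof(SubclassV1, extra) != offsetof(SubclassV2, extra);
}

/* Aborts if either layout assumption does not hold on this platform. */
void check_layout_assumptions(void) {
    assert(prefix_layout_unchanged());
    assert(subclass_field_moved());
}
```

The removal case is the same picture run backwards: a binary compiled against something like ArrayV2 that reads mask_a from an object that is really laid out like ArrayV1 is reading past the end of the struct.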

So I think the preconditions for leaving this code as-is for 1.7 are that we must agree:

* We are willing to require a recompile of any C-level ndarray subclasses (do any exist?)
* We are willing to make absolutely no guarantees about future compatibility for code which uses APIs marked "experimental"
* We are willing for this breakage to occur in the form of random segfaults
* We are okay with the extra 3 pointers worth of memory overhead on each ndarray

Personally I can live with all of these if everyone else can, but I'm nervous about reducing our compatibility guarantees like that, and we'd probably need, at a minimum, a flashier EXPERIMENTAL sign than we currently have. (Maybe we should resurrect the weasels ;-) [1])

[1] http://mail.scipy.org/pipermail/numpy-discussion/2012-March/061204.html

Any other options?
------------------------

Alternative 1: The obvious other option is to go through and move all the strictly mask-related code out of master and into a branch. Presumably this wouldn't include all the infrastructure that Mark added, since a lot of it is e.g. shared with where=, and that would stay. Even so, this would be a big and possibly time-consuming change.

Alternative 2: After auditing the code a bit, the cleanest third option I can think of is:

1. Go through and make sure that all numpy-internal access to the new maskna fields happens via the accessor functions. (This patch would produce no functionality change.)
2. Move the accessors into some numpy-internal header file, so that user code can't call them.
3. Remove the mask= argument to Python-level ndarray constructors, remove the new maskna_ fields from PyArrayObject, and modify the accessors so that they always return NULL, 0, etc., as if the array does not have a mask.

This would make 1.7 completely compatible with 1.6 API and ABI-wise. But it would also be a minimal code change, leaving the mask-related code paths in place but inaccessible.
If we decided to re-enable them, it would just be a matter of reverting steps (3) and (2).

The main downside I see with this approach is that leaving a bunch of inaccessible code paths lying around might make it harder to maintain 1.7 as a "long term support" release.

I'm personally willing to implement either of these changes. Or perhaps there's another option that I'm not thinking of!

-- Nathaniel