On Mon, Apr 23, 2012 at 12:15 AM, Nathaniel Smith <n...@pobox.com> wrote:
> We need to decide what to do with the NA masking code currently in
> master, vis-a-vis the 1.7 release. While this code is great at what it
> is, we don't actually have consensus yet that it's the best way to
> give our users what they want/need -- or even an appropriate way. So
> we need to figure out how to release 1.7 without committing ourselves
> to supporting this design in the future.
>
> Background: what does the code currently in master do?
> --------------------------------------------
>
> It adds 3 pointers at the end of the PyArrayObject struct (which is
> better known as the numpy.ndarray object). These new struct members,
> and some accessors for them, are exposed as part of the public API.
> There are also a few additions to the Python-level API (mask= argument
> to np.array, skipna= argument to ufuncs, etc.)
>
> What does this mean for compatibility?
> ------------------------------------------------
>
> The change in the ndarray struct is not as problematic as it might
> seem, compatibility-wise, since Python objects are almost always
> referred to by pointers. Since the initial part of the struct will
> continue to have the same memory layout, existing source and binary
> code that works with PyArrayObject *pointers* will continue to work
> unchanged.
>
> One place where the actual struct size matters is for any C-level
> ndarray subclasses, which will have their memory layout change, and
> thus will need to be recompiled. (Python-level ndarray subclasses will
> have their memory layout change as well -- e.g., they will have
> different __dictoffset__ values -- but it's unlikely that any existing
> Python code depends on such details.)
>
> What if we want to change our minds later?
> -------------------------------------------------------
>
> For the same reasons as given above, any new code which avoids
> referencing the new struct fields referring to masks, or using the new
> masking APIs, will continue to work even if the masking is later
> removed.
>
> Any new code which *does* refer to the new masking APIs, or references
> the fields directly, will break if masking is later removed.
> Specifically, source will fail to compile, and existing binaries will
> silently access memory that is past the end of the PyArrayObject
> struct, which will have unpredictable consequences. (Most likely
> segfaults, but no guarantees.) This applies even to code which simply
> tries to check whether a mask is present.
>
> So I think the preconditions for leaving this code as-is for 1.7 are
> that we must agree:
> * We are willing to require a recompile of any C-level ndarray
>   subclasses (do any exist?)
>

As long as it's only subclasses I think this may be OK. Not 100% sure on
this one though.

> * We are willing to make absolutely no guarantees about future
>   compatibility for code which uses APIs marked "experimental"
>

That is what I understand "experimental" to mean. Could stay, could
change; no guarantees.

> * We are willing for this breakage to occur in the form of random
>   segfaults
>

This is not OK of course. But it shouldn't apply to the Python API, which
I think is the most important one here.

> * We are okay with the extra 3 pointers' worth of memory overhead on
>   each ndarray
>
> Personally I can live with all of these if everyone else can, but I'm
> nervous about reducing our compatibility guarantees like that, and
> we'd probably need, at a minimum, a flashier EXPERIMENTAL sign than we
> currently have. (Maybe we should resurrect the weasels ;-) [1])
>
> [1]
> http://mail.scipy.org/pipermail/numpy-discussion/2012-March/061204.html
>
> <snip>
>
> I'm personally willing to implement either of these changes.
>

Thank you Nathaniel, that is a very important and helpful statement.

Ralf
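The struct-layout argument quoted above (appending members leaves the existing prefix intact, C-level subclasses shift, and a stale binary would read past the end of the object if the mask members were later removed) can be checked with a short sketch. It uses Python's ctypes module to reproduce C struct layout. The structs and member names below are hypothetical, heavily simplified stand-ins, not NumPy's real PyArrayObject definition or its actual mask fields.

```python
import ctypes

# Hypothetical, simplified stand-in for the pre-mask PyArrayObject.
# The real struct has more members and different names.
class ArrayOld(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),                   # stands in for PyObject_HEAD
        ("data", ctypes.c_char_p),
        ("nd", ctypes.c_int),
        ("dimensions", ctypes.POINTER(ctypes.c_ssize_t)),
    ]

# Same leading members plus three trailing pointers (names are made up).
class ArrayWithMask(ctypes.Structure):
    _fields_ = ArrayOld._fields_ + [
        ("mask_dtype", ctypes.c_void_p),
        ("mask_data", ctypes.c_char_p),
        ("mask_strides", ctypes.POINTER(ctypes.c_ssize_t)),
    ]

# A C-level subclass embeds the base struct by value, then adds its own member.
class SubclassOld(ctypes.Structure):
    _fields_ = [("base", ArrayOld), ("extra", ctypes.c_int)]

class SubclassNew(ctypes.Structure):
    _fields_ = [("base", ArrayWithMask), ("extra", ctypes.c_int)]

def layout_report():
    shared = [name for name, _ in ArrayOld._fields_]
    return {
        # 1. Every pre-existing member keeps its offset, so code that only
        #    uses those members through a pointer sees the same layout.
        "prefix_unchanged": all(
            getattr(ArrayOld, n).offset == getattr(ArrayWithMask, n).offset
            for n in shared
        ),
        # 2. The subclass's own member moves, hence the need to recompile.
        "subclass_field_moved": SubclassOld.extra.offset != SubclassNew.extra.offset,
        # 3. The first mask member starts at or beyond the end of the old
        #    struct, so reading it from an old-layout object runs off the end.
        "mask_past_old_end": ArrayWithMask.mask_dtype.offset >= ctypes.sizeof(ArrayOld),
        # 4. Per-object memory overhead of the three extra pointers.
        "overhead_bytes": ctypes.sizeof(ArrayWithMask) - ctypes.sizeof(ArrayOld),
    }

print(layout_report())
```

On a typical 64-bit build the three extra pointers come to 24 bytes per array object; the exact figure depends on the platform's pointer size.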
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion