Wow, Nathaniel. This looks like a nice piece of tedious work. I have not reviewed it in detail, but in general I would be very supportive of your plan to commit this to master, make a 1.7 release (without the ReduceWrapper) function and then work on the masked array / ndarray separation plan for 1.8
Of course, first I would want to hear from Mark, to hear his comments about what was removed. -Travis On May 19, 2012, at 3:54 PM, Nathaniel Smith wrote: > Hi all, > > Since Mark's original missingdata branch made so many changes, I > figured it would be a useful exercise to figure out what code in > master is actually related to masked arrays, and which isn't. The > easiest way seemed to be to delete the new fields, then keep removing > any code that depended on them until everything built again, which > gives this: > https://github.com/njsmith/numpy/commits/separate-maskna > > Possible uses: > - Use the diff as a checklist for going through to change the API > - Use the diff as a checklist for adding an experimental-API-only flag > - Merge into master, then use as a reference to cherrypick the > pieces that we want to save (there is some questionable stuff in here > -- e.g. copy/paste code, hand-wavy half-implementation of "multi-NA" > support, and PyArray_ReduceWrapper, see below) > - Merge into master and then immediately 'git revert' the changes on > a branch, which would effectively 'move it aside' so we can release > 1.7 while Mark and Travis to continue hacking on it at their own pace > > This is a huge patch, but I was pretty careful not to cause any > accidental non-maskna-related regressions. The whole PyArray_Diagonal > thread actually happened because I noticed that test_maskna had the > only tests for np.diagonal, so I wanted to write proper ones that > would be independent of the maskna code. Also I ran the following > tests: > - numpy.test("full") > - scipy v0.10.1, test("full") > - matplotlib current master > - pandas v0.7.3 > and everything looks good. > > The other complicated thing to handle was the new > PyArray_ReduceWrapper function that was added to the public multiarray > API. Conceptually, this function has only a glancing relationship to > masked arrays per se. But, it has its own problems, and I don't think > it should be exposed in 1.7. Partly because its signature necessarily > changes depending on whether maskna support exists. Partly because > it's just kind of ugly (it has 15 arguments, after I removed some[1]). > But mostly because it gives us two independent generic implementations > of functions that do array->scalar operations, which seems like > something we absolutely don't want to commit to supporting > indefinitely. And the "generalized ufunc" alternative seems a lot more > promising. So, that branch also has a followup patch that does the > necessary hacking to get it out of the public API. > > Unfortunately, this patch is dependent on the previous one -- I'm not > sure how to untangle PyArray_ReduceWrapper while keeping the maskna > support in, which makes the "global experimental flag" idea for 1.7 > hard to implement (assuming that others agree about > PyArray_ReduceWrapper being unready for public exposure). > > At this point, it might easiest to just merge this branch to master, > immediately revert it on a branch for Mark and Travis to work on, and > then release 1.7. > > Ralf, IIUC merging this and my other outstanding PRs would leave the > datetime issues on python3/win32 as the only outstanding blocker? > > - N > > [1] http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html > , number 11 > > ------------ Commit messages follow for reference/discussion > > The main change is commit 4c16813c23b20: > Remove maskna API from ndarray, and all (and only) the code supporting it > > The original masked-NA-NEP branch contained a large number of changes > in addition to the core NA support. For example: > - ufunc.__call__ support for where= argument > - nditer support for arbitrary masks (in support of where=) > - ufunc.reduce support for simultaneous reduction over multiple axes > - a new "array assignment API" > - ndarray.diagonal() returning a view in all cases > - bug-fixes in __array_priority__ handling > - datetime test changes > etc. There's no consensus yet on what should be done with the > maskna-related part of this branch, but the rest is generally useful > and uncontroversial, so the goal of this branch is to identify exactly > which code changes are involved in maskna support. > > The basic strategy used to create this patch was: > - Remove the new masking-related fields from ndarray, so no arrays > are masked > - Go through and remove all the code that this makes > dead/inaccessible/irrelevant, in a largely mechanical fashion. So > for example, if I saw 'if (PyArray_HASMASK(a)) { ... }' then that > whole block was obviously just dead code if no arrays have masks, > and I removed it. Likewise for function arguments like skipna that > are useless if there aren't any NAs to skip. > > This changed the signature of a number of functions that were newly > exposed in the numpy public API. I've removed all such functions from > the public API, since releasing them with the NA-less signature in 1.7 > would create pointless compatibility hassles later if and when we add > back the NA-related functionality. Most such functions are removed by > this commit; the exception is PyArray_ReduceWrapper, which requires > more extensive surgery, and will be handled in followup commits. > > I also removed the new ndarray.setasflat method. Reason: a comment > noted that the only reason this was added was to allow easier testing > of one branch of PyArray_CopyAsFlat. That branch is now the main > branch, so that isn't an issue. Nonetheless this function is arguably > useful, so perhaps it should have remained, but I judged that since > numpy's API is already hairier than we would like, it's not a good > idea to add extra hair "just in case". (Also AFAICT the test for this > method in test_maskna was actually incorrect, as noted here: > https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py > so I'm not confident that it ever worked in master, though I haven't > had a chance to follow-up on this.) > > I also removed numpy.count_reduce_items, since without skipna it > became trivial. > > I believe that these are the only exceptions to the "remove dead code" > strategy. > > The ReduceWrapper untangling is a001fb29c9: > Remove PyArray_ReduceWrapper from public API > > There are two reasons to want to keep PyArray_ReduceWrapper out of the > public multiarray API: > - Its signature is likely to change if/when masked arrays are added > - It is essentially a wrapper for array->scalar transformations > (*not* just reductions as its name implies -- the whole reason it > is in multiarray.so in the first place is to support count_nonzero, > which is not actually a reduction!). It provides some nice > conveniences (like making it easy to apply such functions to > multiple axes simultaneously), but, we already have a general > mechanism for writing array->scalar transformations -- generalized > ufuncs. We do not want to have two independent, redundant > implementations of this functionality, one in multiarray and one in > umath! So in the long run we should add these nice features to the > generalized ufunc machinery. And in the short run, we shouldn't add > it to the public API and commit ourselves to supporting it. > > However, simply removing it from numpy_api.py is not easy, because > this code was used in both multiarray and umath. This commit: > - Moves ReduceWrapper and supporting code to umath/, and makes > appropriate changes (e.g. renaming it to PyUFunc_ReduceWrapper and > cleaning up the header files). > - Reverts numpy.count_nonzero to its previous implementation, so that > it loses the new axis= and keepdims= arguments. This is > unfortunate, but this change isn't so urgent that it's worth tying > our APIs in knots forever. (Perhaps in the future it can become a > generalized ufunc.) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion