On 07/06/2011 02:27 PM, Dag Sverre Seljebotn wrote:
> On 07/06/2011 02:05 PM, Matthew Brett wrote:
>> Hi,
>>
>> Just for reference, I am using this as the latest version of the NEP -
>> I hope it's current:
>>
>> https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst
>>
>> I'm mostly relaying stuff I said, although generally (please do
>> correct me if I am wrong) I am just re-expressing points that
>> Nathaniel has already made in the alterNEP text and the emails.
>>
>> On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire
>> <cjord...@uw.edu> wrote:
>> ...
>>> Since Mark is only around Austin until early August, there's
>>> also broad agreement that we need to get something done quickly.
>>
>> I think I might have missed that part of the discussion :)
>>
>> I feel the need to emphasize the centrality of the assertion by
>> Nathaniel, and agreement by (at least) me, that the NA case (there
>> really is no data) and the IGNORE case (there is data but I'm
>> concealing it from you) are conceptually different, and come from
>> different use-cases.
>>
>> The underlying disagreement returned many times to this fundamental
>> difference between the NEP and alterNEP:
>>
>> In the NEP - by design - it is impossible to distinguish between na.NA
>> and na.IGNORE
>> The alterNEP insists you should be able to distinguish.
>>
>> Mark says something like "it's all missing data, there's no reason you
>> should want to distinguish". Nathaniel and I were saying "the two
>> types of missing do have different use-cases, and it should be
>> possible to distinguish. You might want to choose to treat them the
>> same, but you should be able to see what they are.".
>>
>> I returned several times to this (original point by Nathaniel):
>>
>> a[3] = np.NA
>>
>> (what does this mean? I am altering the underlying array, or a mask?
>> How would I explain this to someone?)
>>
>> We confirmed that, in order to make it difficult to know what your NA
>> is (masked or bit-pattern), Mark has to a) hinder access to the data
>> below the mask and b) prevent direct API access to the masking array.
>> I described this as 'hobbling the API' and Mark thought of it as
>> 'generic programming' (missing is always missing).
>
> Here's an HPC perspective...:
>
> If you, say, want to off-load array processing with a mask to some code
> running on a GPU, you really can't have the GPU go through some NumPy
> API. Or if you want to implement a masked array on a cluster with MPI,
> you similarly really, really want raw access.
>
> At least I feel that the transparency of NumPy is a huge part of its
> current success. Many more than me spend half their time in C/Fortran
> and half their time in Python.
>
> I tend to look at NumPy this way: Assuming you have some data in memory
> (possibly loaded by a C or Fortran library). (Almost) no matter how it
> is allocated, ordered, packed, aligned -- there's a way to find strides
> and dtypes to put a nice NumPy wrapper around it and use the memory from
> Python.
>
> So, my view on Mark's NEP was: With a reasonable amount of flexibility
> in how you decide to implement masking for your data, you can create a
> NumPy wrapper that will understand that. Whether your Fortran library
> exposes NAs in its 40GB buffer as bit patterns, or using a separate
> mask, both will work.
>
> And IMO Mark's NEP comes rather close to this, you just need an
> additional NEP later to give raw access to the implementation details,
> once those are settled :-)

To be concrete, I'm thinking of something like a custom extension to PEP
3118, which could also allow efficient access from Cython without
hard-coding Cython for NumPy (a GSoC project this summer will continue
to move us away from the "np.ndarray[int]" syntax to a more generic
"int[:]" that's less tied to NumPy).

But first things first!

Dag Sverre
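
P.S. To illustrate the kind of raw access PEP 3118 already provides, here
is a minimal sketch using only the Python standard library. It wraps a
raw buffer (standing in for memory filled by a C or Fortran library) and
a separate mask buffer without copying either. The one-byte-per-element
mask layout, the "1 = missing" convention and the masked_sum helper are
assumptions made for illustration; none of this is part of Mark's NEP or
of any agreed extension to the buffer protocol.

```python
import struct

# Stand-in for memory filled by a C or Fortran library: six float64 values.
raw = bytearray(struct.pack("6d", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

# Hypothetical separate mask, one byte per element (1 = missing).
mask = bytearray([0, 0, 1, 0, 0, 1])

# PEP 3118 views: no copy, only format/shape/strides metadata over the bytes.
data_view = memoryview(raw).cast("d", shape=[2, 3])
mask_view = memoryview(mask).cast("B", shape=[2, 3])


def masked_sum(data, mask):
    """Sum the elements of a 2-D view whose mask entry is zero."""
    rows, cols = data.shape
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j]:
                total += data[i, j]
    return total


print(data_view.format, data_view.shape, data_view.strides)
print(mask_view.format, mask_view.shape, mask_view.strides)
print(masked_sum(data_view, mask_view))
```

Any consumer that speaks the buffer protocol can read both buffers this
way, which is the sort of transparency I am after; how a mask would
actually be advertised alongside the data is exactly what a later NEP
would have to settle.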