Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-22 Thread Paul Anton Letnes

On 21 Apr 2012, at 00:16, Drew Frank wrote:

 On Fri, Apr 20, 2012 at 11:45 AM, Chris Barker chris.bar...@noaa.gov wrote:
 
 On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no wrote:
 Oh, right. I was thinking small as in fits in L2 cache, not small as
 in a few dozen entries.
 
 Another example of a small array use-case: I've been using numpy for
 my research in multi-target tracking, which involves something like a
 bunch of entangled hidden Markov models. I represent target states
 with small 2d arrays (e.g. 2x2, 4x4, ...) and observations with small
 1d arrays (1 or 2 elements). It may be possible to combine a bunch of
 these small arrays into a single larger array and use indexing to
 extract views, but it is much cleaner and more intuitive to use
 separate, small arrays. It's also convenient to use numpy arrays
 rather than a custom class because I use the linear algebra
 functionality as well as integration with other libraries (e.g.
 matplotlib).
 
 I also work with approximate probabilistic inference in graphical
 models (belief propagation, etc), which is another area where it can
 be nice to work with many small arrays.
 
 In any case, I just wanted to chime in with my small bit of evidence
 for people wanting to use numpy for work with small arrays, even if
 they are currently pretty slow. If there were a special version of a
 numpy array that would be faster for cases like this, I would
 definitely make use of it.
 
 Drew

Although performance hasn't been a killer for me, I've been using numpy arrays 
(or matrices) for Mueller matrices [0] and Stokes vectors [1]. These describe 
the polarization of light and are always 4x1 vectors or 4x4 matrices. It would 
be nice if my code ran in one night instead of one week, although this is still 
tolerable in my case. Again, just an example of how small-vector/matrix 
performance can be important in certain use cases.

Paul

[0] https://en.wikipedia.org/wiki/Mueller_calculus
[1] https://en.wikipedia.org/wiki/Stokes_vector
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-20 Thread Frédéric Bastien
Hi,

I just discovered that the NA mask will modify the base ndarray
object. So I read about it to find the consequences for our C code. Up
to now I have fully read:

http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html

and partially read:

https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
https://github.com/njsmith/numpy/wiki/NA-discussion-status

In those documents, I see a problem with legacy code that will receive
an NA masked array as input. If I missed something, tell me.


All our C functions check their input arrays with PyArray_Check and
PyArray_ISALIGNED. If the NA mask is stored inside the ndarray C
object, our C functions, which don't know about it, will treat those
inputs as unmasked. So the user will get unexpected results: the
output will be an ndarray without a mask, but the code will have used
the masked values.

This will also happen with all other C code that uses ndarray.
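
A rough Python analogue of this failure mode, as a sketch only: it uses
the existing numpy.ma rather than the NA-mask C API discussed here, and
the function name legacy_total is hypothetical. np.asarray() hands the
"legacy" routine the raw data, so the mask is silently ignored.

```python
import numpy as np

def legacy_total(x):
    # Hypothetical "legacy" routine: written for plain ndarrays,
    # it knows nothing about masks.
    return np.asarray(x).sum()

m = np.ma.array([1.0, 2.0, 100.0], mask=[False, False, True])

mask_aware = m.sum()          # the masked entry (100.0) is skipped
mask_blind = legacy_total(m)  # the masked entry is summed anyway
print(mask_aware, mask_blind)
```

The caller gets a plain number back in both cases, with nothing to
indicate that the second one used a value that was meant to be hidden.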

In our case, all the input checking is done in one place, so adding
a check with PyArray_HasNASupport(PyArrayObject* obj) to raise an
error will be easy for us. But I don't think this is the case for most
other C code.

So I would prefer a separate object to protect users from code not
being updated to reject NA masked inputs.

An alternative would be to have PyArray_Check return False for an NA
masked array, but I don't like that, as it breaks the semantics that it
checks for the class.

A last option I see would be to make the NPY_ARRAY_BEHAVED flag also
check that the array is not an NA masked array. I suppose much C code
does this check. But this is not a bulletproof check, as not all code
(ours, for example) uses it.


Also, I don't mind the added pointers to the structure as we use big arrays.

thanks

Frédéric


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-20 Thread Fernando Perez
On Fri, Apr 20, 2012 at 9:49 AM, Chris Barker chris.bar...@noaa.gov wrote:

 I recall discussion a couple of times in the past of having some
 special-case numpy arrays for the simple, small cases -- perhaps 1-d
 or 2-d C-contiguous only, for instance. That might be a better way to
 address the small-array performance issue, and free us of concerns
 about minor growth to the core ndarray object.

+1 on that: I once wrote such code in pyrex (years ago) and it worked
extremely well for me.  No fancy features, very small footprint and
highly optimized codepaths that gave me excellent performance.


Cheers,

f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-20 Thread Dag Sverre Seljebotn


Fernando Perez fperez@gmail.com wrote:

On Fri, Apr 20, 2012 at 9:49 AM, Chris Barker chris.bar...@noaa.gov
wrote:

 I recall discussion a couple of times in the past of having some
 special-case numpy arrays for the simple, small cases -- perhaps 1-d
 or 2-d C-contiguous only, for instance. That might be a better way to
 address the small-array performance issue, and free us of concerns
 about minor growth to the core ndarray object.

+1 on that: I once wrote such code in pyrex (years ago) and it worked
extremely well for me.  No fancy features, very small footprint and
highly optimized codepaths that gave me excellent performance.

I don't think you gain that much by using a different type though? Those 
optimized code paths could be plugged into NumPy as well.

I always assumed that it would be possible to optimize NumPy, just that nobody 
invested time in it.

Starting from scratch you gain that you don't have to work with and understand 
NumPy's codebase, but I honestly think that's a small price to pay for 
compatibility.

Dag




Cheers,

f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-20 Thread Fernando Perez
On Fri, Apr 20, 2012 at 11:27 AM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:

 I don't think you gain that much by using a different type though? Those 
 optimized code paths could be plugged into NumPy as well.

Could be: this was years ago, and the bottleneck for me was in the
constructor and in basic arithmetic.  I had to make millions of these
vectors and I needed to do basic arithmetic, but they were always 1-d
and had one to 6 entries only.  So writing a very static constructor
with very low overhead did make a huge difference in that project.

Also, when I wrote this code numpy didn't exist, I was using Numeric.

Perhaps the same results could be obtained in numpy itself with
judicious coding, I don't know.  But in that project, ~600 lines of
really easy pyrex code (it would be cython today) made a *huge*
performance difference for me.

Cheers,

f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-20 Thread Dag Sverre Seljebotn
On 04/20/2012 08:35 PM, Fernando Perez wrote:
 On Fri, Apr 20, 2012 at 11:27 AM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no  wrote:

 I don't think you gain that much by using a different type though? Those 
 optimized code paths could be plugged into NumPy as well.

 Could be: this was years ago, and the bottleneck for me was in the
 constructor and in basic arithmetic.  I had to make millions of these
 vectors and I needed to do basic arithmetic, but they were always 1-d
 and had one to 6 entries only.  So writing a very static constructor
 with very low overhead did make a huge difference in that project.

Oh, right. I was thinking small as in fits in L2 cache, not small as 
in a few dozen entries. You definitely still need a Cython class then.

Dag


 Also, when I wrote this code numpy didn't exist, I was using Numeric.

 Perhaps the same results could be obtained in numpy itself with
 judicious coding, I don't know.  But in that project, ~600 lines of
 really easy pyrex code (it would be cython today) made a *huge*
 performance difference for me.


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-20 Thread Chris Barker
On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 Oh, right. I was thinking small as in fits in L2 cache, not small as
 in a few dozen entries.

or even two or three entries.

I often use a (2,) or (3,) numpy array to represent an (x,y) point
(usually pulled out from a Nx2 array).

I like it 'cause I can do array math, etc. It makes the code cleaner,
but it's actually faster to use tuples and do the indexing by hand :-(

but yes, having something built-in, or at least very compatible with
numpy would be best.
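
A quick way to check the tuple-versus-array claim on a given machine is
a timeit comparison along these lines. This is a sketch: the helper
names add_arrays and add_tuples are made up for illustration, and the
timings depend on the NumPy version and hardware, so no numbers are
quoted here.

```python
import timeit
import numpy as np

p_arr, q_arr = np.array([1.0, 2.0]), np.array([3.0, 5.0])
p_tup, q_tup = (1.0, 2.0), (3.0, 5.0)

def add_arrays(p, q):
    # Array math: clean, but pays ufunc dispatch and allocation overhead.
    return p + q

def add_tuples(p, q):
    # Indexing by hand: more verbose, no NumPy overhead.
    return (p[0] + q[0], p[1] + q[1])

n = 100_000
t_arr = timeit.timeit(lambda: add_arrays(p_arr, q_arr), number=n)
t_tup = timeit.timeit(lambda: add_tuples(p_tup, q_tup), number=n)
print(f"ndarray: {t_arr:.3f} s   tuple: {t_tup:.3f} s   ({n} additions each)")
```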

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-20 Thread Drew Frank
On Fri, Apr 20, 2012 at 11:45 AM, Chris Barker chris.bar...@noaa.gov wrote:

 On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no wrote:
  Oh, right. I was thinking small as in fits in L2 cache, not small as
  in a few dozen entries.

Another example of a small array use-case: I've been using numpy for
my research in multi-target tracking, which involves something like a
bunch of entangled hidden Markov models. I represent target states
with small 2d arrays (e.g. 2x2, 4x4, ...) and observations with small
1d arrays (1 or 2 elements). It may be possible to combine a bunch of
these small arrays into a single larger array and use indexing to
extract views, but it is much cleaner and more intuitive to use
separate, small arrays. It's also convenient to use numpy arrays
rather than a custom class because I use the linear algebra
functionality as well as integration with other libraries (e.g.
matplotlib).

I also work with approximate probabilistic inference in graphical
models (belief propagation, etc), which is another area where it can
be nice to work with many small arrays.

In any case, I just wanted to chime in with my small bit of evidence
for people wanting to use numpy for work with small arrays, even if
they are currently pretty slow. If there were a special version of a
numpy array that would be faster for cases like this, I would
definitely make use of it.
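
The "single larger array plus views" alternative mentioned above could
look roughly like this. It is a sketch under assumed shapes (five
targets, 2x2 states, length-2 observations); the variable names are
illustrative and not taken from any real tracking code.

```python
import numpy as np

n_targets = 5
rng = np.random.default_rng(0)

# One (n_targets, 2, 2) block instead of n_targets separate 2x2 arrays.
states = rng.standard_normal((n_targets, 2, 2))
obs = rng.standard_normal((n_targets, 2))

# Basic indexing returns a view: editing it edits the big array.
first = states[0]
first[0, 0] = 42.0

# One batched call replaces a Python loop over small matrix-vector products.
batched = np.einsum("nij,nj->ni", states, obs)
looped = np.array([states[k] @ obs[k] for k in range(n_targets)])
```

The batched form pays NumPy's per-call overhead once rather than once
per target, at the cost of the less intuitive layout described above.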

Drew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Gael Varoquaux
On Mon, Apr 16, 2012 at 10:40:53PM -0500, Travis Oliphant wrote:
  The objectors object to any binary ABI change, but not specifically
  three pointers rather than two or one?

 Adding pointers is not really an ABI change (but removing them after
 they were there would be...)  It's really just the addition of data to
 the NumPy array structure that they aren't going to use.  Most of the
 time it would not be a real problem (the number of use-cases where you
 have a lot of small NumPy arrays is small), but when it is a problem it
 is very annoying. 

I think that something that the numpy community must be very careful
about is ABI breakage. At the scale of a large and heavy institution, it
is very costly. In my mind, this is the argument that should guide the
discussion: is going one way or the other (removing NA or not) likely
to lead us into ABI breakage?

My 2 cents,

Gaël


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Nathaniel Smith
On Tue, Apr 17, 2012 at 6:44 AM, Travis Oliphant tra...@continuum.io wrote:
 Basically, there are two sets of changes as far as I understand right now:

        1) ufunc infrastructure understands masked arrays
        2) ndarray grew attributes to represent masked arrays

 I am proposing that we keep 1) but change 2) so that only certain kinds of 
 NumPy arrays actually have the extra function pointers (effectively a 
 sub-type).   In essence, what I'm proposing is that the NumPy 1.6 
 PyArrayObject become a base-object, but the other members of the C-structure 
 are not even present unless the Masked flag is set.   Such changes would not 
 require ripping code out --- just altering the presentation a bit.   Yet, 
 they could have large long-term implications, that we should explore before 
 they get fixed.

 Whether masked arrays should be a formal sub-class is actually an unrelated 
 question, and I generally lean in the direction of not encouraging sub-classes 
 of the ndarray. The big questions are: does this object work in the 
 calculation infrastructure? Can I add an array to a masked array? Does it 
 have a sum method? I think it could be argued that a masked array does have 
 an "is a" relationship with an array. It can also be argued that it is 
 better to have a "has a" relationship with an array and be its own object. 
 Either way, this object could still have its first part be binary compatible 
 with a NumPy array, and that is what I'm really suggesting.

It sounds like the main implementation issue here is that this masked
array class needs some way to coordinate with the ufunc infrastructure
to efficiently and reliably handle the mask in calculations. The core
ufunc code now knows how to handle masks, and this functionality is
needed for where= and NA-dtypes, so obviously it's staying,
independent of what we decide to do with masked arrays. So the
question is just, how do we get the masked array and the ufuncs
talking to each other so they can do the right thing. Perhaps we
should focus, then, on how to create a better hooking mechanism for
ufuncs? Something along these lines?
  http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
If done in a solid enough way, this would also solve other problems,
e.g. we could make ufuncs work reliably on sparse matrices, which
seems to trip people up on scipy-user every month or two. Of course,
it's very tricky to get right :-(
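
For reference, later NumPy releases (1.13 onward) provide a hook of
roughly this kind, __array_ufunc__. A minimal sketch of a "has a"
masked container using it follows. The class name MaskedPair and the
mask convention (True = hidden) are illustrative assumptions, only
plain ufunc calls on same-shaped operands are handled, and this is not
how numpy.ma or the NEP code is implemented.

```python
import numpy as np

class MaskedPair:
    """Hypothetical container: holds data plus a boolean mask (True = hidden)."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only handle plain calls such as np.add(a, b); defer everything else.
        if method != "__call__" or kwargs:
            return NotImplemented
        datas = []
        mask = np.zeros(self.data.shape, dtype=bool)
        for x in inputs:
            if isinstance(x, MaskedPair):
                datas.append(x.data)
                mask = mask | x.mask   # hidden in any input -> hidden in output
            else:
                datas.append(x)
        return MaskedPair(ufunc(*datas), mask)

m = MaskedPair([10, 20, 30], [False, True, False])
r = np.add(np.array([1, 2, 3]), m)   # the ufunc dispatches to MaskedPair
print(r.data, r.mask)
```

The container never subclasses ndarray, yet a plain ndarray on the left
of the call still routes the operation through the container's hook.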

As far as the masked array API goes: I'm still not convinced we know how we
want these things to behave. The masked arrays in master currently
implement MISSING semantics, but AFAICT everyone who wants MISSING
semantics prefers NA-dtypes or even plain old NaN's over a masked
implementation. And some of the current implementation's biggest
backers, like Chuck, have argued that they should switch to
skipNA=True, which is more of an IGNORED-style semantic. OTOH, there's
still disagreement over how IGNORED-style semantics should even work
(I'm thinking of that discussion about commutativity). The best existing
model is numpy.ma -- but the numpy.ma API is quite different from the
NEP, in more ways than just the default setting for skipNA. numpy.ma
uses the opposite convention for mask values, it has additional
concepts like the fillvalue, hardmask versus softmask, and then
there's the whole way the NEP uses views to manage the mask. And I
don't know which of these numpy.ma features are useful, which are
extraneous, and which are currently useful but will become extraneous
once the users who really wanted something more like NA-dtypes switch
to those.
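
The numpy.ma concepts named above can be seen in a few lines. This
sketch shows only the numpy.ma side (mask value True means hidden, a
fill value, soft versus hard masks); it does not attempt to reproduce
the NEP's API.

```python
import numpy as np

# numpy.ma convention: True in the mask means "hidden / invalid".
a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
total = a.sum()            # the hidden entry is skipped
filled = a.filled(-1.0)    # the hidden entry is replaced by a fill value

# Soft mask (the default): assigning to a hidden entry unmasks it.
soft = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
soft[1] = 5.0

# Hard mask: assignment cannot unmask a hidden entry.
hard = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False], hard_mask=True)
hard[1] = 5.0

print(total, filled, soft.mask, hard.mask)
```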

So we all agree that masked arrays can be useful, and that numpy.ma
has problems. But straightforwardly porting numpy.ma to C doesn't seem
like it would help much, and neither does simply declaring that
numpy.ma has been deprecated in favour of a new NEP-like API.

So, I dunno. It seems like it might make the most sense to:
1) take the mask fields out of the core ndarray (while leaving the
rest of Mark's infrastructure, as per above)
2) make sure we have the hooks needed so that numpy.ma, and NEP-like
APIs, and whatever other experiments people want to try, can all
integrate well with ufuncs, and make any other extensions that are
generally useful and required so that they can work well
3) once we've experimented, move the winner into the core. Or whatever
else makes sense to do once we understand what we're trying to
accomplish.

-- Nathaniel


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Nathaniel Smith
On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant tra...@continuum.io wrote:
 Mark and I will have conversations about NumPy while he is in Austin.   
 There are many other active stake-holders whose opinions and views are 
 essential for major changes.    Mark and I are working on other things 
 besides just NumPy and all NumPy changes will be discussed on list and 
 require consensus or super-majority for NumPy itself to change.     I'm not 
 sure if that helps.   Is there more we can do?

 As you might have heard me say before, my concern is that it has not
 been easy to have good discussions on this list.   I think the problem
 has been that it has not been clear what the culture was, and how
 decisions got made, and that has led to some uncomfortable and
 unhelpful discussions.  My plea would be for you as BDF$N to strongly
 encourage on-list discussions and discourage off-list discussions as
 far as possible, and to help us make the difficult public effort to
 bash out the arguments to clarity and consensus.  I know that's a big
 ask.

Hi Matthew,

As you know, I agree with everything you just said :-). So in the interest
of transparency, I should add: I have been in touch with Travis some
off-list, and the main topic has been how to proceed in a way that
lets us achieve public consensus.

-- Nathaniel


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Matthew Brett
Hi,

On Tue, Apr 17, 2012 at 7:24 AM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,

 On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant tra...@continuum.io wrote:
 Mark and I will have conversations about NumPy while he is in Austin.   
 There are many other active stake-holders whose opinions and views are 
 essential for major changes.    Mark and I are working on other things 
 besides just NumPy and all NumPy changes will be discussed on list and 
 require consensus or super-majority for NumPy itself to change.     I'm not 
 sure if that helps.   Is there more we can do?

 As you might have heard me say before, my concern is that it has not
 been easy to have good discussions on this list.   I think the problem
 has been that it has not been clear what the culture was, and how
 decisions got made, and that has led to some uncomfortable and
 unhelpful discussions.  My plea would be for you as BDF$N to strongly
 encourage on-list discussions and discourage off-list discussions as
 far as possible, and to help us make the difficult public effort to
 bash out the arguments to clarity and consensus.  I know that's a big
 ask.

 Hi Matthew,

 As you know, I agree with everything you just said :-). So in the interest
 of transparency, I should add: I have been in touch with Travis some
 off-list, and the main topic has been how to proceed in a way that
 lets us achieve public consensus.

I'm glad to hear that discussion is happening, but please do have it
on list.   If it's off list it is easy for people to feel they are being
bypassed, and that the public discussion is not important.  So, yes,
you might get a better outcome for this specific case, but a worse
outcome in the long term, because the list will start to feel that
it's for signing off or voting rather than discussion, and that - I
feel sure - would lead to worse decisions.

The other issue is that there's a reason you are having the discussion
off-list - which is that it was getting difficult on-list.  But -
again - a personal view - that really has to be addressed directly by
setting out the rules of engagement and modeling the kind of
discussion we want to have.

Cheers,

Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Eric Firing
On 04/17/2012 08:40 AM, Matthew Brett wrote:
 Hi,

 On Tue, Apr 17, 2012 at 7:24 AM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett matthew.br...@gmail.com
 wrote:
 Hi,

 On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant tra...@continuum.io
 wrote:
 Mark and I will have conversations about NumPy while he is in Austin.   
 There are many other active stake-holders whose opinions and views are 
 essential for major changes.    Mark and I are working on other things 
 besides just NumPy and all NumPy changes will be discussed on list and 
 require consensus or super-majority for NumPy itself to change.     I'm 
 not sure if that helps.   Is there more we can do?

 As you might have heard me say before, my concern is that it has not
 been easy to have good discussions on this list.   I think the problem
 has been that it has not been clear what the culture was, and how
 decisions got made, and that has led to some uncomfortable and
 unhelpful discussions.  My plea would be for you as BDF$N to strongly
 encourage on-list discussions and discourage off-list discussions as
 far as possible, and to help us make the difficult public effort to
 bash out the arguments to clarity and consensus.  I know that's a big
 ask.

 Hi Matthew,

 As you know, I agree with everything you just said :-). So in the interest
 of transparency, I should add: I have been in touch with Travis some
 off-list, and the main topic has been how to proceed in a way that
 lets us achieve public consensus.

...when possible without paralysis.


 I'm glad to hear that discussion is happening, but please do have it
 on list.   If it's off list it is easy for people to feel they are being
 bypassed, and that the public discussion is not important.  So, yes,
 you might get a better outcome for this specific case, but a worse
 outcome in the long term, because the list will start to feel that
 it's for signing off or voting rather than discussion, and that - I
 feel sure - would lead to worse decisions.

I think you are over-stating the case a bit.  Taking what you say 
literally, one might conclude that numpy people should never meet and 
chat, or phone each other up and chat.  But such small conversations are 
an important extension and facilitator of individual thinking. Major 
decisions do need to get hashed out publicly, but mailing list 
discussions are only one part of the thinking and decision process.

Eric


 The other issue is that there's a reason you are having the discussion
 off-list - which is that it was getting difficult on-list.  But -
 again - a personal view - that really has to be addressed directly by
 setting out the rules of engagement and modeling the kind of
 discussion we want to have.

 Cheers,

 Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Fernando Perez
On Tue, Apr 17, 2012 at 11:40 AM, Matthew Brett matthew.br...@gmail.com wrote:
 I'm glad to hear that discussion is happening, but please do have it
 on list.   If it's off list it is easy for people to feel they are being
 bypassed, and that the public discussion is not important.

I'm afraid I have to disagree: you seem to be proposing an absolute,
'zero-tolerance'-style policy against any off-list discussion.  The
only thing ZT policies achieve is to remove common sense and human
judgement from a process, invariably causing more harm than they do
good, no matter how well intentioned.

There are perfectly reasonable cases where a quick phone call may be a
more effective and sensible way to work than an on-list discussion.
The question isn't whether someone, somewhere, had an off-list
discussion or not; it's whether *the main decision making process* is
being handled transparently or not.

I trust that Nathaniel and Travis had a sensible reason to speak
off-list; as long as it appears clear that the *decisions about numpy*
are being made via public discussion with room for all necessary input
and confrontation of disparate viewpoints, I don't care what they talk
about in private.

In IPython, I am constantly fielding private emails that I very often
refer to the list because they make more sense there, but I also have
off-list discussions when I consider that to be the right thing to do.
And I certainly hope nobody ever asks me to *never* have an off-list
discussion.  I try very hard to ensure that the direction of the
project is very transparent, with redundant points (people) of access
to critical resources and a good vetting of key decisions with public
input (e.g. our first IPEP at
https://github.com/ipython/ipython/issues/1611).  If I am failing at
that, I hope people will call me out *on that point*, but not on
whether I ever pick up the phone or email to talk about IPython
off-list.

Let's try to trust for one minute that the actual decisions will be
made here with solid debate and project-wide input, and seek change
only if we have evidence that this isn't happening (not evidence of a
meta-problem that isn't a problem here).

Best,

f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Matthew Brett
Hi,

On Tue, Apr 17, 2012 at 12:04 PM, Fernando Perez fperez@gmail.com wrote:
 On Tue, Apr 17, 2012 at 11:40 AM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 I'm glad to hear that discussion is happening, but please do have it
 on list.   If it's off list it is easy for people to feel they are being
 bypassed, and that the public discussion is not important.

 I'm afraid I have to disagree: you seem to be proposing an absolute,
 'zero-tolerance'-style policy against any off-list discussion.  The
 only thing ZT policies achieve is to remove common sense and human
 judgement from a process, invariably causing more harm than they do
 good, no matter how well intentioned.

Right - but that would be an absurd overstatement of what I said.
There's no point in addressing something I didn't say and no sensible
person would think.   Indeed, it makes the discussion harder.

It's just exhausting to have to keep stating the obvious.  Of course
discussions happen off-list.  Of course sometimes that has to happen.
Of course that can be a better and quicker way of having discussions.

However, in this case the

 Let's try to trust for one minute that the actual decisions will be
 made here with solid debate and project-wide input, and seek change
 only if we have evidence that this isn't happening (not evidence of a
 meta-problem that isn't a problem here).

meta-problem that is a real problem is that we've shown ourselves that
we are not currently good at having discussions on list.

There are clearly reasons for that, and also clearly, they can be
addressed.   The particular point I am making is neither silly nor
extreme nor vapid.  It is simply that, in order to make discussion
work better on the list, it is in my view better to make an explicit
effort to make the discussions - explicit.

Yours in Bay Area opposition,

Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Fernando Perez
On Tue, Apr 17, 2012 at 12:10 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Right - but that would be an absurd overstatement of what I said.
 There's no point in addressing something I didn't say and no sensible
 person would think.   Indeed, it makes the discussion harder.

Well, in that case neither Eric Firing nor I are 'sensible persons',
since that's how we both understood what you said (Eric's email
appeared to me as a more concise/better phrased version of the same
points I was making). You said:


I'm glad to hear that discussion is happening, but please do have it
on list.   If it's off list it is easy for people to feel they are being
bypassed, and that the public discussion is not important.


I don't think it's an 'absurd overstatement' to interpret that as
"don't have discussions off-list", but hey, it may just be me :)

 meta-problem that is a real problem is that we've shown ourselves that
 we are not currently good at having discussions on list.

Oh, I know that did happen in the past regarding this very topic (the
big NA mess last summer); what I meant was to try and trust that *this
time around* things might be already moving in a better direction,
which it seems to me they are.  It seems to me that everyone is
genuinely trying to tackle the discussion/consensus questions head-on
right on the list, and that's why I proposed waiting to see if there
were really any problems before asking Nathaniel not to have any
discussion off-list (esp. since we have no evidence that what they
talked about had any impact on any decisions bypassing the open
forum).

Best,

f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Matthew Brett
On Tue, Apr 17, 2012 at 12:32 PM, Fernando Perez fperez@gmail.com wrote:
 On Tue, Apr 17, 2012 at 12:10 PM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Right - but that would be an absurd overstatement of what I said.
 There's no point in addressing something I didn't say and no sensible
 person would think.   Indeed, it makes the discussion harder.

 Well, in that case neither Eric Firing nor I are 'sensible persons',
 since that's how we both understood what you said (Eric's email
 appeared to me as a more concise/better phrased version of the same
 points I was making). You said:

 
 I'm glad to hear that discussion is happening, but please do have it
 on list.   If it's off list it is easy for people to feel they are being
 bypassed, and that the public discussion is not important.
 

 I don't think it's an 'absurd overstatement' to interpret that as
 "don't have discussions off-list", but hey, it may just be me :)

The absurd over-statement is the following:

 I'm afraid I have to disagree: you seem to be proposing an absolute,
'zero-tolerance'-style policy against any off-list discussion.  

 meta-problem that is a real problem is that we've shown ourselves that
 we are not currently good at having discussions on list.

 Oh, I know that did happen in the past regarding this very topic (the
 big NA mess last summer); what I meant was to try and trust that *this
 time around* things might be already moving in a better direction,
 which it seems to me they are.  It seems to me that everyone is
 genuinely trying to tackle the discussion/consensus questions head-on
 right on the list, and that's why I proposed waiting to see if there
 were really any problems before asking Nathaniel not to have any
 discussion off-list (esp. since we have no evidence that what they
 talked about had any impact on any decisions bypassing the open
 forum).

The question - which seems to me to be sensible, rational and
important - is how to get better at on-list discussion, and whether
taking this particular discussion mainly off-list is good or bad in
that respect.

See you,

Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Tim Cera
I have never found mailing lists good places for discussion and consensus.
I think the format itself does not lend itself to involvement, carefully
considered positions (or the ability to change them), or voting, since all
of it can be so easily lost within all of the quoting, the back and forth,
people walking away, etc.  And you also want involvement from people who don't
have x hours to craft a finely worded, politically correct, and detailed
response.  I am not advocating this particular system, but something like
http://meta.programmers.stackexchange.com/ would be a better platform for
building to a decision when there are many choices to be made.

Now about ma, NA, missing...

I am just an engineer working in water resources and I had lots of
difficulty reading the NEP (so sleepy) so I will be the first to admit
that I probably have something wrong.  Just for reference (since I missed
it the first time around) Nathaniel posted this page at
https://github.com/njsmith/numpy/wiki/NA-discussion-status

I think that I could adapt to everything that is discussed in the NEP, but
I do have some comments about things that puzzled me.  I don't need
answers, but if I am puzzled maybe others are also.

First - 'maskna=True'?
Tested on development version of numpy...
>>> a = np.arange(10, maskna = True)
>>> a[:2] = np.NA
>>> a
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])

Why do I have to specify 'maskna = True'?  If NA and ndarray are intended
to be combined in some way, then I don't think that I need this.  During
development, I understand, but the NEP shouldn't have it.  Heck, even if
you keep NA and ndarrays separate, when someone tries to set an ndarray
element to np.NA, convert it to an NA array instead of raising a
ValueError.  I say that very casually, as if I know how to do it.  I do
have a proof, but the margin is too small to include it.  :-)

I am torn about 'skipna=True'.  I think I understand the desire for
explicit behavior, but I suspect that every operation that I would use a NA
array for, would require 'skipna=True'.  Actually, I don't use that many
reducing functions, so maybe not a big deal.  Regardless of the skipna
setting, a related idea that could be useful for reducing functions is
to set an 'includesna' attribute with the returned scalar value.
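
For illustration only, the 'includesna' idea can be approximated today with
numpy.ma, whose reductions already skip masked entries.  This is a rough
sketch, not the NEP's API: the helper name sum_skipping_masked and the
returned flag are hypothetical, and numpy.ma stands in for the experimental
NA arrays.

```python
import numpy as np

def sum_skipping_masked(values):
    """Sum the unmasked entries and report whether any entry was masked.

    Hypothetical helper: the second return value plays the role of the
    'includesna' attribute suggested above.
    """
    arr = np.ma.asarray(values)
    total = arr.sum()                          # masked entries are skipped
    includes_na = bool(np.ma.is_masked(arr))   # True if at least one entry is masked
    return total, includes_na

a = np.ma.array([1, 2, 3, 4], mask=[True, False, False, False])
total, includes_na = sum_skipping_masked(a)
```

The caller gets the "skipna" result and can still tell that something was
skipped, without a second pass over the data.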

The view() didn't work as described in the NEP, where np.NA isn't
propagated back to the original array.  This could be because the NEP
references a 'missingdata' work in progress branch and I don't know what
has been merged.  I can force the NEP described behavior if I set
'd.flags.ownmaskna=True'.
>>> d = a.view()
>>> d
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[0] = 4
>>> a
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[6] = np.NA
>>> d
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])
>>> a
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])

In the NEP 'Accessing a Boolean Mask' section there is a comment about...
actually I don't understand this section at all.  Especially the part about
a boolean byte-level mask?  Why would it need to be a byte-level mask in
order to be viewed?  As for the logic of mask = True or False, that can be
easily handled by using a better name for the flag.  'mask = True' means
that the value is masked (missing), whereas 'exposed = True' would mean
that the value is not masked (not missing).

The biggest question mark to me is that 'a[0] = np.NA' is destructive and
(using numpy.ma) 'a.mask[0] = True' is not.  Is that a big deal?  I am
trying to think back on all of my 'ma' code and try to remember if I
masked, then unmasked values and I don't recall any time that I did that.
 Of course my use cases are constrained to what I have done in the past.
 It feels like a bad idea, for the sake of saving the memory for the mask
bits.
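
A small numpy.ma sketch of the non-destructive behaviour described above:
masking an element hides it but leaves the stored value in place, so
unmasking brings it back.  The variable names are only illustrative, and the
mask is passed explicitly so that a.mask is a real boolean array rather than
the shared 'nomask' placeholder.

```python
import numpy as np

# Explicit all-False mask so individual mask elements can be assigned.
a = np.ma.array([10, 20, 30], mask=[False, False, False])

a.mask[0] = True            # hide element 0
hidden_value = a.data[0]    # the underlying buffer still holds the value
is_hidden = bool(a.mask[0])

a.mask[0] = False           # unmask again
recovered = a[0]            # the original value is visible once more
```

With a destructive 'a[0] = np.NA', by contrast, there would be nothing left
to recover after the assignment.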

Now, the amazing thing is that understanding so little, doing even less of
the work, I get to vote. Sounds like America!

I would really like to see NA in the wild, and I think that I can adapt my
ma code to it, so +1.  If it has to wait until 1.8, +1.  If it has to wait
until 1.9, +1.

Kindest regards,
Tim


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Ralf Gommers
On Tue, Apr 17, 2012 at 12:06 AM, Travis Oliphant tra...@continuum.io wrote:

 There is an issue with the NumPy 1.7 release that we all need to
 understand.   Doesn't including the missing-data attributes in the NumPy
 structure in a released version of NumPy basically commit to including
 those attributes in NumPy 1.8?


We clearly labeled NA as experimental, so some changes are to be expected.
But not complete removal - so yes, if we release them they should stay in
some form.


  I'm not comfortable with that, is everyone else?   One possibility is to
 move those attributes to a C-level sub-class of NumPy.


That's the first time I've heard this. Until now, we have talked a lot
about adding bitmasks and API changes, not about complete removal. My
assumption was that the experimental label was enough. From Nathaniel's
reaction I gathered the same. It looks like too many conversations on this
topic are happening off-list.

Ralf


 I have heard from a few people that they are not excited by the growth of
 the NumPy data-structure by the 3 pointers needed to hold the masked-array
 storage.   This is especially true when there is talk to potentially add
 additional attributes to the NumPy array (for labels and other
 meta-information).  If you are willing to let us know how you feel
 about this, please speak up.

 Mark Wiebe will be in Austin for about 3 months.  He and I will be hashing
 some of this out in the first week or two.   We will present any proposal
 and ask questions to this list before acting.   We will be using some
 phone calls and face-to-face communications to increase the bandwidth and
 speed of the conversations (not to exclude anyone).   If you would like to
 be part of the in-person discussions let me know -- or just make your views
 known here --- they will be taken seriously.

 The goal is consensus for any major change in NumPy.   If we can't get
 consensus, then we vote on this list and use a super-majority.   If we
 can't get a super-majority, then except in rare circumstances we can't move
 forward.   Heavy users of NumPy get higher voting privileges.

 My perspective is that we don't have enough consensus on the current
 additions to the NumPy data-structure for those additional attributes to
 be included in a long-term release.

 Best,

 -Travis





 On Mar 25, 2012, at 6:27 PM, Charles R Harris wrote:



 On Sun, Mar 25, 2012 at 3:14 PM, Ralf Gommers ralf.gomm...@googlemail.com
  wrote:



 On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 Hi All,

 There are several problems with numpy master that need to be fixed before a
 release can be considered.

1. Datetime on windows with mingw.
2. Bus error on SPARC, ticket #2076.
3. NA and real/complex views of complex arrays.

 Number 1 has proved to be particularly difficult; any help or
 suggestions for that would be much appreciated. The current work has been
 going in pull request 214 https://github.com/numpy/numpy/pull/214.

 This isn't to say that there aren't a ton of other things that need
 fixing or that we can skip out on the current stack of pull requests, but I
 think it is impossible to consider a release while those three problems are
 outstanding.

 Why do you consider (2) a blocker? Not saying it's not important, but
 there are eight other open tickets with segfaults. Some are more esoteric
 than others, but I don't see why for example #1713 and #1808 are less
 important than this one.

 #1522 provides a patch that fixes a segfault by the way, could use a
 review.


 I wasn't aware of the other segfaults, I'd like to get them all fixed...
 The list was meant to elicit additions.

 I don't know where the missed floating point errors come from, but they
 are somewhat dependent on the compiler doing the right thing and hardware
 support. I'd welcome any insight into why we get them on SPARC (underflow)
 and Windows (overflow). The windows buildbot doesn't seem to be updating
 correctly since it is still missing the combinations method that is now
 part of the test module.

 Chuck



Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Fernando Perez
On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:
 That's the first time I've heard this. Until now, we have talked a lot about
 adding bitmasks and API changes, not about complete removal. My assumption
 was that the experimental label was enough. From Nathaniel's reaction I
 gathered the same. It looks like too many conversations on this topic are
 happening off-list.

My impression was that Travis was just suggesting that as an option
here for discussion, not presenting it as something discussed
elsewhere.  I read Travis' email precisely as restarting the
discussion for consideration of the issues in full public view (+
calls/skype open to anyone interested for bandwidth purposes), so in
this case I don't think there's any background off-list to worry
about.  At least that's how I read it...

Cheers,

f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Travis Oliphant
No off list discussions have been happening material to this point.   I am 
basically stating my view for the first time.  I have delayed because I realize 
it is not a pleasant view and I was hoping I could end up resolving it 
favorably.

But,  it needs to be discussed before 1.7 is released.  

--
Travis Oliphant
(on a mobile)
512-826-7480


On Apr 16, 2012, at 5:27 PM, Fernando Perez fperez@gmail.com wrote:

 On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
 That's the first time I've heard this. Until now, we have talked a lot about
 adding bitmasks and API changes, not about complete removal. My assumption
 was that the experimental label was enough. From Nathaniel's reaction I
 gathered the same. It looks like too many conversations on this topic are
 happening off-list.
 
 My impression was that Travis was just suggesting that as an option
 here for discussion, not presenting it as something discussed
 elsewhere.  I read Travis' email precisely as restarting the
 discussion for consideration of the issues in full public view (+
 calls/skype open to anyone interested for bandwidth purposes), so in
 this case I don't think there's any background off-list to worry
 about.  At least that's how I read it...
 
 Cheers,
 
 f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Charles R Harris
On Mon, Apr 16, 2012 at 4:33 PM, Travis Oliphant tra...@continuum.io wrote:

 No off list discussions have been happening material to this point.   I am
 basically stating my view for the first time.  I have delayed because I
 realize it is not a pleasant view and I was hoping I could end up resolving
 it favorably.

 But,  it needs to be discussed before 1.7 is released.


What is the problem with three extra pointers?

snip

Chuck


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Ralf Gommers
On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez fperez@gmail.com wrote:

 On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  That's the first time I've heard this. Until now, we have talked a lot
 about
  adding bitmasks and API changes, not about complete removal. My
 assumption
  was that the experimental label was enough. From Nathaniel's reaction I
  gathered the same. It looks like too many conversations on this topic are
  happening off-list.

 My impression was that Travis was just suggesting that as an option
 here for discussion, not presenting it as something discussed
 elsewhere.


From "I have heard from a few people that they are not excited" I
deduce it was discussed to some extent.

I read Travis' email precisely as restarting the
 discussion for consideration of the issues in full public view


It wasn't restating anything, it's completely opposite to the part that I
thought we did reach consensus on (*not* backing out changes). I stated as
much when first discussing a 1.7.0 in December,
http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027,
with no one disagreeing.

It's perfectly fine to reconsider any previous decisions/discussions of
course.

However, I do now draw the conclusion that it's best to wait for this issue
to be resolved before considering a new release. I had been working on
closing tickets and cleaning up loose ends for 1.7.0, and pinging others to
do the same. I guess I'll stop doing that for now, until the renewed NA
debate has been settled.

If there are bug fixes that are important (like the Debian segfaults with
Python debug builds), we can do a 1.6.2 release.

Ralf

(+
 calls/skype open to anyone interested for bandwidth purposes), so in
 this case I don't think there's any background off-list to worry
 about.  At least that's how I read it...

 Cheers,

 f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Travis Oliphant
The comments I have heard have been from people who haven't wanted to make them 
on this list.   I wish they would, but I understand that not everyone wants to 
be drawn into a long discussion.   They have not been discussions.

My bias is to just move forward with what is there.   After a week or two of 
discussion, I expect that we will resolve this one way or another.  The result 
may be to just move forward as previously planned.  However, that might not be the 
best move forward either.   These are significant changes and they do impact 
users.  We need to understand those implications and take very seriously any 
concerns from users.

There is time to look at this carefully.   We need to take the time.   I am 
really posting so that the discussions Mark and I have this week (I haven't 
seen Mark since PyCon) can be productive with as many other people 
participating as possible.

--
Travis Oliphant
(on a mobile)
512-826-7480


On Apr 16, 2012, at 6:01 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote:

 
 
 On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez fperez@gmail.com wrote:
 On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  That's the first time I've heard this. Until now, we have talked a lot about
  adding bitmasks and API changes, not about complete removal. My assumption
  was that the experimental label was enough. From Nathaniel's reaction I
  gathered the same. It looks like too many conversations on this topic are
  happening off-list.
 
 My impression was that Travis was just suggesting that as an option
 here for discussion, not presenting it as something discussed
 elsewhere.  
 
 From "I have heard from a few people that they are not excited" I deduce 
 it was discussed to some extent.
 
 I read Travis' email precisely as restarting the
 discussion for consideration of the issues in full public view
 
 It wasn't restating anything, it's completely opposite to the part that I 
 thought we did reach consensus on (*not* backing out changes). I stated as 
 much when first discussing a 1.7.0 in December, 
 http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027, 
 with no one disagreeing.
 
 It's perfectly fine to reconsider any previous decisions/discussions of 
 course. 
 
 However, I do now draw the conclusion that it's best to wait for this issue 
 to be resolved before considering a new release. I had been working on 
 closing tickets and cleaning up loose ends for 1.7.0, and pinging others to 
 do the same. I guess I'll stop doing that for now, until the renewed NA 
 debate has been settled.
 
 If there are bug fixes that are important (like the Debian segfaults with 
 Python debug builds), we can do a 1.6.2 release.
 
 Ralf
 
 (+
 calls/skype open to anyone interested for bandwidth purposes), so in
 this case I don't think there's any background off-list to worry
 about.  At least that's how I read it...
 
 Cheers,
 
 f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Charles R Harris
On Mon, Apr 16, 2012 at 5:17 PM, Travis Oliphant tra...@continuum.io wrote:

 The comments I have heard have been from people who haven't wanted to make
 them on this list.   I wish they would, but I understand that not everyone
 wants to be drawn into a long discussion.   They have not been discussions.

 My bias is to just move forward with what is there.   After a week or two
 of discussion, I expect that we will resolve this one way or another.  The
 result may be to just move forward as previously planned.  However, that might
 not be the best move forward either.   These are significant changes and
 they do impact users.  We need to understand those implications and take
 very seriously any concerns from users.

 There is time to look at this carefully.   We need to take the time.   I
 am really posting so that the discussions Mark and I have this week (I
 haven't seen Mark since PyCon) can be productive with as many other people
 participating as possible.


I would suggest that you and Mark have a good talk first, then report here
with some specifics that you think need discussion, along with specifics
from the unnamed sources. The somewhat vague "some say" doesn't help much
and in the absence of specifics the discussion is likely to proceed along
the same old lines if it happens at all. Meanwhile there is a disturbance
in the force that makes us all uneasy.

Chuck


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Matthew Brett
Hi,

On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant tra...@continuum.io wrote:

 I have heard from a few people that they are not excited by the growth of
 the NumPy data-structure by the 3 pointers needed to hold the masked-array
 storage.   This is especially true when there is talk to potentially add
 additional attributes to the NumPy array (for labels and other
 meta-information).      If you are willing to let us know how you feel about
 this, please speak up.

I guess there are two questions here

1) Will something like the current version of masked arrays have a
long term future in numpy, regardless of eventual API? Most likely
answer - yes?
2) Will likely changes to the masked array API make any difference to
the number of extra pointers?  Likely answer no?

Is that right?

I have the impression that the masked array API discussion still has
not come out fully into the unforgiving light of discussion day, but
if the answer to 2) is No, then I suppose the API discussion is not
relevant to the 3 pointers change.

See y'all,

Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Matthew Brett
Hi,

On Mon, Apr 16, 2012 at 6:03 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant tra...@continuum.io wrote:

 I have heard from a few people that they are not excited by the growth of
 the NumPy data-structure by the 3 pointers needed to hold the masked-array
 storage.   This is especially true when there is talk to potentially add
 additional attributes to the NumPy array (for labels and other
 meta-information).      If you are willing to let us know how you feel about
 this, please speak up.

 I guess there are two questions here

 1) Will something like the current version of masked arrays have a
 long term future in numpy, regardless of eventual API? Most likely
 answer - yes?
 2) Will likely changes to the masked array API make any difference to
 the number of extra pointers?  Likely answer no?

 Is that right?

 I have the impression that the masked array API discussion still has
 not come out fully into the unforgiving light of discussion day, but
 if the answer to 2) is No, then I suppose the API discussion is not
 relevant to the 3 pointers change.

Sorry, if the answers to 1 and 2 are Yes and No then the API
discussion may not be relevant.

Cheers,

Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Matthew Brett
Hi,

On Mon, Apr 16, 2012 at 7:46 PM, Travis Oliphant tra...@continuum.io wrote:

 On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:

 Hi,

 On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant tra...@continuum.io wrote:

 I have heard from a few people that they are not excited by the growth of
 the NumPy data-structure by the 3 pointers needed to hold the masked-array
 storage.   This is especially true when there is talk to potentially add
 additional attributes to the NumPy array (for labels and other
 meta-information).      If you are willing to let us know how you feel about
 this, please speak up.

 I guess there are two questions here

 1) Will something like the current version of masked arrays have a
 long term future in numpy, regardless of eventual API? Most likely
 answer - yes?

 I think the answer to this is yes, but it could be as a feature-filled 
 sub-class (like the current numpy.ma, except in C).

I'd love to hear that argument fleshed out in more detail - do you have time?

 2) Will likely changes to the masked array API make any difference to
 the number of extra pointers?  Likely answer no?

 Is that right?

 The answer to this is very likely no on the Python side.  But, on the C-side, 
 there could be some differences (i.e. are masked arrays a sub-class of the 
 ndarray or not).


 I have the impression that the masked array API discussion still has
 not come out fully into the unforgiving light of discussion day, but
 if the answer to 2) is No, then I suppose the API discussion is not
 relevant to the 3 pointers change.

 You are correct that the API discussion is separate from this one.
 Overall, I was surprised at how fervently people would oppose ABI changes.
 As has been pointed out, NumPy and Numeric before it were not really designed
 to prevent having to recompile when changes were made.   I'm still not sure
 that a better overall solution is not to promote better availability of
 downstream binary packages than excessively worry about ABI changes in NumPy.
 But, that is the current climate.

The objectors object to any binary ABI change, but not specifically
three pointers rather than two or one?

Is their point then about ABI breakage?  Because that seems like a
different point again.

Or is it possible that they are in fact worried about the masked array API?

 Mark and I will talk about this long and hard.  Mark has ideas about where he 
 wants to see NumPy go, but I don't think we have fully accounted for where 
 NumPy and its user base *is* and there may be better ways to approach this 
 evolution.    If others are interested in the outcome of the discussion 
 please speak up (either on the list or privately) and we will make sure your 
 views get heard and accounted for.

I started writing something about this but I guess you'd know what I'd
write, so I only humbly ask that you consider whether it might be
doing real damage to allow substantial discussion that is not
documented or argued out in public.

See you,

Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Travis Oliphant
Ralf, 

I wouldn't change your plans just yet for NumPy 1.7.   With Mark available full 
time for the next few weeks, I think we will be able to make rapid progress on 
whatever is decided -- in fact if people are available to help but just need 
resources let me know off list.  

I just want to make sure that the process for making significant changes to 
NumPy does not disenfranchise any voice.   Like bug-reports and 
feature-requests, complaints are food to a project, just like usage is oxygen.  
In my view, we should take any concern that is raised from the perspective 
that "NumPy is guilty until proven innocent".  This takes some intentional 
effort.   I have found that because of how much work it takes to design and 
implement software, my natural perspective is to be defensive, but I have 
always appreciated the outcome when all view-points are considered seriously 
and addressed respectfully.  

Best regards,

-Travis

 


On Apr 16, 2012, at 6:01 PM, Ralf Gommers wrote:

 
 
 On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez fperez@gmail.com wrote:
 On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  That's the first time I've heard this. Until now, we have talked a lot about
  adding bitmasks and API changes, not about complete removal. My assumption
  was that the experimental label was enough. From Nathaniel's reaction I
  gathered the same. It looks like too many conversations on this topic are
  happening off-list.
 
 My impression was that Travis was just suggesting that as an option
 here for discussion, not presenting it as something discussed
 elsewhere.  
 
 From "I have heard from a few people that they are not excited" I deduce 
 it was discussed to some extent.
 
 I read Travis' email precisely as restarting the
 discussion for consideration of the issues in full public view
 
 It wasn't restating anything, it's completely opposite to the part that I 
 thought we did reach consensus on (*not* backing out changes). I stated as 
 much when first discussing a 1.7.0 in December, 
 http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027, 
 with no one disagreeing.
 
 It's perfectly fine to reconsider any previous decisions/discussions of 
 course. 
 
 However, I do now draw the conclusion that it's best to wait for this issue 
 to be resolved before considering a new release. I had been working on 
 closing tickets and cleaning up loose ends for 1.7.0, and pinging others to 
 do the same. I guess I'll stop doing that for now, until the renewed NA 
 debate has been settled.
 
 If there are bug fixes that are important (like the Debian segfaults with 
 Python debug builds), we can do a 1.6.2 release.
 
 Ralf
 
 (+
 calls/skype open to anyone interested for bandwidth purposes), so in
 this case I don't think there's any background off-list to worry
 about.  At least that's how I read it...
 
 Cheers,
 
 f


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Travis Oliphant
 
 I think the answer to this is yes, but it could be as a feature-filled 
 sub-class (like the current numpy.ma, except in C).
 
 I'd love to hear that argument fleshed out in more detail - do you have time?


My proposal here is to basically take the current github NumPy data-structure 
and make this a sub-type (in C) of the NumPy 1.6 data-structure which is 
unchanged in NumPy 1.7.   

This would not require removing code but would require another PyTypeObject and 
associated structures.  I expect Mark could do this work in 2-4 weeks.   We 
also have other developers who could help in order to get the sub-type in NumPy 
1.7. What kind of details would you like to see? 

In this way, the masked-array approach to missing data could be pursued by 
those who prefer that approach without affecting any other users of numpy 
arrays (and the numpy.ma sub-class could be deprecated). I would also like 
to add missing-data dtypes (ideally before NumPy 1.7, but it is not a 
requirement of release). 

I just think we need more data and uses and this would provide a way to get 
that without making a forced decision one way or another. 
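
As a loose Python analogy for the C-level sub-type idea (the real work would
involve a second PyTypeObject in C, which this does not attempt to model),
the base type keeps its layout and only the sub-type pays for the extra
fields.  The class and field names below are hypothetical.

```python
import sys

class BaseArray:
    """Stand-in for the unchanged base structure: no mask fields at all."""
    __slots__ = ("data", "shape")

    def __init__(self, data, shape):
        self.data = data
        self.shape = shape

class MaskedSubArray(BaseArray):
    """Hypothetical sub-type that alone carries three extra mask fields."""
    __slots__ = ("mask_data", "mask_dtype", "mask_strides")

    def __init__(self, data, shape, mask_data=None):
        super().__init__(data, shape)
        self.mask_data = mask_data
        self.mask_dtype = None
        self.mask_strides = None

plain = BaseArray([1, 2, 3], (3,))
masked = MaskedSubArray([1, 2, 3], (3,), mask_data=[True, False, False])

# Only instances of the sub-type grow; instances of the base type are untouched.
extra_bytes = sys.getsizeof(masked) - sys.getsizeof(plain)
```

Code that only ever creates BaseArray objects never sees the mask fields,
while anything written against BaseArray still accepts a MaskedSubArray.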

 
 2) Will likely changes to the masked array API make any difference to
 the number of extra pointers?  Likely answer no?
 
 Is that right?
 
 The answer to this is very likely no on the Python side.  But, on the 
 C-side, there could be some differences (i.e. are masked arrays a sub-class 
 of the ndarray or not).
 
 
 I have the impression that the masked array API discussion still has
 not come out fully into the unforgiving light of discussion day, but
 if the answer to 2) is No, then I suppose the API discussion is not
 relevant to the 3 pointers change.
 
 You are correct that the API discussion is separate from this one. 
 Overall,  I was surprised at how fervently people would oppose ABI changes.  
  As has been pointed out, NumPy and Numeric before it were not really 
 designed to prevent having to recompile when changes were made.   I'm still 
 not sure that a better overall solution is not to promote better 
 availability of downstream binary packages than excessively worry about ABI 
 changes in NumPy.  But, that is the current climate.
 
 The objectors object to any binary ABI change, but not specifically
 three pointers rather than two or one?

Adding pointers is not really an ABI change (but removing them after they were 
there would be...)  It's really just the addition of data to the NumPy array 
structure that they aren't going to use.  Most of the time it would not be a 
real problem (the number of use-cases where you have a lot of small NumPy 
arrays is small), but when it is a problem it is very annoying. 
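
To put rough numbers on the small-array case, here is a minimal sketch. It assumes the cost is simply 3 extra pointers per array header and compares that against the data payload of arrays of doubles; the helper names are made up for illustration and nothing here inspects the real NumPy structure.

```python
import ctypes

# Pointer width of the running interpreter: 4 bytes on 32-bit builds, 8 on 64-bit.
POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)


def extra_bytes(n_pointers=3, pointer_size=POINTER_SIZE):
    """Bytes added to every array header by n_pointers extra pointers."""
    return n_pointers * pointer_size


def relative_overhead(n_elements, itemsize=8, n_pointers=3,
                      pointer_size=POINTER_SIZE):
    """Extra header bytes as a fraction of the array's data payload."""
    return extra_bytes(n_pointers, pointer_size) / (n_elements * itemsize)


# Two small arrays and one large array of 8-byte doubles.
for n in (2, 16, 1_000_000):
    print(f"{n:>9} elements: +{extra_bytes()} bytes, "
          f"{relative_overhead(n):.4%} of the data size")
```

The absolute cost is fixed per array, so it only becomes noticeable when a program holds a very large number of tiny arrays.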

 
 Is their point then about ABI breakage?  Because that seems like a
 different point again.

Yes, it's not that. 

 
 Or is it possible that they are in fact worried about the masked array API?

I don't think most people whose opinion would be helpful are really tuned in to 
the discussion at this point.  I think they just want us to come up with an 
answer and then move forward.  But, they will judge us based on the solution 
we come up with. 

 
 Mark and I will talk about this long and hard.  Mark has ideas about where 
 he wants to see NumPy go, but I don't think we have fully accounted for 
 where NumPy and its user base *is* and there may be better ways to approach 
 this evolution.  If others are interested in the outcome of the discussion 
 please speak up (either on the list or privately) and we will make sure your 
 views get heard and accounted for.
 
 I started writing something about this but I guess you'd know what I'd
 write, so I only humbly ask that you consider whether it might be
 doing real damage to allow substantial discussion that is not
 documented or argued out in public.

It will be documented and argued in public. We are just going to have one 
off-list conversation to try and speed up the process.  You make a valid 
point, and I appreciate the perspective. Please speak up again after 
hearing the report if something is not clear.   I don't want this to even have 
the appearance of a back-room deal. 

Mark and I will have conversations about NumPy while he is in Austin.   There 
are many other active stake-holders whose opinions and views are essential for 
major changes.  Mark and I are working on other things besides just NumPy, and 
all NumPy changes will be discussed on list and require consensus or 
super-majority for NumPy itself to change. I'm not sure if that helps.   Is 
there more we can do? 

Thanks, 

-Travis



 
 See you,
 
 Matthew


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Charles R Harris
On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant tra...@continuum.io wrote:


 On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:

  Hi,
 
  On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant tra...@continuum.io
 wrote:
 
  I have heard from a few people that they are not excited by the growth
 of
  the NumPy data-structure by the 3 pointers needed to hold the
 masked-array
  storage.   This is especially true when there is talk to potentially add
  additional attributes to the NumPy array (for labels and other
  meta-information).  If you are willing to let us know how you feel
 about
  this, please speak up.
 
  I guess there are two questions here
 
  1) Will something like the current version of masked arrays have a
  long term future in numpy, regardless of eventual API? Most likely
  answer - yes?

 I think the answer to this is yes, but it could be as a feature-filled
 sub-class (like the current numpy.ma, except in C).


I think making numpy.ma a subclass of ndarray has caused all sorts of
trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from
ndarray for implementation of various parts. The upshot is that almost
everything has to be overridden, so it didn't buy much.



  2) Will likely changes to the masked array API make any difference to
  the number of extra pointers?  Likely answer no?
 
  Is that right?

 The answer to this is very likely no on the Python side.  But, on the
 C-side, there could be some differences (i.e. are masked arrays a sub-class
 of the ndarray or not).

 
  I have the impression that the masked array API discussion still has
  not come out fully into the unforgiving light of discussion day, but
  if the answer to 2) is No, then I suppose the API discussion is not
  relevant to the 3 pointers change.

 You are correct that the API discussion is separate from this one.
 Overall,  I was surprised at how fervently people would oppose ABI changes.
   As has been pointed out, NumPy and Numeric before it were not really
 designed to prevent having to recompile when changes were made.   I'm still
 not sure that a better overall solution is not to promote better
 availability of downstream binary packages than excessively worry about ABI
 changes in NumPy.  But, that is the current climate.

 In that climate, my concern is that we haven't finalized the API but are
 rapidly cementing the *structure* of NumPy arrays into a modified form that
 has real downstream implications.   Two other people I have talked to share
 this concern (nobody who has posted on this list before but who are heavy
 users of NumPy).  I may have missed the threads where it was discussed,
 but have these structure changes and their implications been fully
 discussed?   Is there anyone else who is concerned about adding 3 more
 pointers (12 bytes or 24 bytes) to the NumPy structure?

 As Chuck points out, 3 more pointers is not necessarily that big of a deal
 if you are talking about a large array (though for small arrays it could
 matter).   But, I personally know of half-written NEPs that propose to add
 more pointers to the NumPy array:

* to allow meta-information to be attached to a NumPy array
* to allow labels to be attached to a NumPy array (ala data-array)
* to allow multiple chunks for an array.

 Are people O.K. with 5 or 6 more pointers on every NumPy array?  We
 could also think about adding just one more pointer to a new enhanced
 structure that contains multiple enhancements to the NumPy array.


Yes, this whole thing could get out of hand with too many extras. One of
the things you could discuss with Mark is how to deal with this, or limit
the modifications. At some point the ndarray class could become cumbersome,
complicated, and difficult to maintain. We need to be careful that it
doesn't go that way. I'd like to keep it as simple as possible; the
question is what is fundamental. The main long term advantage of having
masks part of the base is the possibility of adapted loops in ufuncs, which
would give the advantage of speed. But that is just how it looks from where
I stand, no doubt others have different priorities.
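
For concreteness, the difference between growing the array structure inline and adding a single pointer to a separate enhanced structure can be sketched with ctypes. This is only an illustration under stated assumptions: the three-pointer header and all field names below are hypothetical and far simpler than the real NumPy structure.

```python
import ctypes

P = ctypes.c_void_p

# Hypothetical, heavily simplified array header: three pointers only.
BASE_FIELDS = [("data", P), ("dimensions", P), ("strides", P)]

# Hypothetical extras: three mask pointers plus metadata, labels, and chunks.
EXTRA_FIELDS = [(name, P) for name in (
    "mask_data", "mask_dtype", "mask_strides", "metadata", "labels", "chunks")]


class BaseHeader(ctypes.Structure):
    _fields_ = BASE_FIELDS


class Extras(ctypes.Structure):
    _fields_ = EXTRA_FIELDS


class InlineHeader(ctypes.Structure):
    # Every array carries all six extra pointers, used or not.
    _fields_ = BASE_FIELDS + EXTRA_FIELDS


class IndirectHeader(ctypes.Structure):
    # Every array carries one pointer; Extras is allocated only when needed.
    _fields_ = BASE_FIELDS + [("extras", ctypes.POINTER(Extras))]


base = ctypes.sizeof(BaseHeader)
print("inline growth:  ", ctypes.sizeof(InlineHeader) - base, "bytes")
print("indirect growth:", ctypes.sizeof(IndirectHeader) - base, "bytes")
```

The indirect layout costs every array one pointer and pays for the full set of extras only on arrays that use them, at the price of an extra dereference.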

 But, this whole line of discussion sounds a lot like a true sub-class of
 the NumPy array at the C-level.  It has the benefit that only people that
 use the features of the sub-class have to worry about using the extra space.

 Mark and I will talk about this long and hard.  Mark has ideas about where
 he wants to see NumPy go, but I don't think we have fully accounted for
 where NumPy and its user base *is* and there may be better ways to approach
 this evolution.  If others are interested in the outcome of the
 discussion please speak up (either on the list or privately) and we will
 make sure your views get heard and accounted for.


Chuck


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Travis Oliphant

On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:

 
 
 On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant tra...@continuum.io wrote:
 
 On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:
 
  Hi,
 
  On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant tra...@continuum.io 
  wrote:
 
  I have heard from a few people that they are not excited by the growth of
  the NumPy data-structure by the 3 pointers needed to hold the masked-array
  storage.   This is especially true when there is talk to potentially add
  additional attributes to the NumPy array (for labels and other
  meta-information).  If you are willing to let us know how you feel 
  about
  this, please speak up.
 
  I guess there are two questions here
 
  1) Will something like the current version of masked arrays have a
  long term future in numpy, regardless of eventual API? Most likely
  answer - yes?
 
 I think the answer to this is yes, but it could be as a feature-filled 
 sub-class (like the current numpy.ma, except in C).
 
 I think making numpy.ma a subclass of ndarray has caused all sorts of 
 trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from 
 ndarray for implementation of various parts. The upshot is that almost 
 everything has to be overridden, so it didn't buy much.

This is a valid point.   One could create a new object that is binary 
compatible with the NumPy array and provides the array interface without 
really being a sub-class.  We could keep Mark's modifications to the array 
interface as well so that it can communicate a mask. 

-Travis




  
 
  2) Will likely changes to the masked array API make any difference to
  the number of extra pointers?  Likely answer no?
 
  Is that right?
 
 The answer to this is very likely no on the Python side.  But, on the C-side, 
 there could be some differences (i.e. are masked arrays a sub-class of the 
 ndarray or not).
 
 
  I have the impression that the masked array API discussion still has
  not come out fully into the unforgiving light of discussion day, but
  if the answer to 2) is No, then I suppose the API discussion is not
  relevant to the 3 pointers change.
 
 You are correct that the API discussion is separate from this one. 
 Overall,  I was surprised at how fervently people would oppose ABI changes.   
 As has been pointed out, NumPy and Numeric before it were not really designed 
 to prevent having to recompile when changes were made.   I'm still not sure 
 that a better overall solution is not to promote better availability of 
 downstream binary packages than excessively worry about ABI changes in NumPy. 
 But, that is the current climate.
 
 In that climate, my concern is that we haven't finalized the API but are 
 rapidly cementing the *structure* of NumPy arrays into a modified form that 
 has real downstream implications.   Two other people I have talked to share 
 this concern (nobody who has posted on this list before but who are heavy 
 users of NumPy).  I may have missed the threads where it was discussed, but 
 have these structure changes and their implications been fully discussed?   
 Is there anyone else who is concerned about adding 3 more pointers (12 bytes 
 or 24 bytes) to the NumPy structure?
 
 As Chuck points out, 3 more pointers is not necessarily that big of a deal if 
 you are talking about a large array (though for small arrays it could 
 matter).   But, I personally know of half-written NEPs that propose to add 
 more pointers to the NumPy array:
 
* to allow meta-information to be attached to a NumPy array
* to allow labels to be attached to a NumPy array (ala data-array)
* to allow multiple chunks for an array.
 
 Are people O.K. with 5 or 6 more pointers on every NumPy array?  We could 
 also think about adding just one more pointer to a new enhanced structure 
 that contains multiple enhancements to the NumPy array.
 
 
 Yes, this whole thing could get out of hand with too many extras. One of the 
 things you could discuss with Mark is how to deal with this, or limit the 
 modifications. At some point the ndarray class could become cumbersome, 
 complicated, and difficult to maintain. We need to be careful that it doesn't 
 go that way. I'd like to keep it as simple as possible, the question is what 
 is fundamental. The main long term advantage of having masks part of the base 
 is the possibility of adapted loops in ufuncs, which would give the advantage 
 of speed. But that is just how it looks from where I stand, no doubt others 
 have different priorities.
 
 But, this whole line of discussion sounds a lot like a true sub-class of the 
 NumPy array at the C-level.  It has the benefit that only people that use 
 the features of the sub-class have to worry about using the extra space.
 
 Mark and I will talk about this long and hard.  Mark has ideas about where he 
 wants to see NumPy go, but I don't think we have fully accounted for where 
 NumPy and its user base *is* and 

Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Matthew Brett
Hi,

On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant tra...@continuum.io wrote:

 I think the answer to this is yes, but it could be as a feature-filled 
 sub-class (like the current numpy.ma, except in C).

 I'd love to hear that argument fleshed out in more detail - do you have time?


 My proposal here is to basically take the current github NumPy data-structure 
 and make this a sub-type (in C) of the NumPy 1.6 data-structure which is 
 unchanged in NumPy 1.7.

 This would not require removing code but would require another PyTypeObject 
 and associated structures.  I expect Mark could do this work in 2-4 weeks.   
 We also have other developers who could help in order to get the sub-type in 
 NumPy 1.7.     What kind of details would you like to see?

I was dimly thinking of the same questions that Chuck had - about how
subclassing would relate to the ufunc changes.

 I just think we need more data and uses and this would provide a way to get 
 that without making a forced decision one way or another.

Is the proposal that this would be an alternative API to numpy.ma?
Is numpy.ma not itself satisfactory as a test of these uses, because
of performance or some other reason?

 2) Will likely changes to the masked array API make any difference to
 the number of extra pointers?  Likely answer no?

 Is that right?

 The answer to this is very likely no on the Python side.  But, on the 
 C-side, there could be some differences (i.e. are masked arrays a sub-class 
 of the ndarray or not).


 I have the impression that the masked array API discussion still has
 not come out fully into the unforgiving light of discussion day, but
 if the answer to 2) is No, then I suppose the API discussion is not
 relevant to the 3 pointers change.

 You are correct that the API discussion is separate from this one.     
 Overall,  I was surprised at how fervently people would oppose ABI changes. 
   As has been pointed out, NumPy and Numeric before it were not really 
 designed to prevent having to recompile when changes were made.   I'm still 
 not sure that a better overall solution is not to promote better 
 availability of downstream binary packages than excessively worry about ABI 
 changes in NumPy.    But, that is the current climate.

 The objectors object to any binary ABI change, but not specifically
 three pointers rather than two or one?

 Adding pointers is not really an ABI change (but removing them after they 
 were there would be...)  It's really just the addition of data to the NumPy 
 array structure that they aren't going to use.  Most of the time it would not 
 be a real problem (the number of use-cases where you have a lot of small 
 NumPy arrays is small), but when it is a problem it is very annoying.


 Is their point then about ABI breakage?  Because that seems like a
 different point again.

 Yes, it's not that.


 Or is it possible that they are in fact worried about the masked array API?

 I don't think most people whose opinion would be helpful are really tuned in 
 to the discussion at this point.  I think they just want us to come up with 
 an answer and then move forward.    But, they will judge us based on the 
 solution we come up with.


 Mark and I will talk about this long and hard.  Mark has ideas about where 
 he wants to see NumPy go, but I don't think we have fully accounted for 
 where NumPy and its user base *is* and there may be better ways to approach 
 this evolution.    If others are interested in the outcome of the 
 discussion please speak up (either on the list or privately) and we will 
 make sure your views get heard and accounted for.

 I started writing something about this but I guess you'd know what I'd
 write, so I only humbly ask that you consider whether it might be
 doing real damage to allow substantial discussion that is not
 documented or argued out in public.

 It will be documented and argued in public.     We are just going to have one 
 off-list conversation to try and speed up the process.    You make a valid 
 point, and I appreciate the perspective.     Please speak up again after 
 hearing the report if something is not clear.   I don't want this to even 
 have the appearance of a back-room deal.

 Mark and I will have conversations about NumPy while he is in Austin.   There 
 are many other active stake-holders whose opinions and views are essential 
 for major changes.    Mark and I are working on other things besides just 
 NumPy and all NumPy changes will be discussed on list and require consensus 
 or super-majority for NumPy itself to change.     I'm not sure if that helps. 
   Is there more we can do?

As you might have heard me say before, my concern is that it has not
been easy to have good discussions on this list.   I think the problem
has been that it has not been clear what the culture was, and how
decisions got made, and that had led to some uncomfortable and
unhelpful discussions.  My plea would be for you as BDF$N to strongly
encourage 

Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Charles R Harris
On Mon, Apr 16, 2012 at 10:38 PM, Travis Oliphant tra...@continuum.io wrote:


 On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:



 On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant tra...@continuum.io wrote:


 On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:

  Hi,
 
  On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant tra...@continuum.io
 wrote:
 
  I have heard from a few people that they are not excited by the growth
 of
  the NumPy data-structure by the 3 pointers needed to hold the
 masked-array
  storage.   This is especially true when there is talk to potentially
 add
  additional attributes to the NumPy array (for labels and other
  meta-information).  If you are willing to let us know how you feel
 about
  this, please speak up.
 
  I guess there are two questions here
 
  1) Will something like the current version of masked arrays have a
  long term future in numpy, regardless of eventual API? Most likely
  answer - yes?

 I think the answer to this is yes, but it could be as a feature-filled
 sub-class (like the current numpy.ma, except in C).


 I think making numpy.ma a subclass of ndarray has caused all sorts of
 trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from
 ndarray for implementation of various parts. The upshot is that almost
 everything has to be overridden, so it didn't buy much.


 This is a valid point.   One could create a new object that is binary
 compatible with the NumPy Array but not really a sub-class but provides the
 array interface.  We could keep Mark's modifications to the array
 interface as well so that it can communicate a mask.


Another place inheritance causes problems is PyUnicodeArrType inheriting
from PyUnicodeType. There the difficulty is that the unicode
itemsize/encoding may not match between the types. IIRC, it isn't
recommended that derived classes change the itemsize. Numpy also has the
different byte orderings...
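
The itemsize and byte-order mismatch can be seen without NumPy at all. A small sketch using only Python's built-in codecs (the helper `bytes_per_char` is made up for illustration, and the sample text is deliberately limited to characters that fit in one UTF-16 unit): the same text takes 2 or 4 bytes per character depending on the encoding, and each character's bytes are reversed between little- and big-endian layouts.

```python
def bytes_per_char(text, codec):
    """Storage per character for BMP-only text under a fixed-width codec."""
    return len(text.encode(codec)) // len(text)


text = "numpy"
for codec in ("utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    raw = text.encode(codec)
    width = bytes_per_char(text, codec)
    # The first character's bytes show both the item size and the byte order.
    print(f"{codec:<10} {width} bytes/char, first char = {raw[:width].hex()}")
```

Two types that disagree on any of these choices cannot share one buffer layout, which is the kind of mismatch that makes inheritance between them awkward.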

The Python types are sort of like virtual classes, so in some sense they
are designed for inheritance. We could maybe set up some sort of parallel
numpy type system with empty slots and such but we would need to decide
what those slots are ahead of time. And if we got really serious, ABI
backwards compatibility would break big time.

snip

Chuck


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-16 Thread Travis Oliphant

On Apr 16, 2012, at 11:59 PM, Matthew Brett wrote:

 Hi,
 
 On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant tra...@continuum.io wrote:
 
 I think the answer to this is yes, but it could be as a feature-filled 
 sub-class (like the current numpy.ma, except in C).
 
 I'd love to hear that argument fleshed out in more detail - do you have 
 time?
 
 
 My proposal here is to basically take the current github NumPy 
 data-structure and make this a sub-type (in C) of the NumPy 1.6 
 data-structure which is unchanged in NumPy 1.7.
 
 This would not require removing code but would require another PyTypeObject 
 and associated structures.  I expect Mark could do this work in 2-4 weeks.   
 We also have other developers who could help in order to get the sub-type in 
 NumPy 1.7. What kind of details would you like to see?
 
 I was dimly thinking of the same questions that Chuck had - about how
 subclassing would relate to the ufunc changes.

Basically, there are two sets of changes as far as I understand right now:  

1) ufunc infrastructure understands masked arrays
2) ndarray grew attributes to represent masked arrays

I am proposing that we keep 1) but change 2) so that only certain kinds of 
NumPy arrays actually have the extra function pointers (effectively a 
sub-type).   In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject 
become a base-object, but the other members of the C-structure are not even 
present unless the Masked flag is set.   Such changes would not require ripping 
code out --- just altering the presentation a bit.   Yet, they could have large 
long-term implications that we should explore before they are set in stone.
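
A rough ctypes sketch of that layout idea, under loud assumptions: the structures, field names, and the `MASKED_FLAG` bit below are all hypothetical stand-ins, not the actual NumPy definitions. The base structure plays the role of the 1.6 PyArrayObject, and the mask members exist only in the extended structure, which code may look at only after checking the flag.

```python
import ctypes

MASKED_FLAG = 0x1  # hypothetical flag bit marking an array that carries a mask


class BaseArray(ctypes.Structure):
    # Stand-in for the unchanged base structure (greatly simplified).
    _fields_ = [("data", ctypes.c_void_p),
                ("nd", ctypes.c_int),
                ("flags", ctypes.c_int)]


class MaskedArray(ctypes.Structure):
    # The base comes first, so the leading bytes are layout-compatible with it.
    _fields_ = [("base", BaseArray),
                ("mask_data", ctypes.c_void_p),
                ("mask_strides", ctypes.c_void_p),
                ("mask_dtype", ctypes.c_void_p)]


def mask_view(base_ptr):
    """Return the extended view only if the flag says the extra members exist."""
    if not base_ptr.contents.flags & MASKED_FLAG:
        return None
    return ctypes.cast(base_ptr, ctypes.POINTER(MaskedArray)).contents


plain = BaseArray(None, 2, 0)
masked = MaskedArray(BaseArray(None, 2, MASKED_FLAG), None, None, None)

plain_ptr = ctypes.pointer(plain)
# Code that only knows about BaseArray can still read a masked array's header.
masked_ptr = ctypes.cast(ctypes.pointer(masked), ctypes.POINTER(BaseArray))

print(masked_ptr.contents.nd)           # base fields readable through either type
print(mask_view(plain_ptr) is None)     # plain arrays carry no mask members
print(mask_view(masked_ptr) is not None)
```

Plain arrays stay at the base size, and only flagged arrays pay for the three extra pointers.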

Whether masked arrays should be a formal sub-class is actually an unrelated 
question, and I generally lean in the direction of not encouraging sub-classes 
of the ndarray.   The big questions are: does this object work in the 
calculation infrastructure?   Can I add an array to a masked array?   Does it 
have a sum method?   I think it could be argued that a masked array does have an 
"is a" relationship with an array.   It can also be argued that it is better to 
have a "has a" relationship with an array and be its own object.   Either way, 
this object could still have its first part be binary compatible with a NumPy 
array, and that is what I'm really suggesting. 
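
As a toy illustration of the "has a" option, here is a sketch in plain Python. The class `MaskedValues` is invented for this example and uses lists instead of arrays; it is not numpy.ma and not the proposed C type. It owns its values and a parallel mask, yet still answers the two questions above: it can be added to a plain sequence, and it has a sum method.

```python
class MaskedValues:
    """Owns a list of values plus a parallel mask (True means missing)."""

    def __init__(self, values, mask):
        if len(values) != len(mask):
            raise ValueError("values and mask must have the same length")
        self.values = list(values)
        self.mask = list(mask)

    def __add__(self, other):
        # Accept another MaskedValues or any plain sequence of numbers.
        if isinstance(other, MaskedValues):
            other_values, other_mask = other.values, other.mask
        else:
            other_values = list(other)
            other_mask = [False] * len(other_values)
        if len(other_values) != len(self.values):
            raise ValueError("operands must have the same length")
        summed = [a + b for a, b in zip(self.values, other_values)]
        # A result element is missing if either input element was missing.
        merged = [m or n for m, n in zip(self.mask, other_mask)]
        return MaskedValues(summed, merged)

    def sum(self):
        # Skip masked elements rather than propagating them.
        return sum(v for v, m in zip(self.values, self.mask) if not m)


m = MaskedValues([1.0, 2.0, 3.0], [False, True, False])
result = m + [10, 10, 10]
print(result.values, result.mask, result.sum())
```

Nothing is inherited here, so only the operations written out explicitly exist; that is the trade-off against the sub-class approach.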

-Travis





 
 I just think we need more data and uses and this would provide a way to get 
 that without making a forced decision one way or another.
 
 Is the proposal that this would be an alternative API to numpy.ma?
 Is numpy.ma not itself satisfactory as a test of these uses, because
 of performance or some other reason?
 
 2) Will likely changes to the masked array API make any difference to
 the number of extra pointers?  Likely answer no?
 
 Is that right?
 
 The answer to this is very likely no on the Python side.  But, on the 
 C-side, there could be some differences (i.e. are masked arrays a 
 sub-class of the ndarray or not).
 
 
 I have the impression that the masked array API discussion still has
 not come out fully into the unforgiving light of discussion day, but
 if the answer to 2) is No, then I suppose the API discussion is not
 relevant to the 3 pointers change.
 
 You are correct that the API discussion is separate from this one. 
 Overall,  I was surprised at how fervently people would oppose ABI 
 changes.   As has been pointed out, NumPy and Numeric before it were not 
 really designed to prevent having to recompile when changes were made.   
 I'm still not sure that a better overall solution is not to promote better 
 availability of downstream binary packages than excessively worry about 
 ABI changes in NumPy.  But, that is the current climate.
 
 The objectors object to any binary ABI change, but not specifically
 three pointers rather than two or one?
 
 Adding pointers is not really an ABI change (but removing them after they 
 were there would be...)  It's really just the addition of data to the NumPy 
 array structure that they aren't going to use.  Most of the time it would 
 not be a real problem (the number of use-cases where you have a lot of small 
 NumPy arrays is small), but when it is a problem it is very annoying.
 
 
 Is their point then about ABI breakage?  Because that seems like a
 different point again.
 
 Yes, it's not that.
 
 
 Or is it possible that they are in fact worried about the masked array API?
 
 I don't think most people whose opinion would be helpful are really tuned in 
 to the discussion at this point.  I think they just want us to come up with 
 an answer and then move forward.  But, they will judge us based on the 
 solution we come up with.
 
 
 Mark and I will talk about this long and hard.  Mark has ideas about where 
 he wants to see NumPy go, but I don't think we have fully accounted for 
 where NumPy and its user base *is* and there may be better ways to 
 approach this evolution.  If others are