[Numpy-discussion] Missing Values Discussion

Travis Oliphant Thu, 07 Jul 2011 21:05:01 -0700

Hi all, 

I want to first apologize for stepping into this discussion a bit late and for 
not being able to participate adequately.  However, I want to offer a couple 
of perspectives and my opinion about what we should do, as well as clarify what 
I have instructed Mark to do as part of his summer work.


First, the discussion has reminded me how valuable it is to get feedback from 
all points of view.  While it does lengthen the process, it significantly 
enhances the result.  I strongly hope we can continue the tradition of 
respectful discussion on this mailing list where people's views are treated 
with respect --- even if we don't always have the time to understand them in 
depth.   

I also really appreciate people taking the time to visit on the phone call with 
me as it gave me a chance to understand many opinions quickly and at least 
start to form a possibly useful opinion.   

Basically, because there is no consensus, and in fact there is strong and 
reasonable opposition to specific points, Mark's NEP as proposed cannot be 
accepted in its entirety right now.  However, I believe an implementation of 
his NEP is 
useful and will be instructive in resolving the issues and so I have instructed 
him to spend Enthought time on the implementation.   Any changes that need to 
be made to the API before it is accepted into a released form of NumPy can 
still be made even after most of the implementation is completed as far as I 
understand it.   This is because most of the disagreement is about the specific 
ability to manipulate the masks independently of assigning missing data and the 
creation of an additional np.HIDE (np.IGNORE) concept at the Python level.  

Despite some powerful arguments on both sides of the discussion, I am confident 
that we can figure out an elegant solution that will work long term.    

My current opinion is that I am very much in favor of making easy the use case 
that has been repeatedly described: having "missing data" that is *always* 
missing, and having "hidden data" that you don't want to think about for a 
particular set of calculations (but that you also don't want to throw away by 
over-writing).  I think it is important to make it easy to keep that data 
around without over-writing it, but also to keep the "idea" of that kind of 
missing data distinct from the idea of data you can't care about because it 
just isn't there.

I also think it is important for the calculation infrastructure to have just 
one notion of "missing data" which Mark's NEP handles beautifully.   
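
As a rough illustration of the distinction between "missing" and "hidden" 
data, here is a minimal sketch using the existing numpy.ma module.  It is not 
the API proposed in the NEP; the variable names are hypothetical, and NaN is 
only a stand-in for a value that was never recorded.

```python
import numpy as np

# "Missing" data: the value was never recorded, so there is nothing to
# recover.  NaN is used here only as a stand-in for such a value.
readings = np.array([1.0, np.nan, 3.0, 4.0])

# "Hidden" data: the value exists, but we choose to skip it for one
# particular calculation without over-writing it.
hidden = np.array([False, False, False, True])

# The calculation sees a single notion of "skip this element": the mask
# covers both the missing entry and the hidden entry.
view = np.ma.masked_array(readings, mask=hidden | np.isnan(readings))
mean_visible = view.mean()

# Un-hiding brings the stored 4.0 back into the calculation; the NaN entry
# stays missing because there is no value behind it.
unhidden = np.ma.masked_array(readings, mask=np.isnan(readings))
mean_unhidden = unhidden.mean()
```

The underlying array is never modified, so the hidden value can be brought 
back at any time, while the missing value cannot.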

It seems to me that some of the disagreement is one of perspective.  Mark 
articulates very well the "generic programming, 
make-opaque-the-implementation" perspective, with a focus on the implications 
of missing data for calculations.  Nathaniel and Matthew articulate well the 
perspective of focusing on the data object itself and the desire to keep 
separate the different ideas behind missing data that have been described, 
and they give a powerful description of the NumPy tradition of exposing the 
raw data to the Python side without hiding too much of the implementation 
from the user.  

I think it's a healthy discussion.   But, I would like to see Mark's code get 
completed so that we can start talking about code examples.   Please don't 
interpret my instructing Mark to finish the code as "it's been decided".  I 
simply think it's the best path forward to ultimately resolving the concerns.   
I would like to see an API worked out before summer's end --- and I'm hopeful 
everyone will be excited about what the resulting design is.  

I do think there is room for agreement in the present debate if we all remember 
to keep listening to each other.  It takes a lot of effort to understand 
somebody else's point of view.  I have been grateful to see evidence of that 
behavior multiple times (in Mark's revamping of the NEP, in Matthew Brett's 
re-statement of his interpretation of Mark's views, in Nathaniel's working 
hard to engage in the dialogue even in the throes of finishing his PhD, and 
in many other examples). 

It makes me very happy to be a part of this community.  I look forward to times 
when I can send more thoughtful and technical emails than this one.  

All the best, 

-Travis

