Re: [Numpy-discussion] Missing Values Discussion

Bruce Southey Fri, 08 Jul 2011 06:22:27 -0700

On 07/08/2011 07:15 AM, Matthew Brett wrote:
> Hi Travis,
>
> On Fri, Jul 8, 2011 at 5:03 AM, Travis Oliphant <[email protected]> wrote:
>> Hi all,
>>
>> I want to first apologize for stepping into this discussion a bit late and 
>> for not being able to participate adequately.   However, I want to offer a 
>> couple of perspectives, and my opinion about what we should do as well as 
>> clarify what I have instructed Mark to do as part of his summer work.
>>
>> First, the discussion has reminded me how valuable it is to get feedback 
>> from all points of view.  While it does lengthen the process, it 
>> significantly enhances the result.  I strongly hope we can continue the 
>> tradition of respectful discussion on this mailing list where people's views 
>> are treated with respect --- even if we don't always have the time to 
>> understand them in depth.
>>
>> I also really appreciate people taking the time to visit on the phone call 
>> with me as it gave me a chance to understand many opinions quickly and at 
>> least start to form a possibly useful opinion.
>>
>> Basically, because there is no consensus and in fact a strong and 
>> reasonable opposition to specific points, Mark's NEP as proposed cannot be 
>> accepted in its entirety right now.   However,  I believe an implementation 
>> of his NEP is useful and will be instructive in resolving the issues and so 
>> I have instructed him to spend Enthought time on the implementation.   Any 
>> changes that need to be made to the API before it is accepted into a 
>> released form of NumPy can still be made even after most of the 
>> implementation is completed as far as I understand it.   This is because 
>> most of the disagreement is about the specific ability to manipulate the 
>> masks independently of assigning missing data and the creation of an 
>> additional np.HIDE (np.IGNORE) concept at the Python level.
>>
>> Despite some powerful arguments on both sides of the discussion, I am 
>> confident that we can figure out an elegant solution that will work long 
>> term.
>>
>> My current opinion is that I am very favorable to making easy the use-case 
>> that has been repeatedly described of having "missing data" that is *always* 
>> missing and then having "hidden data" that you don't want to think about for 
>> a particular set of calculations (but you also don't want to throw away by 
>> over-writing).   I think it is important to make it easy to keep that data 
>> around without over-writing but also have the "idea" of that kind of missing 
>> data different than the idea of data you can't care about because it just 
>> isn't there.
>>
>> I also think it is important for the calculation infrastructure to have just 
>> one notion of "missing data" which Mark's NEP handles beautifully.
>>
>> It seems to me that some of the disagreement is one of perspective in that 
>> Mark articulates very well the "generic programming, 
>> make-opaque-the-implementation" perspective with a focus on the implications 
>> of missing data for calculations.    Nathaniel and Matthew articulate well 
>> the perspective of "focusing" on the data object itself and the desire to 
>> keep separate the different ideas behind missing data that have been 
>> described --- as well as a powerful description of the NumPy 
>> tradition of exposing the raw data to the Python side without hiding too 
>> much of the implementation from the user.
>>
>> I think it's a healthy discussion.   But, I would like to see Mark's code 
>> get completed so that we can start talking about code examples.   Please 
>> don't interpret my instructing Mark to finish the code as "it's been 
>> decided".  I simply think it's the best path forward to ultimately resolving 
>> the concerns.   I would like to see an API worked out before summer's end 
>> --- and I'm hopeful everyone will be excited about what the resulting design 
>> is.
>>
>> I do think there is room for agreement in the present debate if we all 
>> remember to keep listening to each other.  It takes a lot of effort to 
>> understand somebody else's point of view.  I have been grateful to see 
>> evidence of that behavior multiple times (in Mark's revamping of the 
>> NEP, in Matthew Brett's re-statement of his interpretation of Mark's views, 
>> in Nathaniel's working hard to engage the dialogue even in the throes of 
>> finishing his PhD, and many other examples).
>>
>> It makes me very happy to be a part of this community.  I look forward to 
>> times when I can send more thoughtful and technical emails than this one.
> Thanks for this email - it is very helpful.
>
> Personally I was worrying that:
>
> A) Mark had not fully grasped our concern
> B) Disagreement was not welcome
>
> and this gave me an uncomfortable feeling about A) the resulting API
> and B) the discussion.  You've dealt with both here, and thank you for
> that.
>
> Can I ask - what do you recommend that we do now, for the discussion?
> Should we be quiet and wait until there is code to test, or, as
> Nathaniel has tried to do, work at reaching some compromise that makes
> sense to some or all parties?
>
> Thanks again,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
I agree that this has been a very interesting discussion, especially the 
great interaction between everyone.


The one thing that we do need now is the code that implements the small 
set of core ideas (array creation and simple numerical operations). 
Hopefully that will provide a better grasp of the concepts and the 
performance differences to determine the acceptability of the approach(es).
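
To make "array creation and simple numerical operations" concrete, here 
is a minimal sketch using the existing numpy.ma module as a stand-in. 
It is an assumption for illustration only: it is not the API proposed in 
Mark's NEP, and the variable names are made up. It does show the "hidden 
data" idea from the thread, where a value is skipped in calculations but 
not overwritten.

```python
import numpy as np
import numpy.ma as ma

# Array creation: the second element is flagged as masked (hidden).
# numpy.ma is used here only as a stand-in for the NEP's proposed API.
a = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])

# Simple numerical operations skip the masked element.
total = a.sum()    # sums 1, 3 and 4
mean = a.mean()    # averages over the three unmasked values

# The masked value is hidden, not overwritten: the underlying
# data buffer still holds it.
hidden_value = a.data[1]

# Elementwise arithmetic carries the mask through to the result.
b = a + 10
```

Timing the same operations on a plain ndarray would be one rough way to 
get at the performance differences mentioned above.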

Bruce
