Hi all,
I've been thinking a lot about the masked array implementation lately. I
finally had the time to look hard at what has been done, and I am now of the
opinion that 1.7 cannot be released with the masked array implementation in
its current state *unless* it is clearly marked as experimental and subject to
change in 1.8.
I wish I had been able to be a bigger part of this conversation last year.
But, that is why I took the steps I took to try and figure out another way to
feed my family *and* stay involved in the NumPy community. I would love to
stay involved in what is happening in the SciPy community, but I am more
satisfied with what Ralf, Warren, Robert, Pauli, Josef, Charles, Stefan, and
others are doing there right now, and don't have time to keep up with
everything. Even though SciPy was the heart and soul of why I even got
involved with Python for open source in the first place and took many years of
my volunteer labor, I won't be able to spend significant time on SciPy code
over the coming months. At some point, I really hope to be able to make
contributions again to that code-base. Time will tell whether or not my
aspirations will be realized. It depends quite a bit on whether or not my kids
have what they need from me (which right now is money and time).
NumPy, on the other hand, is not in a position where I can feel comfortable
leaving my "baby" to others. I recognize and value the contributions from many
people to make NumPy what it is today (e.g. code contributions, code
rearrangement and standardization, build and install improvement, and most
recently some architectural changes). But, I feel a personal responsibility
for the code base as I spent a great many months writing NumPy in the first
place, and I've spent a great deal of time interacting with NumPy users and
feel like I have at least some sense of their stories. Of course, I built on
the shoulders of giants, and much of what is there is *because of* where the
code was adapted from (it was not created de novo). Currently, there remains
much that needs to be communicated, improved, and worked on, and I have
specific opinions about what some changes and improvements should be, how they
should be written, and how users should ultimately benefit from them.
It will take time to discuss all of this, and that's where I will spend my
open-source time in the coming months.
In that vein:
Because it is slated to go into release 1.7, we need to revisit the masked
array discussion. The NEP process is the appropriate one and I'm glad
we are taking that route for these discussions. My goal is to get consensus
in order for code to get into NumPy (regardless of who writes the code). It
may be that we don't come to a consensus (reasonable and intelligent people can
disagree on things --- look at the coming election...). We can represent
different parts of what is, fortunately, a very large NumPy user base.
First of all, I want to be clear that I think there is much great work that has
been done in the current missing data code. There are some nice features in
the where clause of the ufunc and the machinery for the iterator that allows
re-using ufunc loops that are not re-written to check for missing data. I'm
sure there are other things as well that I'm not quite aware of yet.
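
As a rough illustration of the where clause mentioned above, here is a minimal
sketch using the `where=` keyword that ufuncs accept. The array names and
values are made up for the example, and it does not touch the NA-mask API under
discussion; it only shows a stock ufunc loop being applied selectively.

```python
import numpy as np

# Hypothetical data: the second element is treated as "missing".
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
valid = np.array([True, False, True, True])

# The ordinary add loop runs only where `valid` is True.
# Positions where `valid` is False keep whatever `out` already held.
out = np.zeros_like(a)
np.add(a, b, out=out, where=valid)

print(out)
```

Note that the add loop itself knows nothing about missing data; the selection
happens outside it, which is the kind of loop re-use described above.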
However, I don't think the API currently presented to the NumPy user is the
correct one for NumPy 1.X.
A few particulars:
* the reduction operations need to default to "skipna"; this is the most
  common use case, which was reinforced to me again today by a new Python user
  who is currently using masked arrays
* the mask needs to be visible to the user if they use that approach to
  missing data (people should be able to get a hold of the mask and work with
  it in Python)
* bit-pattern approaches to missing data (at least for float64 and int32)
  need to be implemented
* when "masks" are used for missing data, there should be some way (even if
  it is hidden from most users) to separate the low-level ufunc operation from
  the operation on the masks
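
To make the first three points concrete, here is a small sketch. It uses the
long-standing `numpy.ma` module and NaN as stand-ins, so it is an assumption
about the desired behaviour rather than the API under discussion; the data
values are invented. NaN covers only the float64 case, since int32 has no
built-in missing-value bit pattern.

```python
import numpy as np

# Hypothetical data: the second element is missing.
data = np.array([1.0, 2.0, 4.0, 5.0])
missing = np.array([False, True, False, False])

# Mask approach: numpy.ma reductions skip masked entries by default,
# and the mask is an ordinary boolean array the user can inspect.
m = np.ma.masked_array(data, mask=missing)
mean_skip = m.mean()                 # mean over 1.0, 4.0, 5.0
mask_view = np.ma.getmaskarray(m)    # plain boolean ndarray

# Bit-pattern approach (float64 only here): NaN marks the missing slot.
f = data.copy()
f[missing] = np.nan
propagated = f.sum()      # NaN propagates through a plain reduction
skipped = np.nansum(f)    # "skipna"-style reduction ignores NaN

print(mean_skip, mask_view, propagated, skipped)
```

The contrast between `f.sum()` and `np.nansum(f)` is the choice of default
that the first point is about.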
I have heard from several users that they will *not use the missing data* in
NumPy as currently implemented, and I can now see why. For better or for
worse, my approach to software is generally very user-driven and very
pragmatic. On the other hand, I'm also a mathematician and appreciate the
cognitive compression that can come out of well-formed structure.
Nonetheless, I'm an *applied* mathematician and am ultimately motivated by
applications.
I will get a hold of the NEP and spend some time with it to discuss some of
this in that document. This will take several weeks (as PyCon is next week
and I have a tutorial I'm giving there). For now, I do not think 1.7 can be
released unless the masked array is labeled *experimental*.
Thanks,
-Travis