Re: [Numpy-discussion] Numpy 1.6 schedule (was: Numpy 2.0 schedule)

Mark Wiebe Sat, 05 Mar 2011 22:11:31 -0800

On Sat, Mar 5, 2011 at 8:13 PM, Travis Oliphant <oliph...@enthought.com> wrote:


>
> On Mar 5, 2011, at 5:10 PM, Mark Wiebe wrote:
>
> On Thu, Mar 3, 2011 at 10:54 PM, Ralf Gommers <ralf.gomm...@googlemail.com
> > wrote:
>
>> <snip>
>>
>
>
>> >>> I've had a look at the bug tracker, here's a list of tickets for 1.6:
>> >>> #1748 (blocker: regression for astype('str'))
>> >>> #1619 (issue with dtypes, with patch)
>> >>> #1749 (distutils, py 3.2)
>> >>> #1601 (distutils, py 3.2)
>> >>> #1622 (Solaris segfault, with patch)
>> >>> #1713 (Solaris segfault)
>> >>> #1631 (Solaris segfault)
>>
>> The distutils tickets are resolved.
>>
>> >>> Proposed schedule:
>> >>> March 15: beta 1
>> >>> March 28: rc 1
>> >>> April 17: rc 2 (if needed)
>> >>> April 24: final release
>>
>> Any comments on the schedule or tickets?
>>
>
> That all looks fine to me. There are a few things that I've changed in the
> core that could stand some discussion before being finalized in 1.6, mostly
> due to what was required to make things work without depending on the data
> type enumeration order. The combination of the numpy and scipy tests was
> pretty effective, but as Travis mentioned my changes are fairly invasive.
>
> * When copying array to array, structured types now copy based on field
> names instead of positions, effectively behaving like a 'dict' instead of a
> 'labeled tuple'. This behaviour is more intuitive to me, and several fixed
> bugs such as dtype comparison completely ignoring the structured type data
> suggest that this changes an area of numpy that has been used in a more
> limited fashion. It might be worthwhile to introduce a tuple-style flag in a
> future version which causes data to be copied by position instead of by
> name, as it is likely useful in some contexts.
>
>
> This is a semantic change that does make me a tiny bit nervous.
> Structured arrays are actually used quite a bit in the wild, and so this
> could raise some errors. What I don't know is how often sub-parts of
> structured arrays get copied into other structured arrays with a different
> order to the fields. From what I gather, Mark's changes would allow this
> case and do an arguably useful thing. Previously, a copy was only allowed
> if the structured array contained the same fields in the same order. It
> seems like this is a relaxation of a rule and should not raise any errors
> (unless extant code was relying on the previous errors for some reason).
>

Another important factor is that previously the performance was poor,
because each copy involved converting the array element to a Python tuple,
then copying the tuple into the destination array. The new code directly
copies the elements with no Python overhead. I haven't directly benchmarked
this, but if someone wants to confirm this with some numbers that would be
great.
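
For anyone who wants to try the timing, here is a minimal sketch. It is not
taken from the patch: the dtypes, the array size, and the helper name
copy_by_name are made up for illustration. It spells out the by-name
('dict'-style) copy one field at a time, which does not depend on how a given
NumPy version matches fields in a direct array-to-array assignment, so it can
serve as a reference result to compare against.

```python
import timeit
import numpy as np

# Two hypothetical structured dtypes with the same fields in a different order.
src_dtype = np.dtype([('a', np.int32), ('b', np.float64)])
dst_dtype = np.dtype([('b', np.float64), ('a', np.int32)])

src = np.zeros(1000, dtype=src_dtype)
src['a'] = np.arange(1000)
src['b'] = np.arange(1000) * 0.5

def copy_by_name(dst, src):
    """Copy field by field, matching on field name rather than position."""
    for name in dst.dtype.names:
        dst[name] = src[name]
    return dst

dst = copy_by_name(np.zeros(1000, dtype=dst_dtype), src)

# Rough timing only; absolute numbers depend on the machine and NumPy build.
seconds = timeit.timeit(lambda: copy_by_name(dst, src), number=100)
print("100 by-name copies took %.6f s" % seconds)
```

Timing a direct assignment between the same two arrays on an old and a new
build would give the comparison asked for above.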

> * Array memory layouts are preserved in many cases. This means that if a, b
> are Fortran ordered, a+b will be as well. It could be made more pervasive,
> for example ndarray.copy defaults to C-order, and that could be changed to
> 'K' to preserve the memory layout by default. Any comments about that?
>
>
> I like this change quite a bit, but it has similar potential "expectation"
> issues. I think the default should be changed to 'K' in NumPy 2.0, but
> perhaps we should preserve C-order for now to avoid the subtle breakages
> that might occur based on changed expectations. What are others' thoughts?
>

I suspect defaulting to 'C' might be desirable, but I initially set it to
'K' to see how it would work out. Defaulting it to 'C' unfortunately kills
most of the performance benefits of the new code, so it might be nice to
leave it as 'K' if no issues arise that are traced back to here.
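
A small sketch of what is being discussed, assuming a NumPy build that
includes the layout-preserving behaviour; the array shapes and variable names
are arbitrary. It checks the layout of a+b for Fortran-ordered inputs and
compares an explicit C-order copy with a 'K' copy.

```python
import numpy as np

# Two Fortran-ordered (column-major) operands of the same shape.
a = np.asfortranarray(np.arange(6.0).reshape(2, 3))
b = np.asfortranarray(np.ones((2, 3)))

# With layout preservation, the result of a + b is Fortran-ordered too.
total = a + b
print(total.flags['F_CONTIGUOUS'])

# order='C' forces row-major output; order='K' keeps the input's layout.
c_copy = a.copy(order='C')
k_copy = a.copy(order='K')
print(c_copy.flags['C_CONTIGUOUS'], k_copy.flags['F_CONTIGUOUS'])
```

Both copies hold the same values; only the strides differ, which is where
code with unstated layout expectations could be surprised.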

> * The ufunc uses a more consistent algorithm for loop selection. The
> previous algorithm was ad hoc and lacked symmetry, while the new algorithm
> is based on a simple minimization definition. This change exposed a bug in
> scipy's ndimage, which did not handle all of the numpy data type enums
> properly, so it's possible there is more code out there which will be
> affected similarly.
>
>
> This change has me the most nervous. I'm looking forward to the more
> consistent algorithm. As I said, the algorithm presently used has been there
> since Numeric in 1995 (I modified it only a little bit to handle
> scalar-array casting rules a bit differently). This kind of change will
> have different corner cases and this should be understood before a release.
>

I don't think there is much reason to worry here. The only substantive
difference is that the new algorithm doesn't skip ahead based on the type of
the first operand. This code is also extremely thoroughly exercised; messing
with the ufuncs can even cause numpy startup to fail. The trickiest ad hoc
part of loop selection was how accumulation of logical operations settles on
the boolean loop when the inputs are not boolean, and this still works the
same, but now is coded explicitly instead of appearing to be an accident.
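
To illustrate the logical-operation case, a short sketch with arbitrary
integer input; it only shows the observable result, not the selection code
itself.

```python
import numpy as np

# Integer input, not boolean.
values = np.array([0, 2, 0, 5])

# Accumulating or reducing a logical ufunc settles on the boolean loop,
# so the results are boolean even though the input is integer.
running_or = np.logical_or.accumulate(values)
all_true = np.logical_and.reduce(values)

print(running_or.dtype, running_or)
print(type(all_true), all_true)
```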

Of course, testing this thoroughly is necessary, and the work being done by
Ralf, Christoph, and others is fantastic.

> I'm also wondering what happened to the optional arguments to ufuncs (are
> they still there)? One of these allowed you to choose the loop
> yourself and bypass the selection algorithm.
>

All the existing parameters are still there, and I added tests for the 'sig'
parameter. It might be worthwhile for someone who has used these features to
write examples for the documentation, as they're a bit tricky to understand
from the descriptions.

I also added an 'out' keyword parameter to make ufuncs more consistent with
other functions' output parameters.
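
As a starting point for such documentation examples, a minimal sketch of both
keywords. It uses the long spelling 'signature', which later NumPy releases
accept; whether the short form 'sig' is accepted depends on the version, so
treat the exact spelling as an assumption. The input values are arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# 'out' as a keyword: the result is written into a preallocated array.
result = np.empty(3, dtype=np.float64)
np.add(x, y, out=result)

# Choose the loop explicitly: the double-precision loop 'dd->d' is used
# even though both inputs are integer arrays.
forced = np.add(x, y, signature='dd->d')

print(result, forced.dtype)
```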

> In general, I've used the implementation strategy of substituting my code
> into the core critical paths of numpy to maximize the amount of exercise it
> gets. While this creates more short-term hiccups as we are seeing now, it
> also means the new functionality conforms to the current system better and
> is much more stable since it is getting well tested.
>
>
> Thanks again for all the good core-algorithm work, Mark. You have been
> doing a great job.
>

Thanks!
Mark

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
