On Sat, Mar 5, 2011 at 11:11 PM, Mark Wiebe <[email protected]> wrote:
> On Sat, Mar 5, 2011 at 8:13 PM, Travis Oliphant <[email protected]> wrote:
>
>> On Mar 5, 2011, at 5:10 PM, Mark Wiebe wrote:
>>
>> On Thu, Mar 3, 2011 at 10:54 PM, Ralf Gommers <[email protected]> wrote:
>>
>>> <snip>
>>>
>>> I've had a look at the bug tracker, here's a list of tickets for 1.6:
>>> #1748 (blocker: regression for astype('str'))
>>> #1619 (issue with dtypes, with patch)
>>> #1749 (distutils, py 3.2)
>>> #1601 (distutils, py 3.2)
>>> #1622 (Solaris segfault, with patch)
>>> #1713 (Solaris segfault)
>>> #1631 (Solaris segfault)
>>>
>>> The distutils tickets are resolved.
>>>
>>> Proposed schedule:
>>> March 15: beta 1
>>> March 28: rc 1
>>> April 17: rc 2 (if needed)
>>> April 24: final release
>>>
>>> Any comments on the schedule or tickets?
>>
>> That all looks fine to me. There are a few things that I've changed in the
>> core that could stand some discussion before being finalized in 1.6, mostly
>> due to what was required to make things work without depending on the data
>> type enumeration order. The combination of the numpy and scipy tests was
>> pretty effective, but as Travis mentioned my changes are fairly invasive.
>>
>> * When copying array to array, structured types now copy based on field
>> names instead of positions, effectively behaving like a 'dict' instead of a
>> 'labeled tuple'. This behaviour is more intuitive to me, and several fixed
>> bugs such as dtype comparison completely ignoring the structured type data
>> suggest that this changes an area of numpy that has been used in a more
>> limited fashion. It might be worthwhile to introduce a tuple-style flag in a
>> future version which causes data to be copied by position instead of by
>> name, as it is likely useful in some contexts.
>>
>> This is a semantic change that does make me a tiny bit nervous.
>> Structured arrays are actually used quite a bit in the wild, and so this
>> could raise some errors. What I don't know is how often sub-parts of a
>> structured array get copied into other structured arrays with a different
>> order to the fields. From what I gather, Mark's changes would allow this
>> case and do an arguably useful thing. Previously, a copy was only allowed
>> if the structured array contained the same fields in the same order. It
>> seems like this is a relaxation of a rule and should not raise any errors
>> (unless extant code was relying on the previous errors for some reason).
>
> Another important factor is that previously the performance was poor,
> because each copy involved converting the array element to a Python tuple,
> then copying the tuple into the destination array. The new code directly
> copies the elements with no Python overhead. I haven't directly benchmarked
> this, but if someone wants to confirm this with some numbers that would be
> great.
>
>> * Array memory layouts are preserved in many cases. This means that if a,
>> b are Fortran ordered, a+b will be as well. It could be made more pervasive,
>> for example ndarray.copy defaults to C-order, and that could be changed to
>> 'K' to preserve the memory layout by default. Any comments about that?
>>
>> I like this change quite a bit, but it has similar potential "expectation"
>> issues. I think the default should be changed to 'K' in NumPy 2.0, but
>> perhaps we should preserve C-order for now to avoid the subtle breakages
>> that might occur based on changed expectations. What are others' thoughts?
>
> I suspect defaulting to 'C' might be desirable, but I initially set it to
> 'K' to see how it would work out. Defaulting it to 'C' unfortunately kills
> most of the performance benefits of the new code, so it might be nice to
> leave it as 'K' if no issues arise that are traced back to here.
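For anyone who wants to poke at the two behaviours discussed in the quoted text, here is a minimal sketch, not taken from the thread itself. It assumes a NumPy release in which ndarray.copy accepts order='K'. The helper copy_fields_by_name is hypothetical (it is not a NumPy function); it copies field by field explicitly, because what plain array-to-array assignment does for structured types depends on the NumPy version.

```python
import numpy as np


# Hypothetical helper (not a NumPy API): copy fields by name, so the
# position of each field in the two dtypes does not matter.
def copy_fields_by_name(dst, src):
    for name in dst.dtype.names:
        dst[name] = src[name]
    return dst


# Same field names, declared in a different order.
src = np.array([(1, 2.0), (3, 4.0)], dtype=[("a", "i4"), ("b", "f8")])
dst = np.zeros(2, dtype=[("b", "f8"), ("a", "i4")])
copy_fields_by_name(dst, src)

# Memory layout: both operands are Fortran ordered.
a = np.asfortranarray(np.arange(12.0).reshape(3, 4))
b = np.asfortranarray(np.ones((3, 4)))
total = a + b

# Explicitly request each layout when copying.
copy_c = a.copy(order="C")  # always C-ordered
copy_k = a.copy(order="K")  # keeps the layout of `a`

for label, arr in [("a + b", total), ("copy 'C'", copy_c), ("copy 'K'", copy_k)]:
    print(label, "C:", arr.flags.c_contiguous, "F:", arr.flags.f_contiguous)
```

Passing order= explicitly, as above, sidesteps the question of which default ndarray.copy ends up with.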
>

I suppose this might cause a problem with lazy/quick c extensions that
expected elements in a certain order, so some breakage could occur. The
strict rule for backward compatibility would be no breakage, and if there
was no performance gain I would opt for that. But in this case there is a
real gain in breaking compatibility in a small way that is unlikely to be
noticed.

<snip>

Chuck
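
One way code that hands arrays to such an extension could protect itself, as a rough sketch only: normalize the layout at the boundary. The wrapper name prepare_for_extension is hypothetical and stands in for whatever sits in front of the C code; np.ascontiguousarray is the only NumPy call it relies on.

```python
import numpy as np


# Hypothetical wrapper in front of an extension that assumes C-ordered input.
def prepare_for_extension(arr):
    # Returns a C-contiguous array; no copy is made when `arr` already is one.
    return np.ascontiguousarray(arr)


# Result of arithmetic on a Fortran-ordered array; its layout may follow the input.
f_result = np.asfortranarray(np.ones((2, 3))) * 2.0
safe = prepare_for_extension(f_result)

print(safe.flags.c_contiguous)
```

The cost is one extra copy only when the input is not already C-ordered, so callers that never see Fortran-ordered data pay nothing.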
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
