Hello again, I also have a minor code comment:
In get_object_offsets you iterate over dtype.fields.values(). Be careful, because dtype.fields also includes the field titles. For example this fails: dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')]) dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')]) assert dtype_view_is_safe(dta, dtb) I've seen two strategies in the numpy code to work around this. One is to to skip entries that are titles, like this: for key,field in dtype.fields.iteritems(): if len(field) == 3 and field[2] == key: #detect titles continue #do something You can find all examples that do this by grepping NPY_TITLE_KEY in the numpy source. The other (more popular) strategy is to iterate over dtype.names. You can find all examples of this by grepping for names_size. I don't know the history of it, but it looks to me like "titles" in dtypes are an obsolete feature. Are they actually used anywhere? Allan On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote: > HI all, > > There has been some recent discussion going on on the limitations that > numpy imposes to taking views of an array with a different dtype. > > As of right now, you can basically only take a view of an array if it > has no Python objects and neither the old nor the new dtype are > structured. Furthermore, the array has to be either C or Fortran contiguous. > > This seem to be way too strict, but the potential for disaster getting a > loosening of the restrictions wrong is big, so it should be handled with > care. > > Allan Haldane and myself have been looking into this separately and > discussing some of the details over at github, and we both think that > the only true limitation that has to be imposed is that the offsets of > Python objects within the new and old dtypes remain compatible. I have > expanded Allan's work from here: > > https://github.com/ahaldane/numpy/commit/e9ca367 > > to make it as flexible as I have been able. An implementation of the > algorithm in Python, with a few tests, can be found here: > > https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py > > I would appreciate getting some eyes on it for correctness, and to make > sure that it won't break with some weird dtype. > > I am also trying to figure out what the ground rules for stride and > shape conversions when taking a view with a different dtype should be. I > submitted a PR (gh-5508) a couple for days ago working on that, but I am > not so sure that the logic is completely sound. Again, to get more eyes > on it, I am going to reproduce my thoughts here on the hope of getting > some feedback. > > The objective would be to always allow a view of a different dtype > (given that the dtypes be compatible as described above) to be taken if: > > * The itemsize of the dtype doesn't change. > * The itemsize changes, but the array being viewed is the result of > slicing and transposing axes of a contiguous array, and it is still > contiguous, defined as stride == dtype.itemsize, along its > smallest-strided dimension, and the itemsize of the newtype exactly > divides the size of that dimension. > * Ideally taking a view should be a reversible process, i.e. if > oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) > should give you back a view of arr with the same original shape, > strides and dtype. > > This last point can get tricky if the minimal stride dimension has size > 1, as there could be several of those, e.g.: > > >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) > >>> a.flags.contiguous > False > >>> a.shape > (3, 1, 2) > >>> a.strides # the stride of the size 1 dimension could be > anything, ignore it! > (32, 8, 8) > > b = a.view(complex) # this fails right now, but should work > >>> b.flags.contiguous > False > >>> b.shape > (3, 1, 1) > >>> b.strides # the stride of the size 1 dimensions could be > anything, ignore them! > (32, 16, 16) > > c = b.view(float) # which of the two size 1 dimensions should we > expand? > > > "In the face of ambiguity refuse the temptation to guess" dictates that > last view should raise an error, unless we agree and document some > default. Any thoughts? > > Then there is the endless complication one could get into with arrays > created with as_strided. I'm not smart enough to figure when and when > not those could work, but am willing to retake the discussion if someone > wiser si interested. > > With all these in mind, my proposal for the new behavior is that taking > a view of an array with a different dtype would require: > > 1. That the newtype and oldtype be compatible, as defined by the > algorithm checking object offsets linked above. > 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, > make it happen! > 3. If the array is C/Fortran contiguous, check that the size in bytes > of the last/first dimension is evenly divided by newtype.itemsize. > If it does, go for it. > 4. For non-contiguous arrays: > 1. Ignoring dimensions of size 1, check that no stride is smaller > than either oldtype.itemsize or newtype.itemsize. If any is > found this is an as_strided product, sorry, can't do it! > 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. > stride == oldtype.itemsize > 1. If found, check that it is the only one with that stride, > that it is the minimal stride, and that the size in bytes of > that dimension is evenly divided by newitem,itemsize. > 2. If none is found, check if there is a size 1 dimension that > is also unique (unless we agree on a default, as mentioned > above) and that newtype.itemsize evenly divides > oldtype.itemsize. > > Apologies for the long, dense content, but any thought or comments are > very welcome. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus > planes de dominación mundial. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion