Re: [Numpy-discussion] Views of a different dtype

Allan Haldane Thu, 29 Jan 2015 08:59:40 -0800

Hello again,

I also have a minor code comment:


In get_object_offsets you iterate over dtype.fields.values(). Be
careful, because dtype.fields also includes the field titles. For
example this fails:

    dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')])
    dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')])
    assert dtype_view_is_safe(dta, dtb)
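
To see where the extra entry comes from, a small check using only public dtype attributes may help. Note that in a (title, name) tuple the title comes first, so the first field of dta is named 'title' and carries the title 'a'; the variable names below are only for illustration.

```python
import numpy as np

# In a (title, name) tuple the title comes first.
dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')])

names = dta.names                 # real field names only
keys = sorted(dta.fields.keys())  # field names plus titles

# The title key and the name key map to the same (dtype, offset, title) tuple.
same_entry = dta.fields['a'] == dta.fields['title']
```

So dtype.fields has four entries for a three-field dtype, and a loop over its values visits the titled field twice.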

I've seen two strategies in the numpy code to work around this. One is
to skip entries that are titles, like this:

    for key, field in dtype.fields.items():
        if len(field) == 3 and field[2] == key:  # detect titles
            continue
        # do something

You can find all examples that do this by grepping NPY_TITLE_KEY in the
numpy source.

The other (more popular) strategy is to iterate over dtype.names. You
can find all examples of this by grepping for names_size.
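
A minimal sketch of that second strategy in Python, assuming a flat (non-nested) structured dtype. The helper name object_offsets is hypothetical and is not the function from the gist:

```python
import numpy as np

def object_offsets(dtype):
    """Hypothetical helper: byte offsets of top-level object fields."""
    offsets = []
    for name in dtype.names:  # titles never appear in dtype.names
        field_dtype, offset = dtype.fields[name][:2]
        if field_dtype.hasobject:
            offsets.append(offset)
    return offsets

dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')])
dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')])

# For comparison: looping over dtype.fields.values() counts the titled field twice.
naive_count = len([f for f in dta.fields.values() if f[0].hasobject])
```

With this loop the titled and untitled dtypes report the same object offsets, while the naive count over dtype.fields.values() is one too many for dta.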

I don't know the history of it, but it looks to me like "titles" in
dtypes are an obsolete feature. Are they actually used anywhere?

Allan

On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote:
> Hi all,
> 
> There has been some recent discussion about the limitations that
> numpy imposes on taking views of an array with a different dtype.
> 
> As of right now, you can basically only take a view of an array if it
> has no Python objects and neither the old nor the new dtype are
> structured. Furthermore, the array has to be either C or Fortran contiguous.
> 
> This seems to be way too strict, but getting a loosening of the
> restrictions wrong could be disastrous, so it should be handled with
> care.
> 
> Allan Haldane and I have been looking into this separately and
> discussing some of the details over at github, and we both think that
> the only true limitation that has to be imposed is that the offsets of
> Python objects within the new and old dtypes remain compatible. I have
> expanded Allan's work from here:
> 
> https://github.com/ahaldane/numpy/commit/e9ca367
> 
> to make it as flexible as I have been able. An implementation of the
> algorithm in Python, with a few tests, can be found here:
> 
> https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py
> 
> I would appreciate getting some eyes on it for correctness, and to make
> sure that it won't break with some weird dtype.
> 
> I am also trying to figure out what the ground rules for stride and
> shape conversions when taking a view with a different dtype should be. I
> submitted a PR (gh-5508) a couple of days ago working on that, but I am
> not so sure that the logic is completely sound. Again, to get more eyes
> on it, I am going to reproduce my thoughts here in the hope of getting
> some feedback.
> 
> The objective would be to always allow a view of a different dtype
> (given that the dtypes be compatible as described above) to be taken if:
> 
>   * The itemsize of the dtype doesn't change.
>   * The itemsize changes, but the array being viewed is the result of
>     slicing and transposing axes of a contiguous array, and it is still
>     contiguous, defined as stride == dtype.itemsize, along its
>     smallest-strided dimension, and the itemsize of the newtype exactly
>     divides the size of that dimension.
>   * Ideally taking a view should be a reversible process, i.e. if
>     oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype)
>     should give you back a view of arr with the same original shape,
>     strides and dtype.
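
For the simple contiguous case, which numpy already supports, the round trip in the last bullet can be checked directly. This is only an illustration of the property; the array and variable names are arbitrary.

```python
import numpy as np

arr = np.ones((3, 4), dtype=np.float64)  # C contiguous, itemsize 8
b = arr.view(np.complex128)              # itemsize 16: the last axis halves
c = b.view(arr.dtype)                    # back to float64

# The round trip should restore shape, strides and dtype without copying.
round_trip_ok = (c.shape == arr.shape and c.strides == arr.strides
                 and c.dtype == arr.dtype and np.shares_memory(c, arr))
```
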
> 
> This last point can get tricky if the minimal stride dimension has size
> 1, as there could be several of those, e.g.:
> 
>     >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1)
>     >>> a.flags.contiguous
>     False
>     >>> a.shape
>     (3, 1, 2)
>     >>> a.strides  # the stride of the size 1 dimension could be anything, ignore it!
>     (32, 8, 8)
> 
>     >>> b = a.view(complex)  # this fails right now, but should work
>     >>> b.flags.contiguous
>     False
>     >>> b.shape
>     (3, 1, 1)
>     >>> b.strides  # the stride of the size 1 dimensions could be anything, ignore them!
>     (32, 16, 16)
> 
>     >>> c = b.view(float)  # which of the two size 1 dimensions should we expand?
> 
> 
> "In the face of ambiguity refuse the temptation to guess" dictates that
> the last view should raise an error, unless we agree on and document
> some default. Any thoughts?
> 
> Then there is the endless complication one could get into with arrays
> created with as_strided. I'm not smart enough to figure out when those
> could and could not work, but am willing to reopen the discussion if
> someone wiser is interested.
> 
> With all these in mind, my proposal for the new behavior is that taking
> a view of an array with a different dtype would require:
> 
>  1. That the newtype and oldtype be compatible, as defined by the
>     algorithm checking object offsets linked above.
>  2. If newtype.itemsize == oldtype.itemsize no more checks are needed,
>     make it happen!
>  3. If the array is C/Fortran contiguous, check that the size in bytes
>     of the last/first dimension is evenly divided by newtype.itemsize.
>     If it is, go for it.
>  4. For non-contiguous arrays:
>      1. Ignoring dimensions of size 1, check that no stride is smaller
>         than either oldtype.itemsize or newtype.itemsize. If any is
>         found this is an as_strided product, sorry, can't do it!
>      2. Ignoring dimensions of size 1, find a contiguous dimension, i.e.
>         stride == oldtype.itemsize
>          1. If found, check that it is the only one with that stride,
>             that it is the minimal stride, and that the size in bytes of
>             that dimension is evenly divided by newtype.itemsize.
>          2. If none is found, check if there is a size 1 dimension that
>             is also unique (unless we agree on a default, as mentioned
>             above) and that newtype.itemsize evenly divides
>             oldtype.itemsize.
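
Rules 2 to 4 above can be sketched in pure Python on shape and stride tuples. This is one possible reading, not the logic of gh-5508. The function names are hypothetical, rule 1 is assumed to have been checked already, and strides are assumed non-negative with at least one dimension and no zero-size dimensions. Rule 4.1 is read here as: every stride must be at least oldtype.itemsize, and every stride other than the contiguous one must be at least newtype.itemsize.

```python
def is_c_contiguous(shape, strides, itemsize):
    """C-order check on plain tuples, ignoring size 1 dimensions."""
    expected = itemsize
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


def view_allowed(shape, strides, oldsize, newsize):
    """Rules 2 to 4 of the proposal; rule 1 is assumed already checked."""
    if newsize == oldsize:                                    # rule 2
        return True
    if is_c_contiguous(shape, strides, oldsize):              # rule 3, C order
        return (shape[-1] * oldsize) % newsize == 0
    if is_c_contiguous(shape[::-1], strides[::-1], oldsize):  # rule 3, Fortran
        return (shape[0] * oldsize) % newsize == 0
    # Rule 4: drop size 1 dimensions, their strides carry no information.
    real = [(d, s) for d, s in zip(shape, strides) if d != 1]
    if any(s < oldsize for _, s in real):                     # rule 4.1
        return False
    contiguous = [d for d, s in real if s == oldsize]
    if contiguous:                                            # rule 4.2.1
        # Minimality of the contiguous stride already follows from rule 4.1.
        others_ok = all(s >= newsize for _, s in real if s != oldsize)
        return (len(contiguous) == 1 and others_ok
                and (contiguous[0] * oldsize) % newsize == 0)
    # Rule 4.2.2: exactly one size 1 dimension to expand, and the
    # old item must split evenly into new items.
    return shape.count(1) == 1 and oldsize % newsize == 0
```

Under this reading, view_allowed((3, 1, 2), (32, 8, 8), 8, 16) accepts the float to complex view of a from the example, while view_allowed((3, 1, 1), (32, 16, 16), 16, 8) rejects the view back to float because two size 1 dimensions compete.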
> 
> Apologies for the long, dense content, but any thoughts or comments are
> very welcome.
> 
> Jaime
> 
> -- 
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
> planes de dominación mundial.
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
