I recently put some thought into the issue because a user was complaining about PyTables inadvertently removing the padding while doing a copy. Incidentally, h5py does respect padding while doing copies, so I took this seriously and released a new PyTables version mainly to fix this. You can see the use case and my reflections here: https://github.com/PyTables/PyTables/pull/720
So, my take on this is that the padding is an integral part of the dtype and should be respected during copies too (principle of least surprise). With this, I am definitely aligned (pun intended) with contract (1).

Francesc

On Fri, 12 Apr 2019 at 4:08, Nathaniel Smith <n...@pobox.com> wrote:
> My concern would be that to implement (2), I think .copy() has to either special-case certain dtypes, or else we have to add some kind of "simplify for copy" operation to the dtype protocol. These both add architectural complexity, so maybe it's better to avoid it unless we have a compelling reason?
>
> On Thu, Apr 11, 2019 at 6:51 AM Marten van Kerkwijk <m.h.vankerkw...@gmail.com> wrote:
> >
> > Hi All,
> >
> > An issue [1] about the copying of arrays with structured dtype raised a question about what the expected behaviour is: does copy always preserve the dtype as is, or should it remove padding?
> >
> > Specifically, consider an array with a structure with many fields, say 'a' to 'z'. Since numpy 1.16, if one does `a[['a', 'z']]`, a view will be returned. In this case, its dtype will include a large offset. Now, if we copy this view, should the result have exactly the same dtype, including the large offset (i.e., the copy takes as much memory as the original full array), or should the padding be removed? From the discussion so far, it seems the logic has boiled down to a choice between:
> >
> > (1) Copy is a contract that the dtype will not vary (e.g., we also do not change endianness);
> >
> > (2) Copy is a contract that any access to the data in the array will return exactly the same result, without wasting memory and possibly optimized for access with different strides. E.g., `array[::10].copy()` also compacts the result.
> >
> > An argument in favour of (2) is that, before numpy 1.16, `a[['a', 'z']].copy()` did return an array without padding. Of course, this relied on `a[['a', 'z']]` already returning a copy without padding, but still this is a regression.
> >
> > More generally, there should at least be a clear way to get the compact copy. Also, it would make sense for things like `np.save` to remove any padding (it currently does not).
> >
> > What do people think?
> >
> > All the best,
> >
> > Marten
> >
> > [1] https://github.com/numpy/numpy/issues/13299
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> --
> Nathaniel J. Smith -- https://vorpus.org

--
Francesc Alted
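For readers who want to see the behaviour under discussion, here is a minimal sketch. It assumes NumPy >= 1.16 and a hypothetical array whose 26 fields 'a' to 'z' are all float64. The field types are an assumption; Marten's message only gives the names. The sketch shows that the multi-field view keeps the full record size, that `.copy()` keeps that dtype (contract (1)), that a strided copy is already compacted (the stride case cited for contract (2)), and that `numpy.lib.recfunctions.repack_fields` is one existing way to get the compact copy Marten asks for. The printed values assume the behaviour described in this thread.

```python
import string

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical record: 26 float64 fields 'a'..'z' (8 bytes each, 208 bytes total).
dt = np.dtype([(name, np.float64) for name in string.ascii_lowercase])

a = np.zeros(3, dtype=dt)
a['a'] = [1.0, 2.0, 3.0]
a['z'] = [10.0, 20.0, 30.0]

# Since NumPy 1.16, multi-field indexing returns a view: the selected fields keep
# their original offsets, so the view's dtype still spans the full 208-byte record.
v = a[['a', 'z']]
print(v.dtype.itemsize)        # 208
print(np.shares_memory(v, a))  # True

# Contract (1): .copy() keeps the dtype exactly, padding included.
c = v.copy()
print(c.dtype == v.dtype)      # True
print(c.nbytes)                # 624 (3 records x 208 bytes)

# Strides, by contrast, are not preserved: copying a strided view
# produces a contiguous array.
s = a[::2]
print(s.strides, s.copy().strides)  # (416,) (208,)

# Compact copy: repack_fields builds a packed dtype and casts to it,
# dropping the padding between and after the selected fields.
packed = rfn.repack_fields(v)
print(packed.dtype.itemsize)   # 16
print(packed['z'])             # [10. 20. 30.]
```

The design question in the thread is essentially whether the last step should happen implicitly inside `.copy()`. In this sketch it stays an explicit, opt-in call, which matches contract (1).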