On Mon, Jan 22, 2018 at 10:53 AM, <[email protected]> wrote:
>
>
> On Sun, Jan 21, 2018 at 9:48 PM, Allan Haldane <[email protected]>
> wrote:
>
>> Hello all,
>>
>> We are making a decision (again) about what to do about the
>> behavior of multiple-field indexing of structured arrays: Should
>> it return a view or a copy, and on what release schedule?
>>
>> As a reminder, this refers to operations like (1.13 behavior):
>>
>> >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
>> >>> a[['a', 'c']]
>> array([(0, 0.), (0, 0.), (0, 0.)],
>> dtype=[('a', '<i4'), ('c', '<f4')])
>>
>> In numpy 1.14.0 we made this return a view instead of a copy, but
>> downstream test failures suggest we reconsider. In our current
>> implementation for 1.14.1, we have reverted this change, but
>> still plan to go through with it in 1.15.
>>
>> See here for our discussion of the problem and solutions:
>> https://github.com/numpy/numpy/pull/10411
>>
>> The two main options we have discussed are either to try to make
>> the change in 1.15, or never make the change at all and always
>> return a copy.
>>
>> Here are some pros and cons:
>>
>> Pros (change to view in 1.15)
>> =============================
>>
>> * Views are useful and convenient. Other forms of indexing also
>> often return views so this is more consistent.
>> * This change has been planned since numpy 1.7 in 2009,
>> and there have been visible FutureWarnings about it since
>> then. Anyone whose code will break should have seen the
>> warnings. It has been extensively warned about in recent
>> release notes.
>> * Past discussions have supported the change. See my comment in
>> the PR with many links to them and to other history.
>> * Users have requested the change on the list.
>> * Possibly a majority of the reported code failures were not
>> actually caused by the change, but by another bug (#8100)
>> involving np.load/np.save which this change exposed. If we
>> push it off to 1.15, we will have time to fix this other bug.
>> (There were no FutureWarnings for this breakage, of course).
>> * The code that really will break is of the form
>> a[['a', 'c']].view('i8')
>> because the returned itemsize is different. This has
>> raised FutureWarnings since numpy 1.7, and no users reported
>> failures due to this change. In the PR we still try to
>> mitigate this breakage by introducing a new method
>> `pack_fields`, which converts the result into the 1.13 form,
>> so that
>> np.pack_fields(a[['a', 'c']]).view('i8')
>> will work.
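
A minimal sketch of the itemsize issue described in the last bullet. `pack_fields` is only the name proposed in the PR; this sketch assumes a later NumPy release (1.16 or newer) that ships `numpy.lib.recfunctions.repack_fields`, which serves the same purpose. The variable names are illustrative only.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])

# Multi-field selection. Under the "view" semantics the result keeps the
# parent's 12-byte record layout, with a hole where field 'b' used to be.
sub = a[['a', 'c']]
print(sub.dtype.itemsize)      # 12 under view semantics (assumes NumPy >= 1.16)

# Repacking copies the data into a dtype without the hole,
# i.e. the layout that 1.13 returned directly.
packed = rfn.repack_fields(sub)
print(packed.dtype.itemsize)   # 8

# Reinterpreting the records as 8-byte integers only makes sense
# once the itemsize is 8 again.
as_i8 = packed.view('i8')
print(as_i8.shape)             # (3,)
```

Without the repacking step, `.view('i8')` on the 12-byte records is the pattern that the FutureWarning has been pointing at.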
>>
>>
>> Cons (keep returning a copy)
>> ============================
>>
>> * The extra convenience is not really that much, and fancy
>> indexing also returns a copy instead of a view, so there is
>> a precedent there.
>> * We want to minimize compatibility breaks with old behavior.
>> We've had a fair amount of discussion and complaints about
>> how we break things in general.
>> * We have lived with a "copy" for 8 years now. At some point the
>> behavior gets set in stone for compatibility reasons.
>> * Users have written to the list and github about their code
>> breaking in 1.14.0. As far as I am aware, they all refer
>> to the #8100 problem.
>> * If a new function `pack_fields` is needed to guard against
>> mishaps with the view behavior, that seems like a sign that
>> keeping the copy behavior is the best option from an API
>> perspective.
>>
>> My initial vote is to go with the change in 1.15: The "view" code
>> that will ultimately break (not the code related to #8100) has
>> been sending FutureWarnings for many years, and I am not aware of
>> any user complaints involving it: All the complaints so far
>> would be fixed with #8100 in 1.15.
>>
>>
> (Note based on a linked mailing list thread, 2012 might be the last time I
> looked more closely at structured dtypes.
> So some of what I understand might be outdated.)
>
>
> Views on structured dtypes are very important, but viewing them as
> standard arrays with standard dtypes is the main part that I had used.
> Essentially structured dtypes are useless for any computation, e.g. just
> some simple reduce operation. To work with them we need a standard view.
>
> I think the use case that fails in statsmodels is the following (there is
> no test failure anymore because we switched to using pandas in the unit test):
>
To add a detail here:
results is a recarray created from a CSV file with
results = genfromtxt(open(filename, "rb"), delimiter=",",
names=True,dtype=float)
['acvar_lb','acvar_ub'] are the last two columns, so this corresponds to my
example below where AFAIU no padding is necessary to get a view.
>
>
> cls.confint_res = cls.results[['acvar_lb','acvar_ub']].view((float, 2))
> E ValueError: Changing the dtype to a subarray type is only
> supported if the total itemsize is unchanged
>
>
> This is similar to the above example
> a[['a', 'c']].view('i8')
> but it doesn't try to combine fields.
>
> In many examples where I used structured dtypes a long time ago, I switched
> between consistent views, either as a standard array of a subset of fields
> or as structured dtypes.
> For this usecase it wouldn't matter whether a[['a', 'c']] returns a view
> or copy, as long as we can get the second view that is consistent with the
> selected part of the memory. This would also be independent of whether
> numpy pads internally and adjusts the strides if possible or not.
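
A rough sketch of what such a "second view" onto the selected memory can look like, built by hand with strides. It assumes the selected fields share one dtype and are evenly spaced inside the record, which holds for 'b' and 'c' in the example array used below; it is not how NumPy implements multi-field indexing.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])

# Byte offsets of the two selected fields inside each 24-byte record.
off_b = a.dtype.fields['b'][1]
off_c = a.dtype.fields['c'][1]

# a['b'] is already a strided float64 view that starts at field 'b'.
# Adding a second axis whose stride is the distance between 'b' and 'c'
# exposes both columns as a plain 2-D float array without copying.
bc = as_strided(a['b'],
                shape=(a.shape[0], 2),
                strides=(a.strides[0], off_c - off_b))

bc[:, 1] = 5.0        # writes through to a['c']
print(a['c'])         # all 5.0
print(bc.mean(0))     # column means of 'b' and 'c'
```

`as_strided` does no bounds checking, so a wrong shape or stride reads or writes outside the array; treat this as an illustration of the memory layout rather than a recommended API.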
>
> >>> np.__version__
> '1.11.2'
>
> >>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])
> >>> a
> array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0),
> (1, 1.0, 1.0)],
> dtype=[('a', '<i8'), ('b', '<f8'), ('c', '<f8')])
>
> >>> a.mean(0)
> Traceback (most recent call last):
> File "<pyshell#15>", line 1, in <module>
> a.mean(0)
> File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py",
> line 65, in _mean
> ret = umr_sum(arr, axis, dtype, out, keepdims)
> TypeError: cannot perform reduce with flexible type
>
> >>> a[['b', 'c']].mean(0)
> Traceback (most recent call last):
> File "<pyshell#16>", line 1, in <module>
> a[['b', 'c']].mean(0)
> File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py",
> line 65, in _mean
> ret = umr_sum(arr, axis, dtype, out, keepdims)
> TypeError: cannot perform reduce with flexible type
>
> >>> a[['b', 'c']].view(('f8', 2)).mean(0)
> array([ 1., 1.])
> >>> a[['b', 'c']].view(('f8', 2)).dtype
> dtype('float64')
>
>
> Aside: The plan is that statsmodels will drop all usage of and support for
> recarrays/structured dtypes in the following release (0.10).
> Then structured dtypes are free (from our perspective) to provide
> low-level struct support instead of pretending to be dataframe_like.
>
> Josef
>
>
>
>> Feel free to also discuss the related proposed change, to make
>> np.diag return a view instead of a copy. That change has
>> not been implemented yet, only proposed.
>
>
>> Cheers,
>> Allan
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
>