On Tue, Jan 30, 2018 at 2:42 PM, <josef.p...@gmail.com> wrote:

> On Tue, Jan 30, 2018 at 1:33 PM, <josef.p...@gmail.com> wrote:
>
>> On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane
>> <allanhald...@gmail.com> wrote:
>>
>>> On 01/29/2018 11:50 PM, josef.p...@gmail.com wrote:
>>>
>>>> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane
>>>> <allanhald...@gmail.com> wrote:
>>>>
>>>>> On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:
>>>>>
>>>>>> On Mon, Jan 29, 2018 at 5:50 PM, <josef.p...@gmail.com> wrote:
>>>>>>
>>>>>>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane
>>>>>>> <allanhald...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>>>>>>>>
>>>>>>>>> On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root
>>>>>>>>> <ben.v.r...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I <3 structured arrays. I love the fact that I can access
>>>>>>>>>> data by row and then by fieldname, or vice versa. There are
>>>>>>>>>> times when I need to pass just a column into a function, and
>>>>>>>>>> there are times when I need to process things row by row.
>>>>>>>>>> Yes, pandas is nice if you want the specialized indexing
>>>>>>>>>> features, but it becomes a bear to deal with if all you want
>>>>>>>>>> is normal indexing, or even the ability to easily loop over
>>>>>>>>>> the dataset.
>>>>>>>>>
>>>>>>>>> I don't think there is a doubt that structured arrays, arrays
>>>>>>>>> with structured dtypes, are a useful container. The question
>>>>>>>>> is whether they should be more or the foundation for more.
>>>>>>>>>
>>>>>>>>> For example, computing a mean, or reduce operation, over
>>>>>>>>> numeric elements ("columns"). Before padded views it was
>>>>>>>>> possible to index by selecting the relevant "columns" and view
>>>>>>>>> them as a standard array. With padded views that breaks and,
>>>>>>>>> AFAICS, there is no way in numpy 1.14.0 to compute a mean of
>>>>>>>>> some "columns". (I don't have numpy 1.14 to try or find a
>>>>>>>>> workaround, like maybe looping over all relevant columns.)
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>>
>>>>>>>> Just to clarify, structured types have always had padding
>>>>>>>> bytes, that isn't new.
>>>>>>>>
>>>>>>>> What *is* new (which we are pushing to 1.15, I think) is that
>>>>>>>> it may be somewhat more common to end up with padding than
>>>>>>>> before, and only if you are specifically using multi-field
>>>>>>>> indexing, which is a fairly specialized case.
>>>>>>>>
>>>>>>>> I think recfunctions already account properly for padding
>>>>>>>> bytes. Except for the bug in #8100, which we will fix,
>>>>>>>> padding bytes in recarrays are more or less invisible to a
>>>>>>>> non-expert who only cares about dataframe-like behavior.
>>>>>>>>
>>>>>>>> In other words, padding is no obstacle at all to computing a
>>>>>>>> mean over a column, and single-field indexes in 1.15 behave
>>>>>>>> identically as before. The only thing that will change in 1.15
>>>>>>>> is multi-field indexing, and it has never been possible to
>>>>>>>> compute a mean (or any binary operation) on multiple fields.
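A minimal sketch of the point about single-field access, assuming a small made-up array (the field names 'a', 'b', 'c' and the values are illustrative only, not taken from the thread). It computes means one field at a time, which does not depend on padding or memory layout, and gathers the fields into a regular 2-D array by copying rather than by view.

```python
import numpy as np

# Hypothetical sample data; field names and values are illustrative only.
a = np.zeros(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])
a['b'] = np.arange(5)
a['c'] = 10 * np.arange(5)

# Single-field indexing gives an ordinary float64 array, so reductions work.
mean_b = a['b'].mean()

# Means over several fields, one field at a time (no view on the raw bytes).
field_means = {name: a[name].mean() for name in ('b', 'c')}

# Alternatively, copy the selected fields into a regular (N, 2) array.
cols = np.column_stack([a[name] for name in ('b', 'c')])
col_means = cols.mean(axis=0)

# Multi-field indexing is the case under discussion: whether the result is
# packed or keeps the parent's layout (padding) depends on the NumPy version.
print(a[['b', 'c']].dtype.itemsize)
```

The copy made by column_stack costs memory, but the result is a plain ndarray regardless of how the structured array is laid out.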
>>>>>>>
>>>>>>> from the example in the other thread
>>>>>>>
>>>>>>>     a[['b', 'c']].view(('f8', 2)).mean(0)
>>>>>>>
>>>>>>> (from the statsmodels usecase:
>>>>>>>   read csv with genfromtxt to get recarray or structured array
>>>>>>>   select/index the numeric columns
>>>>>>>   view them as standard array
>>>>>>>   do whatever we can do with standard numpy arrays
>>>>>>> )
>>>>>
>>>>> Oh ok, I misunderstood. I see your point: a mean over fields is
>>>>> more difficult than before.
>>>>>
>>>>>> Or, to phrase it as a question:
>>>>>>
>>>>>> How do we get a standard array with homogeneous dtype from the
>>>>>> corresponding elements of a structured dtype in numpy 1.14.0?
>>>>>>
>>>>>> Josef
>>>>>
>>>>> The answer may be that "numpy has never had a way to do that",
>>>>> even if in a few special cases you might hack a workaround using
>>>>> views.
>>>>>
>>>>> That's what your example seems like to me. It uses an explicit
>>>>> view, which is an "expert" feature since views depend on the
>>>>> exact memory layout and binary representation of the array. Your
>>>>> example only works if the two fields have exactly the same dtype
>>>>> as each other and as the final dtype, and evidently breaks if
>>>>> there is byte padding for any reason.
>>>>>
>>>>> Pandas can do per-column means without these problems:
>>>>>
>>>>>     >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0)
>>>>>
>>>>> Numpy is missing this functionality, so you or whoever wrote that
>>>>> example figured out a fragile workaround using views.
>>>>
>>>> Once upon a time (*) this wasn't fragile but the only and
>>>> recommended way. Because dtypes were low level with clear memory
>>>> layout and stayed that way, it was easy to check item size or
>>>> whatever and get different views on it.
>>>> e.g.
>>>> https://mail.scipy.org/pipermail/numpy-discussion/2008-December/039340.html
>>>>
>>>> (*) pre-pandas, pre-stackoverflow on the mailing lists, which was
>>>> for me roughly 2008 to 2012
>>>> but a late thread
>>>> https://mail.scipy.org/pipermail/numpy-discussion/2015-October/074014.html
>>>> "What is now the recommended way of converting structured
>>>> dtypes/recarrays to ndarrays?"
>
> one final historical note (once upon a time users relied on cookbooks)
> http://scipy-cookbook.readthedocs.io/items/Recarray.html#Converting-to-regular-arrays-and-reshaping
> 2010-03-09 (last modified), 2008-06-27 (created)
> which I assume is broken in numpy 1.14.0
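The conversion asked about above (selected fields of a structured array to a regular ndarray) can be sketched without an explicit view. This sketch assumes NumPy 1.16 or later, where numpy.lib.recfunctions provides structured_to_unstructured; that function was not available when this thread was written, and the array and field names here are illustrative only.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample data; field names are illustrative only.
a = np.ones(4, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])

# Assumes NumPy >= 1.16: convert the selected fields to an (N, 2) float
# array without relying on the byte layout of the structured dtype.
plain = rfn.structured_to_unstructured(a[['b', 'c']])
means = plain.mean(axis=0)

# Converting all fields promotes the mixed int/float fields to one dtype.
everything = rfn.structured_to_unstructured(a)
```

Unlike the `.view(('f8', 2))` idiom, this does not require the fields to share a dtype or to be contiguous in memory.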
and a final grumpy note
https://docs.scipy.org/doc/numpy-1.14.0/release.html#multiple-field-indexing-assignment-of-structured-arrays

"which will affect code such as"
=
"which will break your code without offering an alternative"

Josef
<back to regular scheduled topics>

>>>>> I suggest that if we want to allow either means over fields, or
>>>>> conversion of an n-D structured array to an n+1-D regular
>>>>> ndarray, we should add a dedicated function to do so in
>>>>> numpy.lib.recfunctions which does not depend on the binary
>>>>> representation of the array.
>>>>
>>>> I don't really want to defend an obsolete (?) usecase of
>>>> structured dtypes.
>>>>
>>>> However, I think there should be a decision about the future plans
>>>> for whether dataframe-like usage of structured dtypes, directly or
>>>> through higher level classes or functions, is still supported,
>>>> instead of slowly and silently (*) removing the foundation for
>>>> this use case: either support this usage or say you will be
>>>> dropping it.
>>>>
>>>> (*) I didn't read the details of the release notes
>>>>
>>>> And another footnote about obsolete:
>>>> Given that I'm the only one arguing about the dataframe_like
>>>> usecase of recarrays and structured dtypes, I think they are dead
>>>> for this specific usecase and only my inertia and conservativeness
>>>> kept them alive in statsmodels.
>>>>
>>>> Josef
>>>
>>> It's a bit of a stretch to say that we are "silently" dropping
>>> support for dataframe-like use of structured arrays.
>>>
>>> First, we still allow pretty much all dataframe-like use we have
>>> supported since numpy 1.7, limited as it may be. We are really only
>>> dropping one very specialized, expert use involving an explicit
>>> view, which I still have doubts was ever more than a hack.
>>> That 2008 mailing list message didn't involve multi-field indexing,
>>> which didn't exist then (only introduced in 2009), and we have
>>> wanted to make them views (not copies) since their inception.
>>
>> The 2008 mailing list thread introduced me to working with views on
>> structured arrays as the ONLY way to switch between structured and
>> homogeneous dtypes (if the underlying item size was homogeneous).
>> The new statsmodels started in 2009.
>>
>>> Second, I don't think we are doing so silently: We have warned
>>> about this in release notes since numpy 1.7 in 2012/2013, and it
>>> gets a mention in most releases since then. We have also raised
>>> FutureWarnings about it since 1.7. Unfortunately we missed warning
>>> in your specific case for a while, but we corrected this in 1.12 so
>>> you should have seen FutureWarnings since then.
>>
>> If I see warnings in the test suite about getting a view instead of
>> a copy from numpy, then the only/main consequence I think about is
>> whether I need to watch out for inline modification.
>> I didn't expect that the followup computation would change, and that
>> it's a padded view and not a view on the selected memory. However, I
>> just checked and padding is mentioned in the 1.12 release notes
>> (which I never read before).
>>
>> AFAICS, one problem is that the padded view didn't come with the
>> matching downstream usage support: the pack function as mentioned,
>> an alternative way to convert to a standard ndarray, copy doesn't
>> get rid of the padding, and so on.
>>
>> e.g. another mailing list thread I just found with the same problem
>> http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html
>>
>> quoting Ralf:
>>   Question: is that really the recommended way to get an (N, 2) size
>>   float array from two columns of a larger record array? If so, why
>>   isn't there a better way?
>>   If you'd want to write to that (N, 2) array you have to append a
>>   copy, making it even uglier. Also, then there really should be
>>   tests for views in test_records.py.
>>
>> This "better way" never showed up, AFAIK. And it looks like we came
>> back to this problem every few years.
>>
>> Josef
>>
>>> I don't feel the need to officially declare that we are dropping
>>> support for dataframe-like use of structured arrays. It's unclear
>>> where that use ends and other uses of structured arrays begin. I
>>> think updating the docs to warn that pandas/dask may be a better
>>> choice is enough, as I've been doing, and then users can decide for
>>> themselves.
>>>
>>> There is still the question about whether we should make
>>> numpy.lib.recfunctions more official. I don't have a strong
>>> opinion. I suppose it would be good to add a section to the
>>> structured array docs which lists those methods and says something
>>> like
>>>
>>> "the submodule numpy.lib.recfunctions provides minimal
>>> functionality to split, combine, and manipulate structured
>>> datatypes and arrays. In most cases, we strongly recommend users
>>> use a dedicated module such as pandas/xarray/dask instead of these
>>> methods, but they are provided for occasional convenience."
>>>
>>> Allan
>>>
>>>>> Allan
>>>>>
>>>>>>> Josef
>>>>>>>
>>>>>>>> Allan
>>>>>>>>
>>>>>>>>>> Cheers!
>>>>>>>>>> Ben Root
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 29, 2018 at 3:24 PM, <josef.p...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt
>>>>>>>>>>> <stef...@berkeley.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Given that there is pandas, xarray, dask and more, numpy
>>>>>>>>>>>>> could as well drop any pretense of supporting
>>>>>>>>>>>>> dataframe_likes. Or, adjust the recfunctions so we can
>>>>>>>>>>>>> still work dataframe_like with structured
>>>>>>>>>>>>> dtypes/recarrays/recfunctions.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't been following the duckarray discussion
>>>>>>>>>>>> carefully, but could this be an opportunity for a
>>>>>>>>>>>> dataframe protocol, so that we can have libraries ingest
>>>>>>>>>>>> structured arrays, record arrays, pandas dataframes, etc.
>>>>>>>>>>>> without too much specialized code?
>>>>>>>>>>>
>>>>>>>>>>> AFAIU while not being in the data handling area, pandas
>>>>>>>>>>> defines the interface and other libraries provide pandas
>>>>>>>>>>> compatible interfaces or implementations.
>>>>>>>>>>>
>>>>>>>>>>> statsmodels currently still has recarray support and usage.
>>>>>>>>>>> In some interfaces we support pandas, recarrays and plain
>>>>>>>>>>> arrays, or anything where asarray works correctly.
>>>>>>>>>>>
>>>>>>>>>>> But recarrays became messy to support, one rewrite of some
>>>>>>>>>>> functions last year converts recarrays to pandas, does the
>>>>>>>>>>> manipulation and then converts back to recarrays.
>>>>>>>>>>> Also we need to adjust our recarray usage with new numpy
>>>>>>>>>>> versions. But there is no real benefit because I doubt that
>>>>>>>>>>> statsmodels still has any recarray/structured dtype users.
>>>>>>>>>>> So, we only have to remove our own uses in the datasets and
>>>>>>>>>>> unit tests.
>>>>>>>>>>>
>>>>>>>>>>> Josef
>>>>>>>>>>>
>>>>>>>>>>>> Stéfan
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> NumPy-Discussion mailing list
>>>>>>>>>>>> NumPy-Discussion@python.org
>>>>>>>>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion