On Tue, Jan 30, 2018 at 1:33 PM, <josef.p...@gmail.com> wrote:
> On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane <allanhald...@gmail.com> wrote:
>> On 01/29/2018 11:50 PM, josef.p...@gmail.com wrote:
>>> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane <allanhald...@gmail.com> wrote:
>>>> On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:
>>>>> On Mon, Jan 29, 2018 at 5:50 PM, <josef.p...@gmail.com> wrote:
>>>>>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane <allanhald...@gmail.com> wrote:
>>>>>>> On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>>>>>>>> On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root <ben.v.r...@gmail.com> wrote:
>>>>>>>>> I <3 structured arrays. I love the fact that I can access data by
>>>>>>>>> row and then by fieldname, or vice versa. There are times when I
>>>>>>>>> need to pass just a column into a function, and there are times when
>>>>>>>>> I need to process things row by row. Yes, pandas is nice if you want
>>>>>>>>> the specialized indexing features, but it becomes a bear to deal
>>>>>>>>> with if all you want is normal indexing, or even the ability to
>>>>>>>>> easily loop over the dataset.
>>>>>>>>
>>>>>>>> I don't think there is a doubt that structured arrays, arrays with
>>>>>>>> structured dtypes, are a useful container.
>>>>>>>> The question is whether they should be more or the foundation
>>>>>>>> for more.
>>>>>>>>
>>>>>>>> For example, computing a mean, or reduce operation, over numeric
>>>>>>>> elements ("columns"). Before padded views it was possible to index
>>>>>>>> by selecting the relevant "columns" and view them as a standard
>>>>>>>> array. With padded views that breaks and, AFAICS, there is no way
>>>>>>>> in numpy 1.14.0 to compute a mean of some "columns". (I don't have
>>>>>>>> numpy 1.14 to try or find a workaround, like maybe looping over
>>>>>>>> all relevant columns.)
>>>>>>>>
>>>>>>>> Josef
>>>>>>>
>>>>>>> Just to clarify, structured types have always had padding bytes,
>>>>>>> that isn't new.
>>>>>>>
>>>>>>> What *is* new (which we are pushing to 1.15, I think) is that it
>>>>>>> may be somewhat more common to end up with padding than before, and
>>>>>>> only if you are specifically using multi-field indexing, which is a
>>>>>>> fairly specialized case.
>>>>>>>
>>>>>>> I think recfunctions already account properly for padding bytes.
>>>>>>> Except for the bug in #8100, which we will fix, padding bytes in
>>>>>>> recarrays are more or less invisible to a non-expert who only cares
>>>>>>> about dataframe-like behavior.
>>>>>>>
>>>>>>> In other words, padding is no obstacle at all to computing a mean
>>>>>>> over a column, and single-field indexes in 1.15 behave identically
>>>>>>> as before. The only thing that will change in 1.15 is multi-field
>>>>>>> indexing, and it has never been possible to compute a mean (or any
>>>>>>> binary operation) on multiple fields.
>>>>>>
>>>>>> from the example in the other thread
>>>>>>
>>>>>>     a[['b', 'c']].view(('f8', 2)).mean(0)
>>>>>>
>>>>>> (from the statsmodels usecase:
>>>>>>   read csv with genfromtxt to get recarray or structured array
>>>>>>   select/index the numeric columns
>>>>>>   view them as a standard array
>>>>>>   do whatever we can do with standard numpy arrays
>>>>>> )
>>>>
>>>> Oh ok, I misunderstood.
>>>> I see your point: a mean over fields is more difficult than before.
>>>>
>>>>> Or, to phrase it as a question:
>>>>>
>>>>> How do we get a standard array with homogeneous dtype from the
>>>>> corresponding elements of a structured dtype in numpy 1.14.0?
>>>>>
>>>>> Josef
>>>>
>>>> The answer may be that "numpy has never had a way to do that",
>>>> even if in a few special cases you might hack a workaround using
>>>> views.
>>>>
>>>> That's what your example seems like to me. It uses an explicit view,
>>>> which is an "expert" feature since views depend on the exact memory
>>>> layout and binary representation of the array. Your example only
>>>> works if the two fields have exactly the same dtype as each other
>>>> and as the final dtype, and evidently breaks if there is byte
>>>> padding for any reason.
>>>>
>>>> Pandas can do row means without these problems:
>>>>
>>>>     >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0)
>>>>
>>>> Numpy is missing this functionality, so you or whoever wrote that
>>>> example figured out a fragile workaround using views.
>>>
>>> Once upon a time (*) this wasn't fragile but the only and recommended
>>> way. Because dtypes were low level with clear memory layout and stayed
>>> that way, it was easy to check item size or whatever and get different
>>> views on it.
>>> e.g. https://mail.scipy.org/pipermail/numpy-discussion/2008-December/039340.html
>>>
>>> (*) pre-pandas, pre-stackoverflow on the mailing lists, which was for
>>> me roughly 2008 to 2012
>>> but a late thread
>>> https://mail.scipy.org/pipermail/numpy-discussion/2015-October/074014.html
>>> "What is now the recommended way of converting structured
>>> dtypes/recarrays to ndarrays?"

One final historical note (once upon a time users relied on cookbooks):
http://scipy-cookbook.readthedocs.io/items/Recarray.html#Converting-to-regular-arrays-and-reshaping
2010-03-09 (last modified), 2008-06-27 (created)
which I assume is broken in numpy 1.14.0
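
For readers who have not seen the view-based conversion that the cookbook entry and the 2008 thread refer to, here is a minimal sketch of the idea. The helper name `view_as_2d` and the sample array are made up for illustration and are not taken from the cookbook. The sketch assumes a 1-D, contiguous structured array whose fields all share one scalar dtype and sit back to back with no padding, and it checks exactly those conditions before reinterpreting the memory.

```python
import numpy as np

# Packed structured array: two float64 fields, no padding bytes.
rec = np.zeros(3, dtype=[('b', 'f8'), ('c', 'f8')])
rec['b'] = [1.0, 2.0, 3.0]
rec['c'] = [10.0, 20.0, 30.0]


def view_as_2d(arr, scalar_dtype='f8'):
    """Reinterpret a 1-D packed structured array as (n, nfields), no copy.

    Hypothetical helper for illustration. It refuses to run unless every
    field has exactly `scalar_dtype` and the fields are laid out back to
    back in declaration order, because a view depends on the memory layout.
    """
    scalar_dtype = np.dtype(scalar_dtype)
    names = arr.dtype.names
    size = scalar_dtype.itemsize
    # dtype.fields maps name -> (field dtype, byte offset, ...)
    same_type = all(arr.dtype.fields[n][0] == scalar_dtype for n in names)
    offsets = [arr.dtype.fields[n][1] for n in names]
    packed = (offsets == [i * size for i in range(len(names))]
              and arr.dtype.itemsize == len(names) * size)
    if not (same_type and packed):
        raise ValueError("fields are not a packed block of %s" % scalar_dtype)
    return arr.view(scalar_dtype).reshape(len(arr), len(names))


plain = view_as_2d(rec)          # shape (3, 2), shares memory with rec
col_means = plain.mean(axis=0)   # one mean per field
```

The layout checks are the "check item size or whatever" step mentioned above. An array with an extra integer field, or with padding between fields, is rejected instead of being silently misread.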
>>>> I suggest that if we want to allow either means over fields, or
>>>> conversion of an n-D structured array to an n+1-D regular ndarray, we
>>>> should add a dedicated function to do so in numpy.lib.recfunctions
>>>> which does not depend on the binary representation of the array.
>>>
>>> I don't really want to defend an obsolete (?) usecase of structured
>>> dtypes.
>>>
>>> However, I think there should be a decision about the future plans for
>>> whether dataframe-like usages of structured dtypes, directly or through
>>> higher level classes or functions, are still supported, instead of
>>> removing slowly and silently (*) the foundation for this use case.
>>> Either support this usage or say you will be dropping it.
>>>
>>> (*) I didn't read the details of the release notes
>>>
>>> And another footnote about obsolete:
>>> Given that I'm the only one arguing about the dataframe_like usecase of
>>> recarrays and structured dtypes, I think they are dead for this specific
>>> usecase and only my inertia and conservativeness kept them alive in
>>> statsmodels.
>>>
>>> Josef
>>
>> It's a bit of a stretch to say that we are "silently" dropping support
>> for dataframe-like use of structured arrays.
>>
>> First, we still allow pretty much all dataframe-like use we have
>> supported since numpy 1.7, limited as it may be. We are really only
>> dropping one very specialized, expert use involving an explicit view,
>> which I still have doubts was ever more than a hack. That 2008 mailing
>> list message didn't involve multi-field indexing, which didn't exist
>> then (only introduced in 2009), and we have wanted to make them views
>> (not copies) since their inception.
>
> The 2008 mailing list thread introduced me to working with views on
> structured arrays as the ONLY way to switch between structured and
> homogeneous dtypes (if the underlying item size was homogeneous).
> The new statsmodels started in 2009.
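
As an illustration of what a conversion "which does not depend on the binary representation of the array" could look like, here is a minimal sketch that copies the selected fields one at a time into a regular array. The helper name `fields_to_ndarray` is hypothetical and is not an existing numpy.lib.recfunctions function. Because it only uses single-field indexing, it should work the same with or without padding and with mixed field types, at the cost of a copy.

```python
import numpy as np

# Structured array with mixed field types: one int field, two float fields.
a = np.zeros(4, dtype=[('a', 'i4'), ('b', 'f8'), ('c', 'f8')])
a['b'] = [1.0, 2.0, 3.0, 4.0]
a['c'] = [10.0, 20.0, 30.0, 40.0]


def fields_to_ndarray(arr, names, dtype=float):
    """Copy the named fields into a regular array of shape arr.shape + (k,).

    Hypothetical helper for illustration: each field is read with
    single-field indexing and cast to `dtype`, so the result does not
    depend on offsets, padding, or the original field dtypes.
    """
    out = np.empty(arr.shape + (len(names),), dtype=dtype)
    for i, name in enumerate(names):
        out[..., i] = arr[name]
    return out


cols = fields_to_ndarray(a, ['b', 'c'])   # regular (4, 2) float array
col_means = cols.mean(axis=0)             # mean over each selected field
```

The result is a copy, so writing to `cols` does not modify `a`. That is the trade-off against the view-based approach.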
>> Second, I don't think we are doing so silently: We have warned about
>> this in release notes since numpy 1.7 in 2012/2013, and it gets a
>> mention in most releases since then. We have also raised FutureWarnings
>> about it since 1.7. Unfortunately we missed warning in your specific
>> case for a while, but we corrected this in 1.12 so you should have seen
>> FutureWarnings since then.
>
> If I see warnings in the test suite about getting a view instead of a
> copy from numpy, then the only/main consequence I think about is whether
> I need to watch out for inline modification.
> I didn't expect that the followup computation would change, and that it's
> a padded view and not a view on the selected memory. However, I just
> checked and padding is mentioned in the 1.12 release notes (which I never
> read before).
>
> AFAICS, one problem is that the padded view didn't come with the matching
> downstream usage support: the pack function as mentioned, an alternative
> way to convert to a standard ndarray, copy doesn't get rid of the padding,
> and so on.
>
> e.g. another mailing list thread I just found with the same problem
> http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html
>
> quoting Ralf:
> Question: is that really the recommended way to get an (N, 2) size float
> array from two columns of a larger record array? If so, why isn't there a
> better way? If you'd want to write to that (N, 2) array you have to append
> a copy, making it even uglier. Also, then there really should be tests for
> views in test_records.py.
>
> This "better way" never showed up, AFAIK. And it looks like we came back
> to this problem every few years.
>
> Josef
>
>> I don't feel the need to officially declare that we are dropping support
>> for dataframe-like use of structured arrays. It's unclear where that use
>> ends and other uses of structured arrays begin.
>> I think updating the docs to warn that pandas/dask may be a better
>> choice is enough, as I've been doing, and then users can decide for
>> themselves.
>>
>> There is still the question about whether we should make
>> numpy.lib.recfunctions more official. I don't have a strong opinion. I
>> suppose it would be good to add a section to the structured array docs
>> which lists those methods and says something like
>>
>> "the submodule numpy.lib.recfunctions provides minimal functionality to
>> split, combine, and manipulate structured datatypes and arrays. In most
>> cases, we strongly recommend users use a dedicated module such as
>> pandas/xarray/dask instead of these methods, but they are provided for
>> occasional convenience."
>>
>> Allan
>>
>>>> Allan
>>>>
>>>>>> Josef
>>>>>>
>>>>>>> Allan
>>>>>>>
>>>>>>>>> Cheers!
>>>>>>>>> Ben Root
>>>>>>>>>
>>>>>>>>> On Mon, Jan 29, 2018 at 3:24 PM, <josef.p...@gmail.com> wrote:
>>>>>>>>>> On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt <stef...@berkeley.edu> wrote:
>>>>>>>>>>> On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com wrote:
>>>>>>>>>>>> Given that there is pandas, xarray, dask and more, numpy
>>>>>>>>>>>> could as well drop any pretense of supporting dataframe_likes.
>>>>>>>>>>>> Or, adjust the recfunctions so we can still work
>>>>>>>>>>>> dataframe_like with structured
>>>>>>>>>>>> dtypes/recarrays/recfunctions.
>>>>>>>>>>>
>>>>>>>>>>> I haven't been following the duckarray discussion carefully,
>>>>>>>>>>> but could this be an opportunity for a dataframe protocol, so
>>>>>>>>>>> that we can have libraries ingest structured arrays, record
>>>>>>>>>>> arrays, pandas dataframes, etc. without too much specialized
>>>>>>>>>>> code?
>>>>>>>>>>
>>>>>>>>>> AFAIU while not being in the data handling area, pandas defines
>>>>>>>>>> the interface and other libraries provide pandas compatible
>>>>>>>>>> interfaces or implementations.
>>>>>>>>>>
>>>>>>>>>> statsmodels currently still has recarray support and usage. In
>>>>>>>>>> some interfaces we support pandas, recarrays and plain arrays,
>>>>>>>>>> or anything where asarray works correctly.
>>>>>>>>>>
>>>>>>>>>> But recarrays became messy to support, one rewrite of some
>>>>>>>>>> functions last year converts recarrays to pandas, does the
>>>>>>>>>> manipulation and then converts back to recarrays.
>>>>>>>>>> Also we need to adjust our recarray usage with new numpy
>>>>>>>>>> versions. But there is no real benefit because I doubt that
>>>>>>>>>> statsmodels still has any recarray/structured dtype users. So,
>>>>>>>>>> we only have to remove our own uses in the datasets and unit
>>>>>>>>>> tests.
>>>>>>>>>>
>>>>>>>>>> Josef
>>>>>>>>>>
>>>>>>>>>>> Stéfan
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion