Hi,

On Sat, Mar 30, 2013 at 2:20 PM, <josef.p...@gmail.com> wrote:
> On Sat, Mar 30, 2013 at 4:57 PM, <josef.p...@gmail.com> wrote:
>> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett <matthew.br...@gmail.com>
>> wrote:
>>> Hi,
>>>
>>> On Sat, Mar 30, 2013 at 4:14 AM, <josef.p...@gmail.com> wrote:
>>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett <matthew.br...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We were teaching today, and found ourselves getting very confused
>>>>> about ravel and shape in numpy.
>>>>>
>>>>> Summary
>>>>> --------------
>>>>>
>>>>> There are two separate ideas needed to understand ordering in ravel and
>>>>> reshape:
>>>>>
>>>>> Idea 1): ravel / reshape can proceed from the last axis to the first,
>>>>> or the first to the last. This is "ravel index ordering"
>>>>> Idea 2) The physical layout of the array (on disk or in memory) can be
>>>>> "C" or "F" contiguous or neither.
>>>>> This is "memory ordering"
>>>>>
>>>>> The index ordering is usually (but see below) orthogonal to the memory
>>>>> ordering.
>>>>>
>>>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of
>>>>> index ordering, and this mixes the two ideas and is confusing.
>>>>>
>>>>> What the current situation looks like
>>>>> ----------------------------------------------------
>>>>>
>>>>> Specifically, we've been rolling this around 4 experienced numpy users
>>>>> and we all predicted at least one of the results below wrongly.
>>>>>
>>>>> This was what we knew, or should have known:
>>>>>
>>>>> In [2]: import numpy as np
>>>>>
>>>>> In [3]: arr = np.arange(10).reshape((2, 5))
>>>>>
>>>>> In [5]: arr.ravel()
>>>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> So, the 'ravel' operation unravels over the last axis (1) first,
>>>>> followed by axis 0.
>>>>>
>>>>> So far so good (even if the opposite to MATLAB, Octave).
>>>>>
>>>>> Then we found the 'order' flag to ravel:
>>>>>
>>>>> In [10]: arr.flags
>>>>> Out[10]:
>>>>>   C_CONTIGUOUS : True
>>>>>   F_CONTIGUOUS : False
>>>>>   OWNDATA : False
>>>>>   WRITEABLE : True
>>>>>   ALIGNED : True
>>>>>   UPDATEIFCOPY : False
>>>>>
>>>>> In [11]: arr.ravel('C')
>>>>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> But we soon got confused. How about this?
>>>>>
>>>>> In [12]: arr_F = np.array(arr, order='F')
>>>>>
>>>>> In [13]: arr_F.flags
>>>>> Out[13]:
>>>>>   C_CONTIGUOUS : False
>>>>>   F_CONTIGUOUS : True
>>>>>   OWNDATA : True
>>>>>   WRITEABLE : True
>>>>>   ALIGNED : True
>>>>>   UPDATEIFCOPY : False
>>>>>
>>>>> In [16]: arr_F
>>>>> Out[16]:
>>>>> array([[0, 1, 2, 3, 4],
>>>>>        [5, 6, 7, 8, 9]])
>>>>>
>>>>> In [17]: arr_F.ravel('C')
>>>>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> Right - so the flag 'C' to ravel, has got nothing to do with *memory*
>>>>> ordering, but is to do with *index* ordering.
>>>>>
>>>>> And in fact, we can ask for memory ordering specifically:
>>>>>
>>>>> In [22]: arr.ravel('K')
>>>>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> In [23]: arr_F.ravel('K')
>>>>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>>>
>>>>> In [24]: arr.ravel('A')
>>>>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> In [25]: arr_F.ravel('A')
>>>>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>>>
>>>>> There are some confusions to get into with the 'order' flag to reshape
>>>>> as well, of the same type.
>>>>>
>>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering.
>>>>>
>>>>> This is very confusing. We think the index ordering and memory
>>>>> ordering ideas need to be separated, and specifically, we should avoid
>>>>> using "C" and "F" to refer to index ordering.
>>>>>
>>>>> Proposal
>>>>> -------------
>>>>>
>>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>>> index ordering for ravel, reshape
>>>>> * Prefer "Z" and "N", being graphical representations of unraveling in
>>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>>> naming idea by Paul Ivanov)
>>>>>
>>>>> What do y'all think?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Matthew
>>>>> Paul Ivanov
>>>>> JB Poline
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion@scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>>
>>>>
>>>> I always thought "F" and "C" are easy to understand, I always thought about
>>>> the content and never about the memory when using it.
>>>
>>> I can only say that 4 out of 4 experienced numpy developers found
>>> themselves unable to predict the behavior of these functions before
>>> they saw the output.
>>>
>>> The problem is always that explaining something makes it clearer for a
>>> moment, but, for those who do not have the explanation or who have
>>> forgotten it, at least among us here, the outputs were generating
>>> groans and / or high fives as we incorrectly or correctly guessed what
>>> was going to happen.
>>>
>>> I think the only way to find out whether this really is confusing or
>>> not, is to put someone in front of these functions without any
>>> explanation and ask them to predict what is going to come out of the
>>> various inputs and flags. Or to try and teach it, which was the
>>> problem we were having.
>>
>> changing the names doesn't make it easier to understand.
>> I think the confusion is because the new A and K refer to existing memory
>>
>>
>> ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I
>> don't remember having seen any weird cases.
>
> example from our statistics use:
> rows are observations/time periods, columns are variables/individuals
>
> using "F" or "C", we can stack either by time-periods (observations)
> or individuals (cross-section units)
> that's easy to understand.
I disagree. I think it's confusing, and I have evidence: four out of
four of us tested ourselves and got it wrong. Perhaps we are
particularly dumb or poorly informed, but I think it's rash to assert
there is no problem here.

Cheers,

Matthew
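
For anyone who wants to try the prediction test on themselves, here is a minimal sketch of the cases from the original post. It assumes only that numpy is installed. The names arr_c, arr_f and the result lists are illustrative and do not appear in the thread, and np.asfortranarray is used here as one way of getting an F-contiguous copy.

```python
import numpy as np

# The same logical 2x5 array, stored with two different memory layouts.
arr_c = np.arange(10).reshape((2, 5))   # C-contiguous (row-major) memory
arr_f = np.asfortranarray(arr_c)        # F-contiguous (column-major) memory

# 'C' and 'F' select the *index* order and ignore the memory layout,
# so each flag gives the same answer for both arrays.
index_c = [arr_c.ravel(order="C").tolist(), arr_f.ravel(order="C").tolist()]
index_f = [arr_c.ravel(order="F").tolist(), arr_f.ravel(order="F").tolist()]

# 'K' and 'A' consult the *memory* layout, so the two arrays differ.
memory_k = [arr_c.ravel(order="K").tolist(), arr_f.ravel(order="K").tolist()]
memory_a = [arr_c.ravel(order="A").tolist(), arr_f.ravel(order="A").tolist()]

# reshape uses 'C' and 'F' in the same index-order sense.
reshaped_c = np.arange(6).reshape((2, 3), order="C").tolist()
reshaped_f = np.arange(6).reshape((2, 3), order="F").tolist()

for label, result in [
    ("ravel('C')", index_c),
    ("ravel('F')", index_f),
    ("ravel('K')", memory_k),
    ("ravel('A')", memory_a),
]:
    print(label, "C-layout:", result[0], "F-layout:", result[1])
print("reshape order='C':", reshaped_c)
print("reshape order='F':", reshaped_f)
```

Try writing down what you expect for each line before running it. The first two lines of output should be identical for both layouts, while the 'K' and 'A' lines should differ between them.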