On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
> Hi,
>
> On Sat, Mar 30, 2013 at 4:14 AM, <josef.p...@gmail.com> wrote:
>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett <matthew.br...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> We were teaching today, and found ourselves getting very confused
>>> about ravel and reshape in numpy.
>>>
>>> Summary
>>> --------------
>>>
>>> There are two separate ideas needed to understand ordering in ravel and
>>> reshape:
>>>
>>> Idea 1) ravel / reshape can proceed from the last axis to the first,
>>> or the first to the last. This is "ravel index ordering"
>>> Idea 2) The physical layout of the array (on disk or in memory) can be
>>> "C" or "F" contiguous or neither.
>>> This is "memory ordering"
>>>
>>> The index ordering is usually (but see below) orthogonal to the memory
>>> ordering.
>>>
>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of
>>> index ordering, and this mixes the two ideas and is confusing.
>>>
>>> What the current situation looks like
>>> ----------------------------------------------------
>>>
>>> Specifically, we've been rolling this around among 4 experienced numpy
>>> users and we all predicted at least one of the results below wrongly.
>>>
>>> This was what we knew, or should have known:
>>>
>>> In [2]: import numpy as np
>>>
>>> In [3]: arr = np.arange(10).reshape((2, 5))
>>>
>>> In [5]: arr.ravel()
>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> So, the 'ravel' operation unravels over the last axis (1) first,
>>> followed by axis 0.
>>>
>>> So far so good (even if the opposite to MATLAB, Octave).
>>>
>>> Then we found the 'order' flag to ravel:
>>>
>>> In [10]: arr.flags
>>> Out[10]:
>>>   C_CONTIGUOUS : True
>>>   F_CONTIGUOUS : False
>>>   OWNDATA : False
>>>   WRITEABLE : True
>>>   ALIGNED : True
>>>   UPDATEIFCOPY : False
>>>
>>> In [11]: arr.ravel('C')
>>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> But we soon got confused. How about this?
>>>
>>> In [12]: arr_F = np.array(arr, order='F')
>>>
>>> In [13]: arr_F.flags
>>> Out[13]:
>>>   C_CONTIGUOUS : False
>>>   F_CONTIGUOUS : True
>>>   OWNDATA : True
>>>   WRITEABLE : True
>>>   ALIGNED : True
>>>   UPDATEIFCOPY : False
>>>
>>> In [16]: arr_F
>>> Out[16]:
>>> array([[0, 1, 2, 3, 4],
>>>        [5, 6, 7, 8, 9]])
>>>
>>> In [17]: arr_F.ravel('C')
>>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> Right - so the flag 'C' to ravel has got nothing to do with *memory*
>>> ordering, but is to do with *index* ordering.
>>>
>>> And in fact, we can ask for memory ordering specifically:
>>>
>>> In [22]: arr.ravel('K')
>>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> In [23]: arr_F.ravel('K')
>>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>
>>> In [24]: arr.ravel('A')
>>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> In [25]: arr_F.ravel('A')
>>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>
>>> There are some confusions to get into with the 'order' flag to reshape
>>> as well, of the same type.
>>>
>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering.
>>>
>>> This is very confusing. We think the index ordering and memory
>>> ordering ideas need to be separated, and specifically, we should avoid
>>> using "C" and "F" to refer to index ordering.
>>>
>>> Proposal
>>> -------------
>>>
>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>> index ordering for ravel, reshape
>>> * Prefer "Z" and "N", being graphical representations of unraveling in
>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>> naming idea by Paul Ivanov)
>>>
>>> What do y'all think?
>>>
>>> Cheers,
>>>
>>> Matthew
>>> Paul Ivanov
>>> JB Poline
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>>
>> I always thought "F" and "C" are easy to understand, I always thought about
>> the content and never about the memory when using it.
>
> I can only say that 4 out of 4 experienced numpy developers found
> themselves unable to predict the behavior of these functions before
> they saw the output.
>
> The problem is always that explaining something makes it clearer for a
> moment, but, for those who do not have the explanation or who have
> forgotten it, at least among us here, the outputs were generating
> groans and / or high fives as we incorrectly or correctly guessed what
> was going to happen.
>
> I think the only way to find out whether this really is confusing or
> not, is to put someone in front of these functions without any
> explanation and ask them to predict what is going to come out of the
> various inputs and flags. Or to try and teach it, which was the
> problem we were having.
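
For anyone who wants to repeat the prediction exercise, the quoted session can be condensed into a short script. This is only a sketch: `arr` and `arr_F` are the arrays from the session above, the other variable names are mine, and I am assuming a reasonably recent numpy.

```python
import numpy as np

# Same arrays as in the quoted session: identical values, different memory layout.
arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_F = np.array(arr, order='F')      # F-contiguous copy with the same values

# 'C' and 'F' select the *index* order, so the memory layout does not matter.
c_from_c = arr.ravel('C')
c_from_f = arr_F.ravel('C')
f_from_c = arr.ravel('F')
f_from_f = arr_F.ravel('F')

# 'K' and 'A' follow the *memory* order, so the memory layout does matter.
k_from_c = arr.ravel('K')
k_from_f = arr_F.ravel('K')
a_from_c = arr.ravel('A')
a_from_f = arr_F.ravel('A')

for label, value in [("arr.ravel('C')", c_from_c),
                     ("arr_F.ravel('C')", c_from_f),
                     ("arr.ravel('F')", f_from_c),
                     ("arr_F.ravel('F')", f_from_f),
                     ("arr.ravel('K')", k_from_c),
                     ("arr_F.ravel('K')", k_from_f),
                     ("arr.ravel('A')", a_from_c),
                     ("arr_F.ravel('A')", a_from_f)]:
    print(label, value)
```

Covering the printed values and guessing each line first is a quick way to check whether the 'C' / 'F' versus 'K' / 'A' distinction is as obvious as it seems.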
Changing the names doesn't make it easier to understand.
I think the confusion is because the new A and K refer to existing memory.

``ravel`` is just stacking columns ('F') or stacking rows ('C'); I don't
remember having seen any weird cases.

------------

I always thought of "order" in array creation as the memory layout we
want for the *target* array; it has nothing to do with the existing
memory layout (creating a view or copy as needed).

reshape and ravel return *views* if possible; the memory might just have
some weird strides (and can be ignored unless you want to do some memory
optimization; keeping track of the memory is difficult. I don't think I
will start to use A and K after upgrading numpy.)

>>> import numpy as np
>>> a1 = np.ones((10,4))

not contiguous

>>> arr2 = a1[:, 2:4]
>>> arr2.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

stack columns (needs to make a copy)

>>> arr3 = arr2.ravel('F')
>>> arr3.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

stack columns or rows with reshape (I have no idea what it did with the
memory)

>>> arr2.reshape(-1,1).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> arr2.reshape(-1,1, order='F').flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> arr2.reshape(-1, order='F').flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

-------------------

One case where I do pay attention to memory layout is column slicing:

>>> arr = np.ones((10, 5), order='F')
>>> for i in range(1, 5):
...     print(arr[:, :i+2].ravel('C').flags['OWNDATA'])
???

>>> for i in range(1, 5):
...     print(arr[:, :i+2].ravel('F').flags['OWNDATA'])
???
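
The column-slicing loops above can also be checked without reading flags. The sketch below swaps in `np.shares_memory` (available in newer numpy releases, so an assumption relative to the version used in this thread) for `flags['OWNDATA']`, since it asks directly whether the raveled result still points into the original buffer. The names `cols`, `rav_c`, `rav_f` and `results` are only for illustration.

```python
import numpy as np

# F-ordered array: each column is one contiguous run in memory.
arr = np.ones((10, 5), order='F')

results = []
for i in range(1, 5):
    cols = arr[:, :i + 2]   # leading columns of the F-ordered array
    rav_c = cols.ravel('C')  # stack rows
    rav_f = cols.ravel('F')  # stack columns
    results.append((cols.shape[1],
                    np.shares_memory(rav_c, arr),   # True means no copy was made
                    np.shares_memory(rav_f, arr)))

for ncols, c_is_view, f_is_view in results:
    print(ncols, "columns:",
          "ravel('C') shares memory =", c_is_view,
          "| ravel('F') shares memory =", f_is_view)
```

The leading columns of an F-ordered array are themselves F-contiguous, so I would expect stacking columns to get away without a copy, while stacking rows has to gather elements from scattered positions and therefore needs new memory.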
Josef

>
> Cheers,
>
> Matthew