Re: [Numpy-discussion] loadtxt and usecols

Irvin Probst Tue, 10 Nov 2015 07:09:13 -0800

On 10/11/2015 14:17, Sebastian Berg wrote:

Actually, it is the "sequence special case" type ;). (matlab does not
have this, since matlab always returns 2-D I realized).


As I said, if usecols is like indexing, the result should mimic:

arr = np.loadtxt(f)
arr = arr[usecols]

in which case a 1-D array is returned if you put in a scalar into
usecols (and you could even generalize usecols to higher dimensional
array-likes).
The way you implemented it -- which is fine, but I want to stress that
there is a real decision being made here --, you always see it as a
sequence but allow a scalar for convenience (i.e. always return a 2-D
array). It is a `sequence of ints or int` type argument and not an
array-like argument in my opinion.
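
For readers following the thread, here is a minimal sketch of what "like indexing" implies for the result shape. It assumes the index is applied along the column axis (`arr[:, cols]`), which is my reading of the snippet above and not something it states, and it uses a small made-up array in place of a file.

```python
import numpy as np

# Stand-in for np.loadtxt(f): 3 rows, 4 columns (made-up data).
arr = np.arange(12.0).reshape(3, 4)

scalar_pick = arr[:, 1]                  # scalar index: the column axis is dropped
sequence_pick = arr[:, [1]]              # one-element sequence: the axis is kept
nested_pick = arr[:, [[0, 1], [2, 3]]]   # 2-D index array: an extra axis appears

print(scalar_pick.shape, sequence_pick.shape, nested_pick.shape)
```

Under pure indexing semantics the shape of the result follows the shape of the index, which is the decision being discussed.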


I think we have two separate problems here:

The first one is whether loadtxt should always return a 2D array or should it match the shape of the usecols argument. From a CS guy point of view I do understand your concern here. Now from a teacher point of view I know many people expect to get a "matrix" (thank you Matlab...) and the "purity" of matching the dimension of the usecols variable will be seen by many people [1] as a nerdy useless heaviness no one cares about (no offense). So whatever you, seasoned numpy devs from this mailing list, decide I think it should be explained in the docstring with very clear wording.

My own opinion on this first problem is that loadtxt() should always return a 2D array, no less, no more. If I write np.loadtxt(f)[42] it means I want to read the whole file and then I explicitly ask for transforming the 2-D array loadtxt() returned into a 1-D array. Otoh if I write loadtxt(f, usecols=42) it means I don't want to read the other columns and I want only this one, but it does not mean that I want to change the returned array from 2-D to 1-D. I know this new behavior might break a lot of existing code as usecols=(42,) used to return a 1-D array, but usecols=((((42,)))) also returns a 1-D array so the current behavior is not consistent imho.
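
To make the shapes concrete, here is a small sketch using an in-memory file instead of a real one. The data is made up, and the shapes in the comments reflect loadtxt's behavior as I understand it (single-column results are squeezed unless `ndmin=2` is passed); treat it as an illustration, not a statement about every numpy version.

```python
import io
import numpy as np

text = "1 2 3\n4 5 6\n"   # made-up data: 2 rows, 3 columns

full = np.loadtxt(io.StringIO(text))                     # whole file, 2-D
one_col = np.loadtxt(io.StringIO(text), usecols=(1,))    # squeezed to 1-D
one_col_2d = np.loadtxt(io.StringIO(text), usecols=(1,), ndmin=2)  # stays 2-D

print(full.shape, one_col.shape, one_col_2d.shape)
```

So the "always 2-D" result can already be requested explicitly with `ndmin=2`; the question in this thread is what the default should be.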

The second problem is about the wording in the docstring, when I see "sequence of int or int" I understand I will have to cast into a 1-D python list whatever wicked N-dimensional object I use to store my column indexes, or hope list(my_object) will do it fine. On the other hand when I read "array-like" the function is telling me I don't have to worry about my object, as long as numpy knows how to cast it into an array it will be fine.


Anyway I think something like that:

import numpy as np
a=[[[2,],[],[],],[],[],[]]
foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

should just work and return me a 2-D (or 1-D if you like) array with the data I asked for and I don't think "a" here is an int or a sequence of int (but it's a good example of why loadtxt() should not match the shape of the usecols argument).
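
Until something like this is supported, one possible workaround is to flatten the column spec before calling loadtxt. The sketch below is only an illustration: `flatten_usecols` is a hypothetical helper of mine, not part of numpy, it does not handle strings, and the data is an in-memory stand-in for the real file.

```python
import io
from collections.abc import Iterable

import numpy as np


def flatten_usecols(spec):
    """Hypothetical helper: flatten a nested column spec into a flat list of ints."""
    if isinstance(spec, Iterable):
        flat = []
        for item in spec:
            flat.extend(flatten_usecols(item))  # recurse into nested containers
        return flat
    return [int(spec)]                          # leaf: a single column index


a = [[[2,], [], [],], [], [], []]
cols = flatten_usecols(a)                       # empty sub-lists contribute nothing

data = io.StringIO("1 2 3\n4 5 6\n")            # made-up data, 2 rows, 3 columns
foo = np.loadtxt(data, usecols=cols, ndmin=2)   # ndmin=2 keeps the result 2-D
print(cols, foo.shape)
```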

To make it short, let the reading function read the data in a consistent and predictable way and then let the user explicitly change the data's shape into anything he likes.


Regards.

[1] read non CS people trying to switch to numpy/scipy
