On 10/11/2015 14:17, Sebastian Berg wrote:
Actually, it is the "sequence special case" type ;). (matlab does not
have this, since matlab always returns 2-D I realized).

As I said, if usecols is like indexing, the result should mimic:

arr = np.loadtxt(f)
arr = arr[usecols]

in which case a 1-D array is returned if you put in a scalar into
usecols (and you could even generalize usecols to higher dimensional
array-likes).
The way you implemented it -- which is fine, but I want to stress that
there is a real decision being made here --, you always see it as a
sequence but allow a scalar for convenience (i.e. always return a 2-D
array). It is a `sequence of ints or int` type argument and not an
array-like argument in my opinion.

I think we have two separate problems here:

The first one is whether loadtxt() should always return a 2-D array or whether it should match the shape of the usecols argument. From a CS guy's point of view I do understand your concern here. Now from a teacher's point of view, I know many people expect to get a "matrix" (thank you Matlab...), and the "purity" of matching the dimension of the usecols variable will be seen by many people [1] as nerdy, useless heaviness that no one cares about (no offense). So whatever you, seasoned numpy devs on this mailing list, decide, I think it should be explained in the docstring with very clear wording.

My own opinion on this first problem is that loadtxt() should always return a 2-D array, no less, no more. If I write np.loadtxt(f)[42], it means I want to read the whole file and then explicitly ask to transform the 2-D array that loadtxt() returned into a 1-D array. On the other hand, if I write loadtxt(f, usecols=42), it means I don't want to read the other columns and want only this one, but it does not mean that I want to change the returned array from 2-D to 1-D. I know this new behavior might break a lot of existing code, as usecols=(42,) used to return a 1-D array, but usecols=((((42,)))) also returns a 1-D array, so the current behavior is not consistent imho.
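
To make the shape question concrete, here is a minimal sketch, assuming a made-up 2x3 data set read from an in-memory string rather than a real file. It compares "read everything, then index" with "read one column", and uses loadtxt's existing ndmin argument to ask for a 2-D result. The variable names are only for illustration.

```python
import io
import numpy as np

# Made-up 2x3 data set standing in for a real text file (assumption).
data = "1 2 3\n4 5 6\n"

# Read the whole file, then index: the indexing expression decides the shape.
full = np.loadtxt(io.StringIO(data))
col_1d = full[:, 1]      # an integer index drops the column axis
col_2d = full[:, [1]]    # a list index keeps the column axis

# Read only one column. ndmin=2 asks loadtxt for at least two dimensions,
# so the result stays a column "matrix" instead of being squeezed to 1-D.
only_col = np.loadtxt(io.StringIO(data), usecols=(1,), ndmin=2)

print(full.shape, col_1d.shape, col_2d.shape, only_col.shape)
```

With ndmin=2 the reading step has the same dimensionality whatever usecols contains, and any reduction to 1-D becomes an explicit indexing step written by the user.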

The second problem is about the wording in the docstring. When I see "sequence of int or int", I understand that I will have to convert whatever wicked N-dimensional object I use to store my column indexes into a 1-D Python list, or hope that list(my_object) will do it fine. On the other hand, when I read "array-like", the function is telling me I don't have to worry about my object: as long as numpy knows how to convert it into an array, it will be fine.

Anyway, I think something like this:

import numpy as np
a=[[[2,],[],[],],[],[],[]]
foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

should just work and return a 2-D (or 1-D if you like) array with the data I asked for. I don't think "a" here is an int or a sequence of int (but it is a good example of why loadtxt() should not match the shape of the usecols argument).
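
One way to read "should just work" is that the column indexes get flattened before reading, whatever their nesting. The sketch below is a hypothetical pure-Python helper, flatten_usecols, which is not part of numpy and only illustrates the idea; it accepts a single int or any nested sequence of ints and returns a flat list.

```python
import operator
from collections.abc import Iterable


def flatten_usecols(usecols):
    """Hypothetical helper (not a numpy function): turn an int or an
    arbitrarily nested sequence of ints into a flat list of ints."""
    if isinstance(usecols, Iterable) and not isinstance(usecols, (str, bytes)):
        flat = []
        for item in usecols:
            # Recurse into nested sequences; empty ones contribute nothing.
            flat.extend(flatten_usecols(item))
        return flat
    # operator.index accepts integer-like objects and rejects floats.
    return [operator.index(usecols)]


a = [[[2,], [], [],], [], [], []]
print(flatten_usecols(a))
print(flatten_usecols(42))
```

The flat list could then be passed as usecols, so the shape of the result would depend only on the data and never on how the indexes happened to be stored.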

To make it short: let the reading function read the data in a consistent and predictable way, and then let the user explicitly change the data's shape into anything they like.

Regards.

[1] read: non-CS people trying to switch to numpy/scipy
