Pierre GM wrote:
> On Nov 25, 2008, at 2:06 PM, Ryan May wrote:
>> 1) It looks like the function returns a structured array rather than a
>> rec array, so that fields are obtained by doing a dictionary access.
>> Since it's a dictionary access, is there any reason that the header
>> needs to be munged to replace characters and reserved names? IIUC,
>> csv2rec changes names b/c it returns a rec array, which uses attribute
>> lookup and hence all names need to be valid python identifiers. This is
>> not the case for a structured array.
>
> Personally, I prefer flexible ndarrays to recarrays, hence the output.
> However, I still think that names should be as clean as possible to
> avoid bad surprises down the road.
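(For anyone following along, a minimal sketch of the distinction being discussed: structured ndarrays use dictionary-style field lookup, so field names need not be valid Python identifiers, while recarrays expose fields as attributes and therefore need munged names. The field names below are made up for illustration.)

```python
import numpy as np

# Structured ndarray: dict-style access, so a name with a space is fine.
arr = np.array([(1, 2.5), (3, 4.0)],
               dtype=[('station id', int), ('temp', float)])
print(arr['station id'])

# recarray: attribute access, so the name must be a valid identifier
# (this is why csv2rec cleans up headers).
rec = np.rec.array([(1, 2.5), (3, 4.0)],
                   dtype=[('station_id', int), ('temp', float)])
print(rec.station_id)
```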
Ok, I'm not really partial to this, I just thought it would simplify. Your point is valid.

>> 2) Can we avoid the use of seek() in here? I just posted a patch to
>> change the check to readline, which was the only file function used
>> previously. This allowed the direct use of a file-like object returned
>> by urllib2.urlopen().
>
> I coded that a couple of weeks ago, before you posted your patch and I
> didn't have time to check it. Yes, we could try getting rid of seek.
> However, we need to find a way to rewind to the beginning of the file
> if the dtypes are not given in input (as we parsed the whole file to
> find the best converter in that case).

What about doing the parsing and type inference in a loop and holding onto the already split lines? Then loop through the lines with the converters that were finally chosen? In addition to making my use case work, this has the benefit of not doing the I/O twice.

>> 3) In order to avoid breaking backwards compatibility, can we change
>> the default for dtype to be float32, and instead use some kind of
>> special value ('auto' ?) to use the automatic dtype determination?
>
> I'm not especially concerned w/ backwards compatibility, because we're
> supporting masked values (something that np.loadtxt shouldn't have to
> worry about). Initially, I needed a replacement to the fromfile
> function in the scikits.timeseries.trecords package. I figured it'd be
> easier and more portable to get a function for generic masked arrays,
> that could be adapted afterwards to timeseries. In any case, I was
> more considering the functions I sent you to be part of some
> numpy.ma.io module than a replacement to np.loadtxt. I tried to get
> the syntax as close as possible to np.loadtxt and mlab.csv2rec, but
> there'll always be some differences.
>
> So, yes, we could try to use a default dtype=float and yes, we could
> have an extra parameter 'auto'. But is it really that useful? I'm not
> sure (well, no, I'm sure it's not...)
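(The single-pass idea from point 2 — split each line once, cache the split rows while upgrading the candidate converters, then apply the final converters to the cached rows instead of rewinding the file — could be sketched roughly like this. The function and its int → float → str upgrade chain are hypothetical, just to illustrate why no seek() is needed.)

```python
def load_with_inference(lines, delimiter=','):
    """Parse delimited lines in one pass, inferring a converter per column."""
    rows = []
    converters = None
    for line in lines:
        values = line.strip().split(delimiter)
        if converters is None:
            converters = [int] * len(values)
        # Upgrade each column's converter as needed: int -> float -> str.
        for i, val in enumerate(values):
            while True:
                try:
                    converters[i](val)
                    break
                except ValueError:
                    converters[i] = {int: float, float: str}[converters[i]]
        rows.append(values)  # keep the split line; no second read of the file
    # Second loop runs over the cached rows, not over the file,
    # so any file-like object (e.g. from urllib2.urlopen) works.
    return [[conv(v) for conv, v in zip(converters, row)] for row in rows]
```

The only cost is holding the split rows in memory, which the two-pass seek() approach pays anyway once the whole file has been parsed for converter detection.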
I understand you're not concerned with backwards compatibility, but with the exception of missing-value handling, which is probably specific to masked arrays, I was hoping to just add functionality to loadtxt(). Numpy doesn't need a separate text reader for most of this, and breaking API for any of this is likely a non-starter. So while, yes, having float be the default dtype is probably not the most useful, leaving it also doesn't break existing code.

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion