On Thu, 27 Nov 2008 09:08:41 +0100
Manuel Metz <[EMAIL PROTECTED]> wrote:
> Pierre GM wrote:
>> On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
>>
>>> Manuel Metz wrote:
>>>> Ryan May wrote:
>>>>> 3) Better support for missing values. The docstring mentions a way of
>>>>> handling missing values by passing in a converter. The problem with
>>>>> this is that you have to pass in a converter for *every column* that
>>>>> will contain missing values. If you have a text file with 50 columns,
>>>>> writing this dictionary of converters seems like ugly and needless
>>>>> boilerplate. I'm unsure of how best to pass in both what values
>>>>> indicate missing values and what values to fill in their place. I'd
>>>>> love suggestions
>>>> Hi Ryan,
>>>> this would be a great feature to have !!!
>>
>> About missing values:
>>
>> * I don't think missing values should be supported in np.loadtxt. That
>> should go into a specific np.ma.io.loadtxt function, a preview of
>> which I posted earlier. I'll modify it taking Ryan's new function into
>> account, and Christopher's suggestion (defining a dictionary {column
>> name : missing values}).
>>
>> * StringConverter already defines some default filling values for each
>> dtype. In np.ma.io.loadtxt, these values can be overwritten. Note
>> that you should also be able to define a filling value by specifying a
>> converter (think float(x or 0) for example)
>>
>> * Missing values on space-separated fields are very tricky to handle:
>> take a line like "a,,,d". With a comma as separator, it's clear that
>> the 2nd and 3rd fields are missing.
>> Now, imagine that commas are actually spaces ( "a   d"): 'd' is now
>> seen as the 2nd field of a 2-field record, not as the 4th field of a
>> 4-field record with 2 missing values. I thought about it, and kicked
>> it into touch
>>
>> * That said, there should be a way to deal with fixed-length fields,
>> probably by taking consecutive slices of the initial string. That way,
>> we should be able to keep track of missing data...
>
> Certainly, yes! Dealing with fixed-length fields would be necessary. The
> case I had in mind had both -- a separator ("|") __and__ fixed-length
> fields -- and is probably very special in that sense. But such
> data-files exist out there...
>

See page 9, 10 (Bulk data input deck)
http://www.zonatech.com/Documentation/zndalusersmanual2.0.pdf
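
For what it's worth, here is a minimal sketch in plain Python of the two points above: why whitespace splitting loses empty fields, and how taking consecutive slices keeps them. The helper split_fixed_width, the field width of 4, and the sample lines are made up for illustration; none of this is part of np.loadtxt or of the proposed np.ma.io.loadtxt.

```python
def split_fixed_width(line, widths, missing=None):
    """Cut `line` into consecutive slices of the given widths.

    Hypothetical helper: a slice that is blank after stripping is
    reported as `missing`, so the field keeps its position.
    """
    fields = []
    start = 0
    for width in widths:
        raw = line[start:start + width].strip()
        fields.append(raw if raw else missing)
        start += width
    return fields


# With a comma separator, the two empty fields survive the split.
comma_fields = "a,,,d".split(",")

# A record with four fields of width 4, the middle two empty
# (the width is an assumption for this example).
fixed_line = "".join(value.ljust(4) for value in ["a", "", "", "d"])

# Splitting on whitespace collapses the gap: 'd' becomes the 2nd field.
space_fields = fixed_line.split()

# Slicing by width keeps 'd' in the 4th position.
fixed_fields = split_fixed_width(fixed_line, [4, 4, 4, 4])

# The mixed case: a "|" separator and padded, fixed-length fields.
pipe_line = "a   |    |    |d   "
pipe_fields = [field.strip() or None for field in pipe_line.split("|")]

print(comma_fields)
print(space_fields)
print(fixed_fields)
print(pipe_fields)
```

In the "|" case the separator alone is enough to recover the positions, so the fixed widths only matter for stripping the padding.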
Nils