On Thu, Jan 7, 2010 at 4:45 PM, Christopher Barker <chris.bar...@noaa.gov> wrote:
> Bruce Southey wrote:
>>> <chris.bar...@noaa.gov> wrote:
>
>> Using the numpy NaN or similar (noting R's approach to missing values
>> which in turn allows it to have the above functionality) is just a
>> very bad idea for missing values because you always have to check
>> which NaN is a missing value and which was due to some numerical
>> calculation.
>
> well, this is specific to reading files, so you know where it came from.
> And the principle of fromfile() is that it is fast and simple; if you
> want masked arrays, use slower, but more full-featured methods.
>
> However, in this case:
>
> In [9]: np.fromstring("3, 4, NaN, 5", sep=",")
> Out[9]: array([ 3., 4., NaN, 5.])
>
>
> An actual NaN is read from the file, rather than a missing value.
> Perhaps the user does want the distinction, so maybe it should really
> only fill it in if the user asks for it, by specifying
> "missing_value=np.nan" or something.
>
>> From what I can see is that you expect that fromfile() should only
>> split at the supplied delimiters, optionally(?) strip any whitespace
>
> whitespace stripping is not optional.
>
>> Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'
>> actually assumes multiple delimiters because there is no comma between
>> 4 and 5 and 8 and 9.
>
> Yes, that's the point. I thought about allowing arbitrary multiple
> delimiters, but I think '\n' is a special case - for instance, a comma
> at the end of some numbers might mean missing data, but a '\n' would not.
>
> And I couldn't really think of a useful use-case for arbitrary multiple
> delimiters.
>
>> In Josef's last case how many 'missing values' should there be?
>
> >> extra newlines at end of file
> >> str = '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'
>
> none -- exactly why I think \n is a special case.
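For reference, the splitting rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the fromfile() implementation: `parse_delimited` and its `missing_value` argument are hypothetical names, and the behavior for an empty field (fill in only if the user asks, otherwise raise) is one of the options under discussion, not a settled decision.

```python
import math


def parse_delimited(text, sep=",", missing_value=None):
    """Hypothetical helper (not a NumPy API) sketching the proposed rule.

    '\n' is treated as a special row delimiter: blank lines never produce
    missing values. An empty field between `sep` delimiters is a missing
    value, filled in only if the caller supplied `missing_value`.
    """
    rows = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # blank lines, including trailing newlines, are ignored
        row = []
        for field in line.split(sep):
            field = field.strip()  # whitespace stripping is not optional
            if field:
                row.append(float(field))  # float() also accepts "NaN"
            elif missing_value is not None:
                row.append(missing_value)
            else:
                raise ValueError("empty field and no missing_value given")
        rows.append(row)
    return rows


# Trailing newlines add no rows and no missing values.
rows = parse_delimited("1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n")
print(len(rows))

# An empty field is filled only because missing_value was requested.
filled = parse_delimited("3, 4, , 5", missing_value=float("nan"))
print(math.isnan(filled[0][2]))
```

Note that a literal "NaN" in the text is still parsed as a real NaN here, so the sketch does not by itself distinguish a NaN that was in the file from one that was filled in.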
>
> What about:
> >> extra newlines in the middle of the file
> >> str = '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n'
>
> I think they should be ignored, but I hope I'm not making something that
> is too specific to my personal needs.
>
> Travis Oliphant wrote:
>> +1 (ignoring new-lines transparently is a nice feature). You can also
>> use sscanf with weave to read most files.
>
> right -- but that requires weave. In fact, MATLAB has an fscanf function
> that allows you to pass in a C format string and it vectorizes it to use
> the same one over and over again until it's done. It's actually quite
> powerful and flexible. I once started with that in mind, but didn't have
> the C chops to do it. I ended up with a tool that only did doubles (come
> to think of it, MATLAB only does doubles, anyway...)
>
> I may some day write a whole new C (or, more likely, Cython) function
> that does something like that, but for now, I'm just trying to get
> fromfile to be useful for me.
>
>
>> +1 (much preferable to insert NaN or other user value than raise
>> ValueError in my opinion)
>
> But raise an error for integer types?
>
> I guess this is still up in the air -- no consensus yet.
Raise an exception. I hate the silent cast of nan to integer zero: too
much debugging, and useless if there are real zeros.
(Or use some -999 kind of thing if user-defined nan codes are allowed,
but I just work with float if I expect nans/missing values.)

Josef

>
> Thanks,
>
> -Chris
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
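
To make the integer case concrete, here is a minimal sketch in plain Python of the two options mentioned in the reply: raise when a field is missing, or substitute a user-defined code such as -999. `fields_to_ints` and `missing_code` are hypothetical names for illustration only; nothing here is an existing NumPy function or parameter.

```python
def fields_to_ints(fields, missing_code=None):
    """Hypothetical helper (not a NumPy API): convert text fields to ints.

    Integers cannot represent NaN, so an empty or "nan" field raises
    unless the caller explicitly supplies a sentinel via `missing_code`.
    """
    out = []
    for i, field in enumerate(fields):
        token = field.strip()
        if token == "" or token.lower() == "nan":
            if missing_code is None:
                raise ValueError(
                    f"field {i} is missing and integers cannot represent NaN"
                )
            out.append(missing_code)  # sentinel chosen by the user
        else:
            out.append(int(token))
    return out


print(fields_to_ints(["1", " 2", "3"]))
print(fields_to_ints(["1", "", "3"], missing_code=-999))
```

The point of the design is that a missing value never turns silently into 0, so real zeros in the data stay distinguishable from gaps.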