Hi all,

I was just taking another look at the mess that is, imo, the support for automatic line ending recognition in genfromtxt and, more generally, in the Python file openers. I am glad that at least reading gzip files is no longer entirely broken in Python 3, but detecting “old Mac” style CR line endings in particular currently only works for uncompressed and bzip2 files under 2.6/2.7. This is largely because genfromtxt wants to open everything in binary mode, which arguably makes no sense for ASCII text files with numbers. I think the only reason this works in 2.x is that the ‘U’ reading mode overrides the ‘b’.
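To make the binary vs. text mode difference concrete, here is a minimal sketch for Python 3.3+ only. The helper name `read_cr_file_both_ways` and the test data are made up for illustration, and this is not the code from any numpy commit. It writes a small file with bare CR line endings through each opener and reads it back in 'rt' and in 'rb' mode:

```python
import bz2
import gzip
import lzma
import os
import tempfile

# "Old Mac" style data: lines are terminated by a bare CR ('\r').
CR_DATA = b"1 2 3\r4 5 6\r7 8 9\r"

# Openers that accept text modes in Python 3.3+.
OPENERS = {"plain": open, "gzip": gzip.open, "bz2": bz2.open, "lzma": lzma.open}


def read_cr_file_both_ways():
    """Return {opener name: (lines read in 'rt' mode, lines read in 'rb' mode)}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for name, opener in OPENERS.items():
            path = os.path.join(tmpdir, "cr_test." + name)
            # Write the raw bytes through the matching (de)compressor.
            with opener(path, "wb") as fh:
                fh.write(CR_DATA)
            # Text mode: universal newlines translate '\r' to '\n'.
            with opener(path, "rt") as fh:
                text_lines = fh.readlines()
            # Binary mode: only b'\n' splits lines, so everything is one "line".
            with opener(path, "rb") as fh:
                binary_lines = fh.readlines()
            results[name] = (text_lines, binary_lines)
    return results


if __name__ == "__main__":
    for name, (text_lines, binary_lines) in read_cr_file_both_ways().items():
        print(name, len(text_lines), "lines in 'rt',", len(binary_lines), "in 'rb'")
```

With all four openers the 'rt' read should give three lines and the 'rb' read a single one, which is why anything that reads in binary mode never sees the CR-terminated rows as separate records.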
So on the Python side what actually works for automatic line ending detection is:

Python          2.6   2.7   3.2   3.3/3.4
uncompressed:    U     U     t     t
gzip:            E     N     E     t
bzip2:           U     U     E     t*
lzma:            -     -     -     t*

U - works with mode ‘rU’
E - mode ‘rU’ raises an error
N - mode ‘rU’ is accepted, but does not detect CR (‘\r’) line endings
    (actually I think ‘U’ is simply internally discarded by gzip.open() in 2.7.4+)
t - works with mode ‘rt’ (default with plain open())
* - requires the '.open()' rather than the '.XXXFile()' method of bz2/lzma

Therefore I’d propose the changes in
https://github.com/dhomeier/numpy/commit/995ec93
to extend universal newline recognition as far as possible with the above openers.

There are some potential issues with this:

1. Switching to ‘rt’ mode for Python3.x means that np.lib._datasource.open() does not return byte strings by itself, so genfromtxt has to use asbytes() on the returned lines. Since this occurs only in two places, I don’t see a major problem with this.

2. In the tests I had to work around the lack of fileobj support in bz2.BZ2File by using os.system(‘bzip2 …’) on the temporary file, which might not work on all systems. In particular I’d expect it to fail under Windows, but it’s not clear to me how far the entire mkstemp thing works under Windows...

As a final note, http://bugs.python.org/issue13989#msg153127 suggests a workaround that might make this work with gzip.open() (and perhaps bz2?) on 3.2 as well. I am not sure how high 3.2 support is ranking for the near future; for the moment I am not strongly inclined to implement it…

Grateful for comments or tests (especially under Windows!) of the commit(s) above -

Derek