On Tue, Mar 6, 2012 at 4:45 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
> On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque <jayv...@gmail.com> wrote:
>
> > 1. Loading text files using loadtxt/genfromtxt needs a significant
> > performance boost (I think at least an order of magnitude increase in
> > performance is very doable based on what I've seen with Erin's recfile
> > code)
>
> > 2. Improved memory usage. Memory used for reading in a text file
> > shouldn’t be more than the file itself, and less if only reading a
> > subset of the file.
>
> > 3. Keep existing interfaces for reading text files (loadtxt, genfromtxt,
> > etc). No new ones.
>
> > 4. Underlying code should keep IO iteration and transformation of data
> > separate (awaiting more thoughts from Travis on this).
>
> > 5. Be able to plug in different transformations of data at low level
> > (also awaiting more thoughts from Travis).
>
> > 6. Memory mapping of text files?
>
> > 7. Eventually reduce memory usage even more by using the same object for
> > duplicate values in an array (depends on implementing enum dtype?)
>
> > Anything else?
>
> Yes -- I'd like to see the solution be able to do high-performance
> reads of a portion of a file -- not always the whole thing. I seem to
> have a number of custom text files that I need to read that are laid
> out in chunks: a bit of a header, then a block of numbers, another
> header, another block. I'm happy to read and parse the header sections
> with pure Python, but would love a way to read the blocks of numbers
> into a numpy array fast. This will probably come out of the box with
> any of the proposed solutions, as long as they start at the current
> position of a passed-in file object, can be told how much to read,
> and then leave the file pointer in the correct position.
>

If you are set up with Cython to build extension modules, and you don't
mind testing an unreleased and experimental reader, you can try the text
reader that I'm working on:

    https://github.com/WarrenWeckesser/textreader

You can read a file like this, where the first line gives the number of
rows of the following array, and that pattern repeats:

    5
    1.0, 2.0, 3.0
    4.0, 5.0, 6.0
    7.0, 8.0, 9.0
    10.0, 11.0, 12.0
    13.0, 14.0, 15.0
    3
    1.0, 1.5, 2.0, 2.5
    3.0, 3.5, 4.0, 4.5
    5.0, 5.5, 6.0, 6.5
    1
    1.0D2, 1.25D-1, 6.25D-2, 99

with code like this:

    import numpy as np
    from textreader import readrows

    filename = 'data/multi.dat'

    with open(filename, 'r') as f:
        # Each block starts with a line holding its row count.
        line = f.readline()
        while len(line) > 0:
            nrows = int(line)
            # Read exactly nrows rows starting at the current file position;
            # sci='D' accepts Fortran-style exponents such as 1.0D2.
            a = readrows(f, np.float32, numrows=nrows, sci='D', delimiter=',')
            print("a:")
            print(a)
            print("")
            # The file pointer now sits at the next row-count line (or EOF).
            line = f.readline()

Warren
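
For comparison, the same count-then-rows layout can also be read with
released NumPy alone, without the experimental reader. What follows is only
a sketch under stated assumptions, not part of textreader: the helper name
read_blocks is hypothetical, the 'D' exponent marker is handled by a plain
string replace before parsing, and the sample data is fed from an in-memory
io.StringIO instead of data/multi.dat. It will likely be much slower than a
dedicated C reader, but it shows the pattern of reading a fixed number of
lines from the current file position and leaving the pointer at the next
header line.

```python
import io
from itertools import islice

import numpy as np

# Sample data in the layout described above: a row count,
# then that many comma-separated rows, repeated.
SAMPLE = """\
5
1.0, 2.0, 3.0
4.0, 5.0, 6.0
7.0, 8.0, 9.0
10.0, 11.0, 12.0
13.0, 14.0, 15.0
3
1.0, 1.5, 2.0, 2.5
3.0, 3.5, 4.0, 4.5
5.0, 5.5, 6.0, 6.5
1
1.0D2, 1.25D-1, 6.25D-2, 99
"""


def read_blocks(f, dtype=np.float32, delimiter=","):
    """Yield one 2-D array per block from an open text file object.

    Hypothetical helper for illustration; not part of NumPy or textreader.
    """
    line = f.readline()
    # Stop at end of file or at a blank line.
    while line.strip():
        nrows = int(line)
        # Consume exactly nrows lines, so f is left at the next count line.
        # Fortran-style 'D' exponents are rewritten to 'E' for parsing.
        rows = [row.replace("D", "E") for row in islice(f, nrows)]
        # ndmin=2 keeps a single-row block two-dimensional.
        yield np.loadtxt(rows, dtype=dtype, delimiter=delimiter, ndmin=2)
        line = f.readline()


blocks = list(read_blocks(io.StringIO(SAMPLE)))
for a in blocks:
    print("a:")
    print(a)
    print("")
```

With a real file, passing the object returned by open(filename, 'r') in
place of the StringIO should behave the same way, and any header parsing
can be done with ordinary readline() calls between blocks.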
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion