There was also some work on a semi-mutable array type that allowed appending along one axis and then 'freezing' to yield a normal numpy array (unfortunately I'm not sure how to find it in the mailing list archives). One could write such a setup by hand, using mmap() or realloc(), but I'd be inclined to simply write a filter that converts the text file to some sort of binary file on the fly, value by value. The binary file can then be loaded or mmap()ed. A 1 GB text file is a miserable object anyway, so it might be desirable to convert it to (say) HDF5 and then throw away the text file.
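A minimal sketch of the filter idea, assuming a completely regular file of floats with no header, comments, or missing values. The names text_to_binary and open_binary are made up for illustration, and the raw float64 layout is just one possible choice of binary format:

```python
import numpy as np

def text_to_binary(text_path, bin_path, delimiter=None, dtype=np.float64):
    """Stream a regular numeric text file into a raw binary file.

    Only one line is held in memory at a time.
    Returns the shape (n_rows, n_columns) of the data written.
    """
    n_rows = 0
    n_columns = 0
    with open(text_path, "r") as src, open(bin_path, "wb") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split(delimiter)  # delimiter=None means any whitespace
            if n_rows == 0:
                n_columns = len(fields)
            elif len(fields) != n_columns:
                raise ValueError("row %d has %d fields, expected %d"
                                 % (n_rows + 1, len(fields), n_columns))
            # Convert one row of strings to numbers and append its raw bytes.
            dst.write(np.array(fields, dtype=dtype).tobytes())
            n_rows += 1
    return n_rows, n_columns

def open_binary(bin_path, shape, dtype=np.float64):
    """Map the binary file read-only; pages are loaded on demand."""
    return np.memmap(bin_path, dtype=dtype, mode="r", shape=shape)
```

Usage would be something like shape = text_to_binary('data.txt', 'data.bin') followed by X = open_binary('data.bin', shape). To pull everything into RAM instead, np.fromfile('data.bin', dtype=np.float64).reshape(shape) should do. Note that the raw file does not record its own shape or dtype, which is one reason a self-describing format such as HDF5 may be preferable.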
Anne

On 10 August 2011 15:43, Derek Homeier <de...@astro.physik.uni-goettingen.de> wrote:
> On 10 Aug 2011, at 19:22, Russell E. Owen wrote:
>
>> A coworker is trying to load a 1Gb text data file into a numpy array
>> using numpy.loadtxt, but he says it is using up all of his machine's 6Gb
>> of RAM. Is there a more efficient way to read such text data files?
>
> The npyio routines (loadtxt as well as genfromtxt) first read in the entire
> data as lists, which creates of course significant overhead, but is not easy
> to circumvent, since numpy arrays are immutable - so you have to first store
> the numbers in some kind of mutable object. One could write a custom parser
> that tries to be somewhat more efficient, e.g. first reading in sub-arrays
> from a smaller buffer. Concatenating those sub-arrays would still require
> about twice the memory of the final array. I don't know if using the
> array.array type (which is mutable) is much more efficient than a list...
> To really avoid any excess memory usage you'd have to know the total data
> size in advance - either by reading in the file in a first pass to count the
> rows, or explicitly specifying it to a custom reader. Basically, assuming a
> completely regular file without missing values etc., you could then read in
> the data like
>
> X = np.zeros((n_lines, n_columns), dtype=float)
> delimiter = ' '
> for n, line in enumerate(open(fname, 'r')):
>     X[n] = np.array(line.split(delimiter), dtype=float)
>
> (adjust delimiter and dtype as needed...)
>
> HTH,
> Derek
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
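
The two-pass approach described in the quoted message (count the rows first, then fill a preallocated array) could be sketched roughly as follows. This assumes a completely regular file with no header or missing values; load_two_pass is a hypothetical name, not an existing numpy routine:

```python
import numpy as np

def load_two_pass(fname, delimiter=None, dtype=float):
    """Read a regular numeric text file without building intermediate lists."""
    # First pass: count the non-blank rows and take the column count
    # from the first data row.
    n_lines = 0
    n_columns = 0
    with open(fname, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if n_lines == 0:
                n_columns = len(line.split(delimiter))
            n_lines += 1

    # Allocate the final array once, at its full size.
    X = np.empty((n_lines, n_columns), dtype=dtype)

    # Second pass: parse each row directly into its slot.
    n = 0
    with open(fname, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            X[n] = np.array(line.split(delimiter), dtype=dtype)
            n += 1
    return X
```

The file is read twice, but peak memory should stay close to the size of the final array plus one line of text.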