On 30.08.2011, at 6:21PM, Chris.Barker wrote:

>> I've submitted a pull request for a new method for loading data from
>> text files into a record array/masked record array.
>
>> Click on the link for more info, but the general idea is to create a
>> regular expression for what entries should look like and loop over the
>> file, updating the regular expression if it's wrong. Once the types
>> are determined the file is loaded line by line into a pre-allocated
>> numpy array.
>
> nice stuff.
>
> Have you looked at my "accumulator" class, rather than pre-allocating?
> Less the class itself than the ideas behind it. It's easy enough to do,
> and would keep you from having to run through the file twice. The cost
> of memory re-allocation as the array grows is very small.
>
> I've posted the code recently, but let me know if you want it again.
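
For anyone following along, the idea behind such an accumulator can be sketched roughly as below. This is not the class referred to above, only a minimal illustration: the name GrowableBuffer, its methods and the doubling growth factor are assumptions made for this example. It relies on ndarray.resize() to enlarge the buffer in place as data arrives, so the input only has to be read once.

```python
import numpy as np


class GrowableBuffer:
    """Hypothetical 1-D accumulator; an illustration, not the posted class."""

    def __init__(self, dtype=float, capacity=16):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == self._data.shape[0]:
            # Grow geometrically so that appends stay amortised O(1).
            new_capacity = max(1, 2 * self._data.shape[0])
            # refcheck=False skips the reference-count check; that is only
            # safe here because the raw buffer is never handed out directly.
            self._data.resize(new_capacity, refcheck=False)
        self._data[self._size] = value
        self._size += 1

    def to_array(self):
        # Return a copy trimmed to the part that has actually been filled.
        return self._data[:self._size].copy()


# Usage: values arrive one at a time, e.g. from a stream of unknown length.
buf = GrowableBuffer(dtype=float, capacity=2)
for token in "1 2 3.5 4 5".split():
    buf.append(float(token))
result = buf.to_array()
```

Because the capacity doubles on each reallocation, the total copying cost stays proportional to the final array size, which is why the re-allocation overhead is small in practice.
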
I agree it would make a very nice addition, and could complement my pre-allocation option for loadtxt. However, I've also been made aware there that this approach breaks streamed input etc., so the buffer.resize(…) methods in accumulator would be the better way to go.

For load table this is not quite as straightforward, though, because the type auto-detection, strictly done, requires scanning the entire input: a column full of ints could still produce a float in the last row… I'd say one just has to accept that this kind of auto-detection is incompatible with input streams, and since the entire data has to be scanned first anyway, pre-allocating the array makes sense as well.

For better consistency with what people have likely got used to from npyio, I'd recommend some minor changes:

- make spaces the default delimiter
- enable automatic decompression (given the modularity, could you simply use np.lib._datasource.open() like genfromtxt?)

Cheers,
Derek

--
----------------------------------------------------------------
Derek Homeier
Centre de Recherche Astrophysique de Lyon
ENS Lyon
46, Allée d'Italie
69364 Lyon Cedex 07, France
+33 1133 47272-8894
----------------------------------------------------------------
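
P.S. To illustrate why strict auto-detection needs the whole input before anything can be stored, here is a small sketch. It is not the code from the pull request: the helper detect_column_type and the sample data are made up for this example, and it only distinguishes int from float in a single whitespace-separated column.

```python
import io

import numpy as np


def detect_column_type(lines, column=0):
    """Hypothetical helper: return int unless some entry needs float."""
    detected = int
    for line in lines:
        token = line.split()[column]
        try:
            int(token)
        except ValueError:
            float(token)  # raises if the entry is not numeric at all
            detected = float
    return detected


text = "1\n2\n3\n4.5\n"

# First pass: every line has to be read before the column type is known,
# because only the last row reveals that the column is not integer.
col_type = detect_column_type(io.StringIO(text))
n_rows = sum(1 for _ in io.StringIO(text))

# Second pass: fill an array pre-allocated with the final type and size.
data = np.empty(n_rows, dtype=col_type)
for i, line in enumerate(io.StringIO(text)):
    data[i] = col_type(line.split()[0])
```

With a non-seekable stream the second pass is not available, which is the incompatibility described above.
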