Stéfan van der Walt wrote:
> Hi Pierre
>
> 2008/12/1 Pierre GM <[EMAIL PROTECTED]>:
>> * `genloadtxt` is the base function that does all the work. It
>> outputs 2 arrays, one for the data (missing values being substituted
>> by the appropriate default) and one for the mask. It would go in
>> np.lib.io
>
> I see the code length increased from 200 lines to 800. This made me
> wonder about the execution time: initial benchmarks suggest a 3x
> slow-down. Could this be a problem for loading large text files? If
> so, should we consider keeping both versions around, or by default
> bypassing all the extra hooks?
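[For readers following along: the data-plus-mask behaviour Pierre describes is what eventually shipped in NumPy as `np.genfromtxt`, which bundles the two arrays into a masked array when `usemask=True`. A minimal sketch with invented sample data:]

```python
import io
import numpy as np

# Two rows have missing fields; genfromtxt substitutes a default in the
# data array and records the gap in the mask array.
text = io.StringIO("1,2,3\n4,,6\n7,8,\n")
arr = np.genfromtxt(text, delimiter=",", usemask=True)

print(arr.data)  # missing entries replaced by the fill default
print(arr.mask)  # True exactly where a value was missing
```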
I've wondered about this being an issue. On one hand, you hate to make
existing code noticeably slower. On the other hand, if speed is important
to you, why are you using ASCII I/O?

I'm personally not entirely against having two versions of loadtxt-like
functions. However, the idea seems a little odd, given that loadtxt was
already supposed to be the "Swiss Army knife" of text reading. I'm seeing
a similar slowdown with Pierre's version of the code. The version of
loadtxt that I cobbled together with the StringConverter class (and no
missing-value support) shows about a 50% slowdown, so clearly there's a
performance penalty for trying to make a generic function that can be all
things to all people. On the other hand, this approach reduces code
duplication.

I'm not really opinionated on what the right approach is here. My only
opinion is that this functionality *really* needs to be in numpy in some
fashion. For my own use case, with the old version, I had to read a text
file and then separate out columns and mask values by hand. Now, I open a
file and get a structured array with an automatically detected dtype
(names and types!) plus masked values.

My $0.02.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
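[A sketch of the use case Ryan describes, using `np.genfromtxt` (the function that grew out of `genloadtxt`): column names are pulled from the header, per-column types are detected automatically with `dtype=None`, and missing fields come back masked. The station/temperature data below is invented for illustration.]

```python
import io
import numpy as np

text = io.StringIO(
    "station,temp,wind\n"
    "OUN,21.5,10\n"
    "TUL,,12\n"  # missing temperature
)
# names=True reads column names from the header; dtype=None asks for
# per-column type detection; usemask=True masks the missing fields.
arr = np.genfromtxt(text, delimiter=",", names=True, dtype=None,
                    usemask=True, encoding="utf-8")

print(arr.dtype.names)  # ('station', 'temp', 'wind')
print(arr["temp"])      # masked where the value was missing
```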