Hi all,

When I first saw this problem (reading in a fixed-width text file as numbers), it struck me that you really should be able to do it, and do it well, with numpy by slicing character arrays.

I got carried away and worked out a number of ways to do it. The last one was inspired by a recent thread, "String to integer array of ASCII values", which did indeed lead to the fastest way. Here's what I have:
import numpy as np

# my naive first attempt:
def line2array0(line, field_len):
    nums = []
    i = 0
    while i < len(line):
        nums.append(float(line[i:i+field_len]))
        i += field_len
    return np.array(nums)
# list comprehension
def line2array1(line, field_len):
    # list() because map() is lazy in Python 3; // for integer division
    return np.array(list(map(float, [line[i*field_len:(i+1)*field_len]
                                     for i in range(len(line)//field_len)])))
# convert to a tuple, then to an 'S1' array -- no real reason to do
# this, as I figured out the next way.
def line2array2(line, field_len):
    return np.array(tuple(line), dtype='S1').view(dtype='S%i' % field_len).astype(float)
# convert directly to a string array, then break into fields.
def line2array3(line, field_len):
    # dtype='S' gives a bytes array (one byte per character), so the view below lines up
    return np.array((line,), dtype='S').view(dtype='S%i' % field_len).astype(float)
# use dtype='c' instead of 'S1' -- better.
def line2array4(line, field_len):
    return np.array(line, dtype='c').view(dtype='S%i' % field_len).astype(float)
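# and the winner is: use fromstring to go straight to a 'c' array.
# (np.fromstring's binary mode is deprecated in current NumPy;
# np.frombuffer on the encoded bytes does the same job.)
def line2array5(line, field_len):
    return np.frombuffer(line.encode('ascii'), dtype='c').view(dtype='S%i' % field_len).astype(float)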
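The trick shared by line2array2 through line2array5 is that an array of one-byte characters can be reinterpreted, without copying, as an array of fixed-width strings; only the final astype does any real parsing. A minimal sketch of that step on its own, using a made-up 18-character line with three 6-character fields (the sample data and variable names are illustrative only):

```python
import numpy as np

# Hypothetical sample: three fields, each 6 characters wide.
line = "  1.50 -2.25 10.00"
field_len = 6

# One array element per character; 'S1' is the same dtype as 'c'.
chars = np.frombuffer(line.encode("ascii"), dtype="S1")

# Reinterpret the same bytes as 6-byte strings; no data is copied here.
fields = chars.view("S%d" % field_len)

# Parsing the byte strings into floats is the step that costs real time.
values = fields.astype(float)

print(fields)
print(values)
```

Note that view() needs the line length to be an exact multiple of field_len, so a trailing newline has to be stripped before this works.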
Here are some timings:

Timing with a 10-number string:
  List comp:                 36.8073430061
  convert to tuple:          57.9741871357
  auto convert:              43.4103589058
  char type:                 46.0047719479
  fromstring:                23.998103857
  without float conversion:  11.4827179909

So the list comprehension is pretty fast, but using fromstring and then slicing is much better. The last one is the same thing, but without the conversion from strings to floats, showing that the conversion is a big chunk of time no matter how you slice it.
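The attached test code isn't reproduced here. As a rough sketch of how such a comparison could be set up with the standard timeit module, the block below times a list-comprehension parser, a frombuffer/view parser, and the same split with the float conversion left out. The function names, the 10-character field width, and the make_line test data are assumptions for illustration, not the original test.py, and absolute numbers will depend on the machine and NumPy version.

```python
import functools
import timeit

import numpy as np

FIELD_LEN = 10  # assumed field width for this sketch


def parse_listcomp(line, field_len):
    # Slice the string one field at a time and convert each piece with float().
    n = len(line) // field_len
    return np.array([float(line[i * field_len:(i + 1) * field_len]) for i in range(n)])


def split_view(line, field_len):
    # Split into fixed-width byte strings by reinterpreting the bytes; no parsing.
    return np.frombuffer(line.encode("ascii"), dtype="S1").view("S%d" % field_len)


def parse_view(line, field_len):
    # The same split, followed by the string-to-float conversion.
    return split_view(line, field_len).astype(float)


def make_line(n_fields, field_len=FIELD_LEN):
    # Hypothetical test data: numbers right-justified in field_len-wide columns.
    return "".join("%*.4f" % (field_len, i * 1.1) for i in range(n_fields))


if __name__ == "__main__":
    for n in (10, 100):
        line = make_line(n)
        for func in (parse_listcomp, parse_view, split_view):
            call = functools.partial(func, line, FIELD_LEN)
            seconds = timeit.timeit(call, number=1000)
            print("%3d fields  %-15s %.4f s per 1000 calls" % (n, func.__name__, seconds))
```

Timing split_view separately from parse_view isolates the cost of the string-to-float step, which is the comparison the "without float conversion" row makes.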
Timing with a 100-number string:
  List comp:                 163.281736135
  convert to tuple:          333.081432104
  auto convert:              138.934411049
  char type:                 279.897207975
  fromstring:                121.395509005
  without float conversion:  12.8342208862

Interesting -- I thought a longer string would give a greater advantage to the fromstring approach, but I was wrong: now the time to parse strings into floats really washes everything else out, so it doesn't matter much how you do it. I'd go with either the list comprehension (which I think is what np.genfromtxt uses) or the fromstring method, which I kind of like 'cause it's numpy.
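For comparison, np.genfromtxt can split fixed-width fields itself: passing an integer as delimiter treats every field as that many characters wide, and a sequence of integers gives per-field widths. A minimal sketch on a made-up two-row sample (the data is illustrative only); it is not timed here, and how genfromtxt splits lines internally is still only the guess stated above.

```python
import io

import numpy as np

# Hypothetical fixed-width data: two rows, three 8-character fields each.
text = (
    "    1.50   -2.25   10.00\n"
    "    3.00    4.75   -0.50\n"
)

# An integer delimiter means "every field is this many characters wide";
# delimiter=(8, 8, 8) would express the same layout as explicit widths.
data = np.genfromtxt(io.StringIO(text), delimiter=8)

print(data)
```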
Test and timing code attached.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Attachment: test.py (application/python)
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
