Hi all,

When I first saw this problem (reading a fixed-width text file in as numbers), it struck me that you really should be able to do it, and do it well, with numpy by slicing character arrays.

I got carried away and worked out a number of ways to do it. The last one was inspired by a recent thread, "String to integer array of ASCII values", and it did indeed turn out to be the fastest. Here's what I have:

import numpy as np

# my naive first attempt:
def line2array0(line, field_len):
    nums = []
    i = 0
    while i < len(line):
        nums.append(float(line[i:i+field_len]))
        i += field_len
    return np.array(nums)

# list comprehension
def line2array1(line, field_len):
    return np.array(list(map(float, [line[i*field_len:(i+1)*field_len] for i in range(len(line)//field_len)])))

# convert to a tuple, then to an 'S1' array -- no real reason to do
# this, as I figured out the next way.
def line2array2(line, field_len):
    return np.array(tuple(line), dtype='S1').view(dtype='S%i'%field_len).astype(float)

# convert directly to a string array, then break into fields.
def line2array3(line, field_len):
    return np.array((line,)).view(dtype='S%i'%field_len).astype(float)

# use dtype='c' instead of 'S1' -- better.
def line2array4(line, field_len):
    return np.array(line, dtype='c').view(dtype='S%i'%field_len).astype(float)

# and the winner is: use fromstring to go straight to a 'c' array:
def line2array5(line, field_len):
    return np.fromstring(line, dtype='c').view(dtype='S%i'%field_len).astype(float)
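
For anyone trying this on Python 3 with a recent NumPy, the same idea can probably be written with np.frombuffer, which reads the bytes directly as fixed-width 'S' fields. This is only a sketch and not one of the versions timed here: the name line2array_frombuffer is made up for illustration, and it assumes the line is pure ASCII and that its length is an exact multiple of field_len.

```python
import numpy as np

def line2array_frombuffer(line, field_len):
    # Hypothetical variant, not one of the timed functions.
    # Assumes ASCII text whose length is a multiple of field_len.
    raw = line.encode('ascii')
    # Interpret the bytes as fixed-width byte-string fields, then parse to float.
    return np.frombuffer(raw, dtype='S%i' % field_len).astype(float)

# Three right-justified numbers, 8 characters each.
line = ''.join('%8.3f' % x for x in (1.5, -2.25, 100.0))
print(line2array_frombuffer(line, 8))
```

The slicing step is the same trick as above; only the way the raw characters get into numpy differs.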

Here are some timings:

Timing with a 10 number string:
List comp:                 36.8073430061
convert to tuple:          57.9741871357
auto convert:              43.4103589058
char type:                 46.0047719479
fromstring:                23.998103857
without float conversion:  11.4827179909

So the list comprehension is pretty fast, but using fromstring and then slicing is much better: it takes about a third less time. The last entry is the fromstring method without the conversion from strings to floats, which shows that the conversion is a big chunk of the time no matter how you slice it.
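
The split between slicing and float parsing can be checked with timeit along these lines. This is a rough sketch rather than the attached test script: the test line, the 8-character field width, and the helper names are assumptions chosen for illustration, it uses np.frombuffer in place of fromstring, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

FIELD_LEN = 8  # assumed field width for this illustration
line = ''.join('%8.3f' % x for x in range(10))  # 10 right-justified numbers
raw = line.encode('ascii')

def split_only():
    # Slice the bytes into fixed-width fields, with no numeric parsing.
    return np.frombuffer(raw, dtype='S%i' % FIELD_LEN)

def split_and_parse():
    # Same slicing, followed by the string-to-float conversion.
    return split_only().astype(float)

n = 10000
t_split = timeit.timeit(split_only, number=n)
t_parse = timeit.timeit(split_and_parse, number=n)
print('slice only:      %.4f s' % t_split)
print('slice and parse: %.4f s' % t_parse)
```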

Timing with a 100 number string:
List comp:                 163.281736135
convert to tuple:          333.081432104
auto convert:              138.934411049
char type:                 279.897207975
fromstring:                121.395509005
without float conversion:  12.8342208862


Interesting. I thought a longer string would give a greater advantage to the fromstring approach, but I was wrong: now the time to parse strings into floats is washing everything else out, so it doesn't matter much how you do it. I'd go with either the list comprehension (which is what I think np.genfromtxt uses) or the fromstring method, which I kind of like 'cause it's numpy.
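
For reading a whole file rather than a single line, np.genfromtxt accepts an integer delimiter, which it treats as a fixed field width. A small sketch, with made-up sample data and an assumed width of 8 characters:

```python
import io
import numpy as np

# Two lines of three 8-character fields each (sample data for illustration).
text = (
    "   1.500  -2.250 100.000\n"
    "   0.000   3.125  42.000\n"
)

# An integer delimiter is interpreted as the width of every field.
data = np.genfromtxt(io.StringIO(text), delimiter=8)
print(data.shape)
```

This is likely slower than the slicing tricks above, but it handles the line splitting and file reading for you.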

Test and timing code attached.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception


Attachment: test.py
Description: application/python
