Dag Sverre Seljebotn wrote:
> Jean-Francois Moulin wrote:
>> Hi,
>>
>> I am working on a script which reads rather large amounts of data in a
>> binary format and then
>> processes it through different test functions.
>> I optimized the beast as much as I possibly could: using tuples
>> instead of lists,
>> then moving to cython and declaring the types, optimizing the calls to numpy 
>> fn
>> by use of the  buffer notation...
>>
>> All in all I gain a factor 10 in speed. Not bad but still not really 
>> enough...
>>
>> What I still see as factors slowing me down could be (see my code in attach):
>> - the use of the file.read() function from python to get a string
>> which I then process (is an fread call
>> from c faster... how to implement it?)
> 
> The real problem is that you read 4 bytes at a time. If you buffer up 
> longer stretches somehow, it doesn't matter so much which call you use. E.g.:
> 
> obj = file.read(400)
> cdef char* buf = obj
> # hold on to obj, but process buf[0]..buf[399]
> buf = NULL
> obj = None # do not do this until you no longer use buf
> 
> Though if you have a socket rather than a file I suppose you're worse off.
> 
> You can use C file handling directly (the safest thing is to open and 
> close the file/socket with C calls as well), just look up Cython 
> examples on interfacing with C code and Google for C and file handling.
> 
>> - the use of the struct.unpack
> 
> As long as you stick to native-endian, you should be able to just cast 
> to an int in your case:
> 
> cdef char* buf = data
> cdef int* buf_as_int = <int*>buf
> cdef int value = *buf_as_int

Argh, this is not Cython :-) Cython has no unary * dereference operator, 
so the last line above will not compile (and the irony is we're just 
having a discussion about this on the list).

Do

cdef int value = buf_as_int[0]

instead. Or just value = (<int*>buf)[0]
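
To make the buffering point concrete, here is a rough pure-Python sketch of 
the same idea: read many values per read() call and decode them as 
native-endian ints. The name read_ints and the ints_per_chunk parameter are 
made up for illustration, and the sketch assumes a regular file, where a 
short read only happens at end of file.

```python
import io
import struct

# Size of a native C int on this platform (commonly 4 bytes).
INT_SIZE = struct.calcsize("i")


def read_ints(fileobj, ints_per_chunk=100):
    """Yield native-endian C ints, reading many values per read() call.

    Hypothetical helper for illustration. Assumes a regular file, so a
    short read only occurs at end of file; trailing bytes that do not
    form a whole int are dropped.
    """
    chunk_bytes = ints_per_chunk * INT_SIZE
    while True:
        chunk = fileobj.read(chunk_bytes)
        # Keep only the bytes that form complete ints.
        usable = len(chunk) - len(chunk) % INT_SIZE
        if usable == 0:
            break
        # "i" without a prefix means native byte order and native size.
        for (value,) in struct.iter_unpack("i", chunk[:usable]):
            yield value


# Small in-memory demonstration; a real file opened in "rb" mode works alike.
data = struct.pack("5i", 1, 2, 3, -4, 5)
values = list(read_ints(io.BytesIO(data), ints_per_chunk=2))
```

In Cython you would still want the char* / int* cast shown above for the 
inner loop; the point here is only that the number of read() calls drops by 
a factor of ints_per_chunk.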



-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
