I'm copy/pasting this message to the ML with regard to the previous discussion on cython-users about auto-vectorization (apparently my forwarded mail got rejected).
Perhaps an approach like the one described below would be easier than generating Fortran (and dealing with the pain of linking against it, distutils compatibility, forcing the user to install a Fortran compiler, etc.).

------------ Forwarded Message Below ------------

Hello,

With regard to the discussion on the Cython mailing list about SSE and vectorization, I have an unfinished project which might be of interest.

The project wraps the Orc compiler ( http://code.entropywave.com/projects/orc/ ), which provides a simplified assembly language for creating cross-platform tight-loop code that utilizes SIMD architectures.

With some simple test code for a sin function approximation I get a speedup of 10x over the corresponding numpy functions (single-threaded). By utilizing OpenMP it is possible to extend this to multiple threads and gain further speedups.

The code is currently just a proof of concept, so feel free to adopt and extend it if wanted.

Best regards
Runar Tenfjord

_______________________________________________
cython-devel mailing list
[email protected]
http://mail.python.org/mailman/listinfo/cython-devel
