Jeronimo Pellegrini <[email protected]> writes:

> http://aleph0.info/scheme/
>
> The times I listed before are for 100000 repetitions on small matrices
> (3x4, 4x6), so as to also include function call overhead in the
> benchmark.
>
> I have uploaded two 100x100 random matrices also, and the results for
> 20 repetitions on them (results.txt).
>
> I understand that micro-benchmarks like this are usually not
> significative, but in this case they make some sense, since it's the
> kind of thing my programs will do most of the time.
OK, there are lots of things going on here :)

The first is that you're using a naive multiplication algorithm - there are faster algorithms, and algorithms that take L1 cache behaviour into account for very large matrices, and BLAS does all of this in addition to being written in highly tuned Fortran. If matrix operations are really what you want to do, then as Ivan says just use BLAS :)

If you're more curious about the speed of Scheme compilers for their own sake, and not about actually getting work done, then there are several reasons for the slowness.

The first is that SRFI-25 is inherently slow - the design makes it difficult to implement efficiently, and that slows down all of the Scheme implementations. It's easy enough to just implement your own matrices on top of vectors for a huge speed boost.

The next problem is that presumably you want to test floating point, although the test case you use involves only fixnums. Floating point involves heap allocation for every operation in, I think, all Scheme implementations except Stalin. Stalin can unbox floating point numbers if it can prove that all of the types involved are inexact (it wouldn't work on this example because of the general READ; you'd have to tweak it so that Stalin's type inference would kick in). This makes Scheme in general unsuited to floating point intensive computations.

Given that you're only testing fixnums here, the -fixnum optimization gives a big boost.

For more low-level tweaking, you can get a small boost out of Chicken by unifying the do loops as described here:

http://lists.gnu.org/archive/html/chicken-users/2009-02/msg00050.html

The attached variation of the code, with the -fixnum flag, will get this specific example within the ballpark of BLAS (0.1 seconds on my machine), but only for the small example you're using. As the matrices get larger, BLAS will become increasingly faster by comparison, and Scheme will be painfully slow with inexacts.

--
Alex
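
To make the "matrices on top of vectors" suggestion concrete, here is a minimal sketch. It is not the attached file and not SRFI-25: every name in it (make-matrix, matrix-from-list, matrix-ref, matrix-set!, matrix-mul) is made up for illustration, and it assumes a row-major layout in one flat vector. It still uses the naive triple loop, so it only shows the representation, not a faster algorithm.

```scheme
;; Hypothetical helpers for illustration; not from the attachment or SRFI-25.
;; A matrix is a 3-slot vector: #(rows cols data), where data is a flat
;; vector holding the elements in row-major order.

(define (make-matrix rows cols init)
  (vector rows cols (make-vector (* rows cols) init)))

;; Build a matrix from a flat list of rows*cols elements (row-major).
(define (matrix-from-list rows cols elems)
  (vector rows cols (list->vector elems)))

(define (matrix-rows m) (vector-ref m 0))
(define (matrix-cols m) (vector-ref m 1))
(define (matrix-data m) (vector-ref m 2))

;; Element (i, j) lives at index i*cols + j of the flat data vector.
(define (matrix-ref m i j)
  (vector-ref (matrix-data m) (+ (* i (matrix-cols m)) j)))

(define (matrix-set! m i j x)
  (vector-set! (matrix-data m) (+ (* i (matrix-cols m)) j) x))

;; Naive multiplication of an n-by-m matrix with an m-by-p matrix.
;; The innermost DO carries the running sum as a loop variable and
;; stores it into the result once k reaches m.
(define (matrix-mul a b)
  (let* ((n (matrix-rows a))
         (m (matrix-cols a))
         (p (matrix-cols b))
         (c (make-matrix n p 0)))
    (do ((i 0 (+ i 1))) ((= i n) c)
      (do ((j 0 (+ j 1))) ((= j p))
        (do ((k 0 (+ k 1))
             (sum 0 (+ sum (* (matrix-ref a i k) (matrix-ref b k j)))))
            ((= k m) (matrix-set! c i j sum)))))))

;; Usage: multiply a 2x3 matrix by a 3x2 matrix.
(define a (matrix-from-list 2 3 '(1 2 3 4 5 6)))
(define b (matrix-from-list 3 2 '(7 8 9 10 11 12)))
(define c (matrix-mul a b))
```

With exact integer entries like these, the arithmetic stays in fixnums, which is the case the -fixnum optimization above is aimed at; with inexact entries the heap allocation cost described above would still apply.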
matrix-bench-chicken.scm
_______________________________________________ Chicken-users mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/chicken-users
