On Mon, Oct 12, 2009 at 01:04:40PM +0900, Alex Shinn wrote:
> Jeronimo Pellegrini <[email protected]> writes:
>
> > http://aleph0.info/scheme/
> >
> > The times I listed before are for 100000 repetitions on small matrices
> > (3x4, 4x6), so as to also include function call overhead in the
> > benchmark.
> >
> > I have uploaded two 100x100 random matrices also, and the results for
> > 20 repetitions on them (results.txt).
> >
> > I understand that micro-benchmarks like this are usually not
> > significant, but in this case they make some sense, since it's the
> > kind of thing my programs will do most of the time.
>
> OK, there are lots of things going on here :)
>
> The first is that you're using a naive multiplication
> algorithm - there are faster algorithms, and algorithms that
> take L1 cache considerations into account for very large
> matrices, and BLAS does all of this in addition to being
> written in highly tuned Fortran.
Yes, I know -- I was just trying to compare the same numerical
algorithm on different Scheme implementations.

> If matrix operations are
> really what you want to do, then as Ivan says just use BLAS
> :)

Not exactly. I do use lots of floating-point operations, but not
necessarily linear-algebra-style.

> If you were more curious about the speed of Scheme compilers
> for their own sake, and not about actually getting work
> done, then there are several reasons for the slowness. The
> first is that SRFI-25 is inherently slow - the design makes
> it difficult to implement efficiently, and so that's slowing
> down all of the Scheme implementations. It's easy enough to
> just implement your own matrices on top of vectors for a
> huge speed boost.

THANK YOU!!! That was the problem. The test with 100x100 floating-point
matrices ran in:

with SRFI-25: 23.2s
without it:    1.3s

(Although this is suspicious -- it's >2x faster than Bigloo and Gambit!)

> The next problem is that presumably you want to test
> floating point, although the test case you use involves only
> fixnums.

Well, the small examples use fixnums, but the 100x100 example doesn't.

> Floating point involves heap allocation for every
> operation in, I think, all Scheme implementations except
> Stalin. Stalin can unbox floating-point numbers if it can
> prove all of the types involved are inexact (it wouldn't
> work on this example because of the general READ; you'd have
> to tweak it so that Stalin's type inference would kick in).
> This makes Scheme in general unsuited to floating-point-
> intensive computations.
>
> Given that you're only testing fixnums here, the -fixnum
> optimization gives a big boost.

Yes, I have tested it, and then Chicken runs just like Bigloo and
Gambit -- very fast! But I will actually need floating point -- and the
problem is solved now (I just won't use SRFI 25). I don't really need
the same speed as C or Fortran -- it just shouldn't be more than 15
times slower. :-)

Thanks a lot!
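[For the archive: a minimal sketch of the "matrices on top of vectors" approach Alex suggests, i.e. a flat row-major vector plus a naive O(n^3) multiply. The names `make-matrix`, `mat-ref`, `mat-set!`, and `mat-mul!` are made up for this example, not from any library:]

```scheme
;; A matrix is just a flat vector in row-major order; its
;; dimensions are passed alongside rather than stored.
(define (make-matrix rows cols)
  (make-vector (* rows cols) 0.0))

(define (mat-ref m cols i j)
  (vector-ref m (+ (* i cols) j)))

(define (mat-set! m cols i j v)
  (vector-set! m (+ (* i cols) j) v))

;; Naive triple-loop multiply: c = a * b,
;; where a is p x q, b is q x r, and c is p x r.
(define (mat-mul! a b c p q r)
  (do ((i 0 (+ i 1))) ((= i p) c)
    (do ((j 0 (+ j 1))) ((= j r))
      (let loop ((k 0) (sum 0.0))
        (if (= k q)
            (mat-set! c r i j sum)
            (loop (+ k 1)
                  (+ sum (* (mat-ref a q i k)
                            (mat-ref b r k j)))))))))

;; Example: [[1 2] [3 4]] * [[5 6] [7 8]]
(define a (vector 1.0 2.0 3.0 4.0))
(define b (vector 5.0 6.0 7.0 8.0))
(define c (make-matrix 2 2))
(mat-mul! a b c 2 2 2)
;; c now holds the row-major result 19, 22, 43, 50.
```

[In Chicken one could presumably go further by swapping the plain vectors for SRFI-4 `f64vector`s (with `f64vector-ref`/`f64vector-set!`), which store the doubles unboxed, though the intermediate sums would still be boxed flonums.]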
> The attached variation of the code, with the -fixnum flag,
> will get this specific example within the ballpark of BLAS
> (0.1 seconds on my machine), but only for the small example
> you're using. As the matrices get larger BLAS will become
> increasingly faster, and Scheme will be painfully slow with
> inexacts.

OK -- I understand that I need to test larger examples also. I'll do
that.

Thanks, Alex (and thanks, Ivan, also)!

J.

_______________________________________________
Chicken-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/chicken-users
