Okay, so, now that I've read the C code *carefully* (sigh), here's my
direct translation of it into Lisp:
(let ((n 1000)
      (nn 1000000)
      (mat #.(make-array '(1000000) :element-type 'double-float)))
  (declare (type (eql 1000) n)
           (type (eql 1000000) nn)
           (type (array double-float (1000000)) mat)
           (optimize (speed 3) (safety 0) (debug 0) (space 0)
                     (compilation-speed 0)))
  (defun test ()
    (macrolet ((mat (n)
                 `(aref mat ,n)))
      (dotimes (i nn)
        (setf (mat i) 1.0d0))
      (dotimes (i 10)
        (loop for pos1 of-type fixnum from 1 below (1- n)
              do (loop for pos2 of-type fixnum
                       from (+ pos1 n) below (+ pos1 (* n (1- n))) by n
                       do (setf (mat pos2)
                                (* 0.11111111111111d0
                                   (+ (mat (- pos2 1001)) (mat (1- pos2))
                                      (mat (+ pos2 999)) (mat (- pos2 1000))
                                      (mat pos2) (mat (+ pos2 1000))
                                      (mat (- pos2 999)) (mat (1+ pos2))
                                      (mat (+ pos2 1001))))))))
      (princ (mat (1+ n))))))
On a Sun Blade 100, this takes about 2x as long to run as the C code
(CMUCL 18d and both gcc and the Sun compiler). I haven't looked
through the 9 pages of disassembly very carefully yet, on account of
it's 9 pages, so there might be something obvious I can do. But
unfortunately, I think the slowdown is probably a result of accessing
the array through KERNEL:%WITH-ARRAY-DATA. I also tried a version of
this code that declared MAT as a (simple-array t (1000000)), which had
a nice, short, easy-to-read disassembly ... of course, it had to cons
to box the floats, so it took about 10x as long to run as the C code.
So, it could be worse, but at least it's better than the 10x slowdown
you were seeing. It would be nice if Python could open-code access
into MAT. I'll look at this more tomorrow.
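
For what it's worth, a version combining both declarations ought to dodge
both problems at once: a *specialized, simple* array should give Python
enough information to open-code the AREF as a raw double-float load, with
no %WITH-ARRAY-DATA detour and no boxing on read.  A minimal sketch, not
yet tested against CMUCL 18d (the only interesting part is the
(simple-array double-float (1000000)) declaration; TEST2 is just a
placeholder body):

```lisp
;; Sketch: MAT declared as a specialized SIMPLE-ARRAY of
;; DOUBLE-FLOAT, so AREF can compile to a direct indexed load
;; instead of going through KERNEL:%WITH-ARRAY-DATA or consing
;; boxed floats.
(let ((mat (make-array 1000000 :element-type 'double-float
                               :initial-element 1.0d0)))
  (declare (type (simple-array double-float (1000000)) mat)
           (optimize (speed 3) (safety 0)))
  (defun test2 ()
    ;; stand-in body: one store and one load, enough to inspect
    ;; the disassembly of the array access
    (setf (aref mat 1001) 2.0d0)
    (aref mat 1001)))
```

If that disassembles cleanly, the same declaration should drop straight
into the LET around TEST above.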
--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'