Greetings! I'm in the middle of a major modification to the cvs head tree, and am reviewing the performance implications. Non-vanilla function calls are about to get much faster. But there are many questions remaining.
I'm doing a quick comparison of the standard benchmark results vs. sbcl. We win on quite a few, but there remain some long standing clunker areas. 1) catch/throw. (e.g. the ctak benchmark) GCL handles this via setjmp and longjmp. Right now there are way too many function calls in the implementation, but in my tree, using the new dladdress external function pointer mechanism, I can get this down to one a piece for throw and catch. This appears commensurate with the sbcl disassembly. We remain significantly slower even with these changes. A big piece is likely that we return the values on the lisp stack instead of the C stack. It would be possible to make use of setjmp's int return on 32bit machines only to return the primary value. Is there no analog on 64bit? Are there any other secrets to the sbcl implementation which might carry over to C? For this one benchmark, it is possible for the compiler to optimize away the catch/throw and simply emulate with a return (see below). I don't think this is what sbcl is doing. (defun ctak (x y z) (catch 'ctak (ctak-aux x y z))) (defun ctak-aux (x y z) (declare (fixnum x y z)) (cond ((not (< y x)) (throw 'ctak z)) (t (ctak-aux (catch 'ctak (ctak-aux (the fixnum (1- x)) y z)) (catch 'ctak (ctak-aux (the fixnum (1- y)) z x)) (catch 'ctak (ctak-aux (the fixnum (1- z)) x y)))))) 2) list cache prefetching -- sbcl does not show any explicit prefetch instructions in its disassembly of #'length, yet contiguous lists are accessed much faster than randomly located lists, whereas our performance is even with the random performance in both cases. I tried putting in explicit prefetch calls some time ago without much benefit -- this is awfully fiddly stuff which varies significantly with machine. I suspect this is due to non-optimal data compression in the GC algorithm. ... Just ran a test where sorting the list by cons address is very fast without prefetch confirming this hunch. Does anyone know of a good succinct exposition of copying and or compressing gc algorithms? Some time ago we discussed a 'shrink-wrap' gc idea for gcl -- right now our memory pages are grow only. Take care, -- Camm Maguire c...@maguirefamily.org ========================================================================== "The earth is but one country, and mankind its citizens." -- Baha'u'llah _______________________________________________ Gcl-devel mailing list Gcl-devel@gnu.org http://lists.gnu.org/mailman/listinfo/gcl-devel