On 2011-05-04 12:04 PM, Matthias Felleisen wrote:
I still believe that the Java implementation (just under 1s without
their 'Google' contract) benefits from typed dispatches.
Maybe it does, but it's almost certain that it is benefiting from inline
caching at send sites (i.e. dynamic type information) much more than it
will be benefiting from static type information.
A quick-and-dirty comparison of raw send performance on my Mac:
Language Virtual machine Nanoseconds per send
------------------------------------------------------------
Java Hotspot 64-bit 1.6.0_24 1.4
Smalltalk Cog r2382 21
Smalltalk SqueakVM 4.2.4beta1U 122
Racket Racket v5.1 ~350
Note that Cog is a JITting VM and SqueakVM is a plain (but very well
optimised) interpreter. Both Cog and SqueakVM use a couple of levels of
method lookup cache.
A simple experiment I just performed suggests that a monomorphic inline
cache hit can reduce the time needed for a send in Racket from 350ns to
around 60ns, which is a massive win. I've attached the program I used to
measure this, FWIW. (Run it using command-line Racket, not DrRacket: I
got some *very* weird timing artifacts out of DrRacket during this
experiment!)
The question, then, is: how do we implement MICs or PICs using Racket's
macro system? Each send site needs to expand into
- a piece of global state
- a test involving that global state
- a possible update to that global state
Hypothesising some kind of (let-static) form that introduces a
lexically-scoped piece of global state, this kind of thing might Just
Work to provide a speedup of almost six-fold on almost-monomorphic
send-heavy code:
(define-syntax cached-send
(syntax-rules ()
((_ obj msg arg ...)
(let-static ((bc (box #f))
(bm (box #f)))
(let* ((tmp obj)
(cls (object-ref tmp)))
(if (eq? (unbox bc) cls)
((unbox bm) tmp arg ...)
(let-values (((method _)
(find-method/who 'send tmp 'msg)))
(set-box! bc cls)
(set-box! bm method)
(method tmp arg ...))))))))
Regards,
Tony
#lang racket
(require racket/private/class-internal)
;; An ordinary Racket class.
(define a%
(class* object% ()
(super-new)
(define/public (op x) (+ x 1))))
;; Representation of a trivial vtable.
(struct ob (vt state) #:transparent)
;; A simple vtable providing a single method named "op".
(define (b%-vt selector)
(case selector
((op) (lambda (self x) (+ x 2)))
(else (error 'dnu))))
;; A simple class, using b%-vt as its behaviour.
(define (b%)
(ob b%-vt 'no-state))
;; An uncached send to a struct ob.
(define-syntax unmemo-send
(syntax-rules ()
((_ obj msg arg ...)
(let ((tmp obj))
(((ob-vt tmp) 'msg) tmp arg ...)))))
;; A quasi-cached send to a struct ob.
;;
;; A real cache would have per-send-site state rather than a single
;; (!) global variable.
(define *memo-class* #f)
(define *memo-method* #f)
(define-syntax memo-send
(syntax-rules ()
((_ obj msg arg ...)
(let* ((tmp obj)
(cls (ob-vt tmp)))
(if (eq? *memo-class* cls)
(*memo-method* tmp arg ...)
(let ((method (cls 'msg)))
(set! *memo-class* cls)
(set! *memo-method* method)
(method tmp arg ...)))))))
;; Test objects.
(define a0 (new a%))
(define b0 (b%))
;; Syntax: (measure-ns exp)
;;
;; Expands to an expression that repeats "exp" NREPEATS times,
;; measuring the elapsed time, and returns the number of nanoseconds
;; of CPU time used *per iteration*, excluding any GC time.
(define NREPEATS 5000000)
(define-syntax measure-ns
(syntax-rules ()
((_ exp)
(call-with-values (lambda ()
(pretty-print `(measuring exp))
(time-apply (lambda ()
(do ((i 0 (+ i 1)))
((= i NREPEATS))
exp))
'()))
(lambda (results cpu real gc)
(/ (* 1000000000.0 (/ (- cpu gc) 1000.0))
NREPEATS))))))
;; Main program.
;; Measure the time for a null measure-ns loop first, then measure the
;; operations of interest, subtracting the null-time overhead
;; measurement from each to get an estimate of the time taken for the
;; interesting operation.
(let ((null-time (measure-ns 123)))
(define (report-on t)
(let ((name (first t))
(ns/op (second t)))
(write (list name (- ns/op null-time)))
(newline)))
(for-each report-on
`(
;; Report on the loop overhead for sanity checking.
(constant ,null-time)
;; How long does a plain Scheme addition operation take?
(simple-add
,(measure-ns (+ 123 12)))
;; How long does a regular Racket object send take?
(normal-send
,(measure-ns (send a0 op 123)))
;; What about if we expand the send macro in place?
;; This should be almost identical to the time for the
;; previous expression.
(expanded-normal-send
,(measure-ns (let-values (((temp1) 'op))
(let-values (((temp2 temp3)
(find-method/who 'send a0 temp1)))
(temp2 temp3 '123)))))
;; What about an approximation to a monomorphic inline
;; cache for the Racket object system? This should be
;; much faster than plain old send.
(quasi-memoized-normal-send
,(with-method ((a-op (a0 op)))
(let ((method (lambda (x) (a-op x))))
(measure-ns (if (eq? *memo-class* a0)
(*memo-method* 123)
(begin
(set! *memo-class* a0)
(set! *memo-method* method)
(method 123)))))))
;; What about an uncached lookup using the trivial
;; vtable format defined above?
(unmemoized-simple-lookup
,(measure-ns (unmemo-send b0 op 123)))
;; Finally, the vtable format defined above using an
;; approximation of monomorphic inline caching.
(quasi-memoized-simple-lookup
,(measure-ns (memo-send b0 op 123)))
)))
_________________________________________________
For list-related administrative tasks:
http://lists.racket-lang.org/listinfo/dev