[racket-dev] Inline caching (was Re: my '312' this semester, how we compare to others)

Tony Garnock-Jones Wed, 04 May 2011 12:58:14 -0700

On 2011-05-04 12:04 PM, Matthias Felleisen wrote:

> I still believe that the Java implementation (just under 1s without
> their 'Google' contract) benefits from typed dispatches.

Maybe it does, but it's almost certain that it is benefiting from inline caching at send sites (i.e. dynamic type information) much more than it will be benefiting from static type information.


A quick-and-dirty comparison of raw send performance on my Mac:

  Language     Virtual machine          Nanoseconds per send
 ------------------------------------------------------------
  Java         Hotspot 64-bit 1.6.0_24      1.4
  Smalltalk    Cog r2382                   21
  Smalltalk    SqueakVM 4.2.4beta1U       122
  Racket       Racket v5.1               ~350

Note that Cog is a JITting VM and SqueakVM is a plain (but very well optimised) interpreter. Both Cog and SqueakVM use a couple of levels of method lookup cache.

A simple experiment I just performed suggests that a monomorphic inline cache hit can reduce the time needed for a send in Racket from 350ns to around 60ns, which is a massive win. I've attached the program I used to measure this, FWIW. (Run it using command-line Racket, not DrRacket: I got some *very* weird timing artifacts out of DrRacket during this experiment!)

The question, then, is: how do we implement MICs or PICs using Racket's macro system? Each send site needs to expand into


 - a piece of global state
 - a test involving that global state
 - a possible update to that global state

Hypothesising some kind of (let-static) form that introduces a lexically-scoped piece of global state, this kind of thing might Just Work to provide a speedup of almost six-fold on almost-monomorphic send-heavy code:


(define-syntax cached-send
  (syntax-rules ()
    ((_ obj msg arg ...)
     (let-static ((bc (box #f))
                  (bm (box #f)))
       (let* ((tmp obj)
              (cls (object-ref tmp)))
         (if (eq? (unbox bc) cls)
             ((unbox bm) tmp arg ...)
             (let-values (((method _)
                           (find-method/who 'send tmp 'msg)))
               (set-box! bc cls)
               (set-box! bm method)
               (method tmp arg ...))))))))
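
One way the hypothetical (let-static) might be approximated is with syntax-local-lift-expression, which lifts an expression out to the enclosing module level and returns a fresh identifier bound to it. The sketch below is an assumption rather than something measured against the real class system: it uses the trivial ob vtable struct from the attached program in place of object-ref and find-method/who, and the names site-send, c%-vt and lookups are made up for illustration. Because the lift happens once per expansion, each send site should get its own pair of boxes.

```racket
#lang racket
(require (for-syntax racket/base))

;; Same trivial vtable representation as the attached program.
(struct ob (vt state))

;; Counts slow-path lookups; for illustration only.
(define lookups 0)

(define (b%-vt selector)
  (set! lookups (+ lookups 1))
  (case selector
    ((op) (lambda (self x) (+ x 2)))
    (else (error 'dnu))))

(define (c%-vt selector)
  (set! lookups (+ lookups 1))
  (case selector
    ((op) (lambda (self x) (* x 10)))
    (else (error 'dnu))))

;; Each expansion lifts two fresh boxes to module level, giving every
;; send site its own cached class and cached method.
(define-syntax (site-send stx)
  (syntax-case stx ()
    ((_ obj msg arg ...)
     (with-syntax ((bc (syntax-local-lift-expression #'(box #f)))
                   (bm (syntax-local-lift-expression #'(box #f))))
       #'(let* ((tmp obj)
                (cls (ob-vt tmp)))
           (if (eq? (unbox bc) cls)
               ((unbox bm) tmp arg ...)          ; cache hit
               (let ((method (cls 'msg)))        ; cache miss: look up, then fill
                 (set-box! bc cls)
                 (set-box! bm method)
                 (method tmp arg ...))))))))

(define b0 (ob b%-vt 'no-state))
(define c0 (ob c%-vt 'no-state))

;; Two distinct send sites, hence two independent caches.
(define (site-1 o) (site-send o op 1))
(define (site-2 o) (site-send o op 1))

(site-1 b0) ; => 3, miss at site 1
(site-1 b0) ; => 3, hit at site 1
(site-2 c0) ; => 10, miss at site 2, does not evict site 1
(site-1 b0) ; => 3, still a hit at site 1
lookups     ; => 2
```

Whether this gets anywhere near the 60ns figure for real Racket objects is untested here; the point is only that the three pieces (state, test, update) can live at each site without a special form.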

Regards,
  Tony

#lang racket
(require racket/private/class-internal)

;; An ordinary Racket class.
(define a%
  (class* object% ()
    (super-new)
    (define/public (op x) (+ x 1))))

;; Representation of a trivial vtable.
(struct ob (vt state) #:transparent)

;; A simple vtable providing a single method named "op".
(define (b%-vt selector)
  (case selector
    ((op) (lambda (self x) (+ x 2)))
    (else (error 'dnu))))

;; A simple class, using b%-vt as its behaviour.
(define (b%)
  (ob b%-vt 'no-state))

;; An uncached send to a struct ob.
(define-syntax unmemo-send
  (syntax-rules ()
    ((_ obj msg arg ...)
     (let ((tmp obj))
       (((ob-vt tmp) 'msg) tmp arg ...)))))

;; A quasi-cached send to a struct ob.
;;
;; A real cache would have per-send-site state rather than a single
;; (!) global variable.
(define *memo-class* #f)
(define *memo-method* #f)
(define-syntax memo-send
  (syntax-rules ()
    ((_ obj msg arg ...)
     (let* ((tmp obj)
            (cls (ob-vt tmp)))
       (if (eq? *memo-class* cls)
           (*memo-method* tmp arg ...)
           (let ((method (cls 'msg)))
             (set! *memo-class* cls)
             (set! *memo-method* method)
             (method tmp arg ...)))))))

;; Test objects.
(define a0 (new a%))
(define b0 (b%))

;; Syntax: (measure-ns exp)
;;
;; Expands to an expression that repeats "exp" NREPEATS times,
;; measuring the elapsed time, and returns the number of nanoseconds
;; of CPU time used *per iteration*, excluding any GC time.
(define NREPEATS 5000000)
(define-syntax measure-ns
  (syntax-rules ()
    ((_ exp)
     (call-with-values (lambda ()
                         (pretty-print `(measuring exp))
                         (time-apply (lambda ()
                                       (do ((i 0 (+ i 1)))
                                         ((= i NREPEATS))
                                         exp))
                                     '()))
                       (lambda (results cpu real gc)
                         (/ (* 1000000000.0 (/ (- cpu gc) 1000.0))
                            NREPEATS))))))

;; Main program.

;; Measure the time for a null measure-ns loop first, then measure the
;; operations of interest, subtracting the null-time overhead
;; measurement from each to get an estimate of the time taken for the
;; interesting operation.

(let ((null-time (measure-ns 123)))
  (define (report-on t)
    (let ((name (first t))
          (ns/op (second t)))
      (write (list name (- ns/op null-time)))
      (newline)))
  (for-each report-on
            `(
              ;; Report on the loop overhead for sanity checking.
              (constant ,null-time)

              ;; How long does a plain Scheme addition operation take?
              (simple-add
               ,(measure-ns (+ 123 12)))

              ;; How long does a regular Racket object send take?
              (normal-send
               ,(measure-ns (send a0 op 123)))

              ;; What about if we expand the send macro in place?
              ;; This should be almost identical to the time for the
              ;; previous expression.
              (expanded-normal-send
               ,(measure-ns (let-values (((temp1) 'op))
                              (let-values (((temp2 temp3)
                                            (find-method/who 'send a0 temp1)))
                                (temp2 temp3 '123)))))

              ;; What about an approximation to a monomorphic inline
              ;; cache for the Racket object system? This should be
              ;; much faster than plain old send.
              (quasi-memoized-normal-send
               ,(with-method ((a-op (a0 op)))
                             (let ((method (lambda (x) (a-op x))))
                               (measure-ns (if (eq? *memo-class* a0)
                                               (*memo-method* 123)
                                               (begin
                                                 (set! *memo-class* a0)
                                                 (set! *memo-method* method)
                                                 (method 123)))))))

              ;; What about an uncached lookup using the trivial
              ;; vtable format defined above?
              (unmemoized-simple-lookup
               ,(measure-ns (unmemo-send b0 op 123)))

              ;; Finally, the vtable format defined above using an
              ;; approximation of monomorphic inline caching.
              (quasi-memoized-simple-lookup
               ,(measure-ns (memo-send b0 op 123)))

              )))
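
For anyone checking the arithmetic in measure-ns by hand, here is the same conversion pulled out into a plain function. The function name ns-per-iteration and the input figures are hypothetical, chosen only so that the result lines up with the roughly 350ns quoted for a normal send; they are not measurements.

```racket
#lang racket

;; cpu-ms and gc-ms are in milliseconds, as reported by time-apply.
;; Same formula as measure-ns: milliseconds -> seconds -> nanoseconds,
;; divided by the iteration count.
(define (ns-per-iteration cpu-ms gc-ms iterations)
  (/ (* 1000000000.0 (/ (- cpu-ms gc-ms) 1000.0))
     iterations))

;; Hypothetical reading: 1750 ms of CPU time, no GC, 5000000 iterations.
(ns-per-iteration 1750 0 5000000) ; => 350.0
```

Since time-apply reports whole milliseconds, one millisecond corresponds to 0.2ns per iteration at NREPEATS = 5000000, which is worth keeping in mind when reading the smaller figures.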

_________________________________________________
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev
