Re: [racket-dev] Inline caching (was Re: my '312' this semester, how we compare to others)

Ryan Culpepper Wed, 04 May 2011 16:13:20 -0700

On 05/04/2011 01:57 PM, Tony Garnock-Jones wrote:

On 2011-05-04 12:04 PM, Matthias Felleisen wrote:

I still believe that the Java implementation (just under 1s without
their 'Google' contract) benefits from typed dispatches.


Maybe it does, but it's almost certain that it is benefiting from inline
caching at send sites (i.e. dynamic type information) much more than it
will be benefiting from static type information.

A quick-and-dirty comparison of raw send performance on my Mac:

Language Virtual machine Nanoseconds per send
------------------------------------------------------------
Java Hotspot 64-bit 1.6.0_24 1.4
Smalltalk Cog r2382 21
Smalltalk SqueakVM 4.2.4beta1U 122
Racket Racket v5.1 ~350

Note that Cog is a JITting VM and SqueakVM is a plain (but very well
optimised) interpreter. Both Cog and SqueakVM use a couple of levels of
method lookup cache.

A simple experiment I just performed suggests that a monomorphic inline
cache hit can reduce the time needed for a send in Racket from 350ns to
around 60ns, which is a massive win. I've attached the program I used to
measure this, FWIW. (Run it using command-line Racket, not DrRacket: I
got some *very* weird timing artifacts out of DrRacket during this
experiment!)

The question, then, is: how do we implement MICs or PICs using Racket's
macro system? Each send site needs to expand into

- a piece of global state
- a test involving that global state
- a possible update to that global state

Hypothesising some kind of (let-static) form that introduces a
lexically-scoped piece of global state,


Here's 'let-static':

  (define-syntax (let-static stx)
    (syntax-case stx ()
      [(let-static ([var rhs] ...) . body)
       (with-syntax ([(gvar ...)
                      (syntax-local-lift-values-expression
                       (length (syntax->list #'(var ...)))
                       #'(values rhs ...))])
         #'(let-syntax ([var (make-rename-transformer #'gvar)] ...)
             . body))]))

> this kind of thing might Just

Work to provide a speedup of almost six-fold on almost-monomorphic
send-heavy code:

(define-syntax cached-send
  (syntax-rules ()
    ((_ obj msg arg ...)
     (let-static ((bc (box #f))
              (bm (box #f)))
       (let* ((tmp obj)
          (cls (object-ref tmp)))
     (if (eq? (unbox bc) cls)
         ((unbox bm) tmp arg ...)
         (let-values (((method _)
                           (find-method/who 'send tmp 'msg)))
           (set-box! bc cls)
           (set-box! bm method)
           (method tmp arg ...))))))))

That code has a space-safety problem: the caches might keep aroundclasses that should be garbage collected. It also has a race condition:there's a period of time when the class cache has been updated but themethod cache hasn't.

The safe-for-space issue can be fixed by using weak boxes. The racecondition could be fixed by using one instead of two... but there's nostructure that combines a class and a particular method implementationthat's strongly held elsewhere. Creating a new pair for the weak box tohold works, but it means that every GC is likely to flush the cache, butmaybe that's okay. It also means that every cache miss triggers heapallocation, but I don't know any way around that.

I've attached an updated benchmark script. Still looks like a win onspeed, but slightly less than before.


Ryan

#lang racket
(require racket/private/class-internal)

(define-syntax (let-static stx)
  (syntax-case stx ()
    [(let-static ([var rhs] ...) . body)
     (with-syntax ([(gvar ...)
                    (syntax-local-lift-values-expression
                     (length (syntax->list #'(var ...)))
                     #'(values rhs ...))])
       #'(let-syntax ([var (make-rename-transformer #'gvar)] ...)
           . body))]))

;; An ordinary Racket class.
(define a%
  (class* object% ()
    (super-new)
    (define/public (op x) (+ x 1))))
;; ==== Cached versions of send ====

;; (Real version should base cache on class, not object.)

;; send/cache0
;; Uses 2 vars.
;; Problem: not safe for space!
;; Problem: race condition in 2 set!s.
(define-syntax send/cache0
  (syntax-rules ()
    [(_ obj-expr msg arg ...)
     (let-static ([*memo-class* #f]
                  [*memo-method* #f])
       (let ([obj obj-expr])
         (let ([f (if (eq? obj *memo-class*)
                      *memo-method*
                      (let-values ([(method _obj)
                                    (find-method/who 'send obj 'msg)])
                        (set! *memo-class* obj)
                        (set! *memo-method* method)
                        method))])
           (f obj arg ...))))]))

;; send/cache1
;; Uses 2 weak-boxes to be safe for space.
;; Problem: race condition in 2 set!s.
(define-syntax send/cache1
  (syntax-rules ()
    ((_ obj-expr msg arg ...)
     (let-static ([*memo-class* (make-weak-box #f)]
                  [*memo-method* (make-weak-box #f)])
       (let* ([obj obj-expr]
              [memo-class (weak-box-value *memo-class*)]
              [memo-method (weak-box-value *memo-method*)])
         (let ([f (if (eq? obj memo-class)
                      memo-method
                      (let-values ([(method _obj)
                                    (find-method/who 'send obj 'msg)])
                        (set! *memo-class* (make-weak-box obj))
                        (set! *memo-method* (make-weak-box method))
                        method))])
           (f obj arg ...)))))))

;; send/cache2
;; Uses one weak-box instead of two, so safe for space and eliminates
;; race condition.
;; Problem (minor): the weak-box's pair is held nowhere else, so the
;; cache is potentially emptied on every (major?) GC. (But that's ok.)
(define-syntax send/cache2
  (syntax-rules ()
    ((_ obj-expr msg arg ...)
     (let-static ([*memo* (make-weak-box #f)])
       (let* ([obj obj-expr]
              [memo (weak-box-value *memo*)]
              [memo-class (and (pair? memo) (car memo))]
              [memo-method (and (pair? memo) (cdr memo))])
         (let ([f (if (eq? obj memo-class)
                      memo-method
                      (let-values ([(method _obj)
                                    (find-method/who 'send obj 'msg)])
                        (set! *memo*
                              (make-weak-box (cons obj method)))
                        method))])
           (f obj arg ...)))))))

;; Test objects.
(define a0 (new a%))

;; Syntax: (measure-ns exp)
;;
;; Expands to an expression that repeats "exp" NREPEATS times,
;; measuring the elapsed time, and returns the number of nanoseconds
;; of CPU time used *per iteration*, excluding any GC time.
(define NREPEATS 5000000)
(define-syntax measure-ns
  (syntax-rules ()
    ((_ exp)
     (call-with-values (lambda ()
                         (pretty-print `(measuring exp))
                         (time-apply (lambda ()
                                       (do ((i 0 (+ i 1)))
                                         ((= i NREPEATS))
                                         exp))
                                     '()))
                       (lambda (results cpu real gc)
                         (/ (* 1000000000.0 (/ (- cpu gc) 1000.0))
                            NREPEATS))))))

;; Main program.

;; Measure the time for a null measure-ns loop first, then measure the
;; operations of interest, subtracting the null-time overhead
;; measurement from each to get an estimate of the time taken for the
;; interesting operation.

(let ((null-time (measure-ns 123)))
  (define (report-on t)
    (let ((name (first t))
          (ns/op (second t)))
      (write (list name (- ns/op null-time)))
      (newline)))
  (for-each report-on
            `(
              ;; Report on the loop overhead for sanity checking.
              (constant ,null-time)

              ;; How long does a plain Scheme addition operation take?
              (simple-add
               ,(measure-ns (+ 123 12)))

              ;; How long does a regular Racket object send take?
              (normal-send
               ,(measure-ns (send a0 op 123)))

              ;; What about an approximation to a monomorphic inline
              ;; cache for the Racket object system? This should be
              ;; much faster than plain old send.

              ;; 2 vars (not SFS, race condition)
              (memoized-send/0
               ,(measure-ns (send/cache0 a0 op 123)))

              ;; 2 weak-boxes (SFS, race condition)
              (memoized-send/1
               ,(measure-ns (send/cache1 a0 op 123)))

              ;; 1 weak-box (SFS, no race condition)
              (memoized-send/2
               ,(measure-ns (send/cache2 a0 op 123)))
              )))

_________________________________________________
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev

Re: [racket-dev] Inline caching (was Re: my '312' this semester, how we compare to others)

Reply via email to