Well, I couldn't leave this alone --- especially after I realized that some of the dispatching overhead has to do with C functions that cooperate specially with the GC for more complex cases.
I've streamlined various paths and cut about 1/3 of Racket's time on the example. At Thu, 24 Jul 2014 11:09:11 +0100, Matthew Flatt wrote: > In Racket, it looks like a lot of the time is still in generic > dispatch: directing hash operations to `equal?`-based hashing, > directing `equal?`-based hashing operation to the string-specific > operation, directing equality comparison to string-specific equality > comparison, and so on. > > I'm not sure if that's the full story, but my guess is that such paths > are more streamlined in Python, given the central role that > dictionaries play there. > > At Wed, 23 Jul 2014 16:54:47 +0100, Pedro Ramos wrote: > > Hi, > > > > I've been developing an implementation of Python in Racket, where I'm > > implementing Python's dictionaries over Racket custom hash tables. > > > > While my occasional benchmarks typically show better performance on Racket > > programs than their Python equivalents, Racket's hash tables generally seem > > to be slower than Python's dicts. > > > > I've set up this benchmark in Racket: > > > > > > #lang racket > > > > (define alphabet "abcdefghijklmnopqrstuvwxyz") > > (define (random-word n) > > (build-string n (lambda (x) (string-ref alphabet (random 23))))) > > > > (define words (for/list ([k 1000000]) > > (random-word 3))) > > (define d (make-hash)) > > > > (time (for ([w words]) > > (if (hash-has-key? d w) > > (hash-set! d w (add1 (hash-ref d w))) > > (hash-set! d w 1)))) > > > > > > And its equivalent in Python: > > > > > > import random > > import time > > > > alphabet = "abcdefghijklmnopqrstuvwxyz" > > def random_word(n): > > return ''.join([random.choice(alphabet) for i in range(n)]) > > > > words = [random_word(3) for k in xrange(1000000)] > > d = {} > > > > a = time.time() > > for w in words: > > if w in d: > > d[w] = d[w] + 1 > > else: > > d[w] = 1 > > b = time.time() > > print b-a, 'seconds' > > > > > > The Racket example yields running times of around 500 ms (running on Racket > > v6.0.1) while the Python example yields running times of around 330 ms > > (running on Python 2.7.3). > > > > I find this unusual because Python is somewhat more dynamic than Racket, > > since > > (a) Python's equality and hashing functions have to dispatched at runtime > > for each key; > > (b) referencing and setting values in a Python dict is done using a very > > general operator, [], whose behaviour also has to be dispatched at runtime, > > unlike the more specific hash-ref and hash-set! Racket functions. > > > > Is there something I'm missing about Racket's hash tables which explains > > this slower speed? > > > > Thanks, > > Pedro Ramos > > _________________________ > > Racket Developers list: > > http://lists.racket-lang.org/dev > _________________________ > Racket Developers list: > http://lists.racket-lang.org/dev _________________________ Racket Developers list: http://lists.racket-lang.org/dev