Hi, Felix! One of my programs was taking way too long to execute something that should run relatively fast. After some experimentation, I noticed that working with hashes that grow to a considerable size is unbearably slow, as the following example, hashing a million elements, shows:
> (use srfi-1) > (define hash (make-hash-table)) > (for-each (lambda (n) (hash-table-set! hash hash n)) (iota 1000000)) You can compile this and notice that it takes *way* too long to run. As I write this, it's been running for over 1 1/2 hours in my relatively fast machine. Whenever a hash table gets full, Chicken will increase its size. If the size is bigger than 2521, the algorithm used to determine the new size seems to be to add 101 to its current size. That means that for big hashes (as the one in the above example), a *lot* of resizing will take place. I am attaching a patch I made (against Chicken 2.0) that improves the algorithm; after applying it this program runs in around 10 seconds! That's at least (asuming my program without the patch was about to finish when I aborted it) 540 times faster (and for larger datasets the improvement would only grow). Here is what I did. I rewrote the list of predefined sizes. It now starts in 307 (rather than in 7 * 43) and each element is the smallest prime at least twice as big as the previous in the list. I did this until I got to 650680571, after which I added 1073741823, the largest fixnum we can represent. To expand a hash, Chicken will now use the smallest element in the list at least twice as large as the current size of the hash (or the largest fixnum if this condition doesn't hold for any element in the list). Feel free to apply this patch to the official branch. Alejo. http://azul.freaks-unidos.net/ P.s.: I am also attaching the program I used to calculate the list of predefined sizes, just in case its useful. It runs so fast (finishing in 0.027s in my machine) I was tempted to include it directly in extras.scm, calculating the predefined sizes dynamically at startup. ---=( Comunidad de Usuarios de Software Libre en Colombia )=--- ---=( http://bachue.com/colibri )=--=( [EMAIL PROTECTED] )=---
--- chicken-2.0-orig/extras.scm 2005-07-12 05:48:49.000000000 -0500
+++ chicken-2.0.alejo/extras.scm 2005-07-28 12:15:42.000000000 -0500
@@ -1396,9 +1396,18 @@
;;; Utility definitions:
-(define hashtab-default-size 301)
+(define hashtab-default-size 307)
(define hashtab-threshold 0.5)
-(define hashtab-primes-table '(301 613 997 1597 2011 2521 3001))
+
+; Predefined sizes for the hash tables:
+;
+; Start in 307; each element is the smallest prime that is at least twice as
+; bigger as the previous element in the list. The last number is an
+; exception: it is the largest fixnum we can repressent.
+
+(define hashtab-primes-table '(307 617 1237 2477 4957 9923 19853 39709 79423
158849 317701 635413 1270849 2541701 5083423 10166857 20333759 40667527
81335063 162670129 325340273 650680571 1073741823))
+
+(define hashtab-max-size 1073741823)
(define (hash-table? x) (##sys#structure? x 'hash-table))
@@ -1508,6 +1517,12 @@
(##sys#check-structure ht 'hash-table 'hash-table-ref)
(not (eq? unique (ref ht key unique))) ) ) )
+(define (hash-new-len tab req)
+ (if (or (fx>= (##sys#slot tab 0) req)
+ (eq? (##sys#slot tab 1) '()))
+ (##sys#slot tab 0)
+ (hash-new-len (##sys#slot tab 1) req)))
+
(define hash-table-update!
;; This one was suggested by Sven Hartrumpf.
(let ([eq0 eq?]
@@ -1521,16 +1536,12 @@
(test (##sys#slot ht 3))
(k (hashf key len))
(c (fx+ (##sys#slot ht 2) 1)) )
- (if (fx>= c (inexact->exact (floor (* len hashtab-threshold))))
- (let* ((newlen
- (cond ((memq len hashtab-primes-table)
- => (lambda (n)
- (let ((next (##sys#slot n 1)))
- (if (eq? next '())
- (fx+ len 101) ; arbitrary
- (##sys#slot next 0) ) ) ) )
- (else (fx+ len 101)) ) )
- (vec2 (make-vector newlen '())) )
+ (if (and (fx>= c (inexact->exact (floor (* len hashtab-threshold))))
+ (fx< len hashtab-max-size))
+ (let ((vec2 (make-vector
+ (hash-new-len hashtab-primes-table
+ (min hashtab-max-size (* len 2)))
+ '())))
(hashtab-rehash vec vec2 hashf)
(##sys#setslot ht 1 vec2)
(restart) )
(define (div a b)
(zero? (remainder a b)))
(define (next start)
(let loop ((current (+ (* 2 start) 1)))
(if (prime? current)
current
(loop (+ current 1)))))
(define (prime? x)
(and (not (div x 2))
(let loop ((i 3))
(or (> (* i i) x)
(and (not (div x i))
(loop (+ i 2)))))))
(define (run start)
(display start) (newline)
(unless (= 650680571 start)
(run (next start))))
(run 307)
signature.asc
Description: Digital signature
_______________________________________________ Chicken-users mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/chicken-users
