Hi, Felix!

One of my programs was taking way too long to execute something that
should run relatively fast.  After some experimentation, I noticed
that working with hashes that grow to a considerable size is
unbearably slow, as the following example, hashing a million elements,
shows:

> (use srfi-1)
> (define hash (make-hash-table))
> (for-each (lambda (n) (hash-table-set! hash hash n)) (iota 1000000))

You can compile this and notice that it takes *way* too long to run.
As I write this, it's been running for over 1 1/2 hours in my
relatively fast machine.

Whenever a hash table gets full, Chicken will increase its size.  If
the size is bigger than 2521, the algorithm used to determine the new
size seems to be to add 101 to its current size.  That means that for
big hashes (as the one in the above example), a *lot* of resizing will
take place.

I am attaching a patch I made (against Chicken 2.0) that improves the
algorithm; after applying it this program runs in around 10 seconds!
That's at least (asuming my program without the patch was about to
finish when I aborted it) 540 times faster (and for larger datasets
the improvement would only grow).

Here is what I did.  I rewrote the list of predefined sizes.  It now
starts in 307 (rather than in 7 * 43) and each element is the smallest
prime at least twice as big as the previous in the list.  I did this
until I got to 650680571, after which I added 1073741823, the largest
fixnum we can represent.  To expand a hash, Chicken will now use the
smallest element in the list at least twice as large as the current
size of the hash (or the largest fixnum if this condition doesn't hold
for any element in the list).

Feel free to apply this patch to the official branch.

Alejo.
http://azul.freaks-unidos.net/

P.s.: I am also attaching the program I used to calculate the list of
predefined sizes, just in case its useful.  It runs so fast (finishing
in 0.027s in my machine) I was tempted to include it directly in
extras.scm, calculating the predefined sizes dynamically at startup.

---=(  Comunidad de Usuarios de Software Libre en Colombia  )=---
---=(  http://bachue.com/colibri )=--=( [EMAIL PROTECTED]  )=---
--- chicken-2.0-orig/extras.scm 2005-07-12 05:48:49.000000000 -0500
+++ chicken-2.0.alejo/extras.scm        2005-07-28 12:15:42.000000000 -0500
@@ -1396,9 +1396,18 @@
 
 ;;; Utility definitions:
 
-(define hashtab-default-size 301)
+(define hashtab-default-size 307)
 (define hashtab-threshold 0.5)
-(define hashtab-primes-table '(301 613 997 1597 2011 2521 3001))
+
+; Predefined sizes for the hash tables:
+;
+; Start in 307; each element is the smallest prime that is at least twice as
+; bigger as the previous element in the list.  The last number is an
+; exception: it is the largest fixnum we can repressent.
+
+(define hashtab-primes-table '(307 617 1237 2477 4957 9923 19853 39709 79423 
158849 317701 635413 1270849 2541701 5083423 10166857 20333759 40667527 
81335063 162670129 325340273 650680571 1073741823))
+
+(define hashtab-max-size 1073741823)
 (define (hash-table? x) (##sys#structure? x 'hash-table))
 
 
@@ -1508,6 +1517,12 @@
       (##sys#check-structure ht 'hash-table 'hash-table-ref)
       (not (eq? unique (ref ht key unique))) ) ) )
 
+(define (hash-new-len tab req)
+  (if (or (fx>= (##sys#slot tab 0) req)
+         (eq? (##sys#slot tab 1) '()))
+      (##sys#slot tab 0)
+      (hash-new-len (##sys#slot tab 1) req)))
+
 (define hash-table-update! 
   ;; This one was suggested by Sven Hartrumpf.
   (let ([eq0 eq?]
@@ -1521,16 +1536,12 @@
               (test (##sys#slot ht 3))
               (k (hashf key len))
               (c (fx+ (##sys#slot ht 2) 1)) )
-         (if (fx>= c (inexact->exact (floor (* len hashtab-threshold))))
-             (let* ((newlen
-                     (cond ((memq len hashtab-primes-table)
-                            => (lambda (n) 
-                                 (let ((next (##sys#slot n 1)))
-                                   (if (eq? next '())
-                                       (fx+ len 101) ; arbitrary
-                                       (##sys#slot next 0) ) ) ) )
-                           (else (fx+ len 101)) ) )
-                    (vec2 (make-vector newlen '())) )
+         (if (and (fx>= c (inexact->exact (floor (* len hashtab-threshold))))
+                  (fx< len hashtab-max-size))
+             (let ((vec2 (make-vector
+                            (hash-new-len hashtab-primes-table
+                                          (min hashtab-max-size (* len 2)))
+                            '())))
                (hashtab-rehash vec vec2 hashf)
                (##sys#setslot ht 1 vec2)
                (restart) ) 
(define (div a b)
  (zero? (remainder a b)))

(define (next start)
  (let loop ((current (+ (* 2 start) 1)))
    (if (prime? current)
      current
      (loop (+ current 1)))))

(define (prime? x)
  (and (not (div x 2))
       (let loop ((i 3))
         (or (> (* i i) x)
             (and (not (div x i))
                  (loop (+ i 2)))))))

(define (run start)
  (display start) (newline)
  (unless (= 650680571 start)
    (run (next start))))

(run 307)

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Chicken-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/chicken-users

Reply via email to