Consider the following program I wrote to divide a 50,000 byte string into a stack of one character per definition (this is supposed to be an O(n log n) task, because it repeatedly cuts the input in half, and should be more efficient than iterating on substr(`$1',0,1)$0(substr(`$1',1) with its O(n^2) complexity):
$ cat tmp define(`chew', `ifelse($1, 1, `pushdef(`stack', substr(`$2',0, 1))', `$0(eval($1/2), substr(`$2', 0, eval($1/2))_)$0(eval($1 - $1/2), substr(`$2', eval($1/2)))')')dnl chew(50000, format(`%050000d', 0))dnl $ time m4 tmp real 0m22.264s user 0m22.180s sys 0m0.010s $ time m4 -H517 tmp real 0m0.321s user 0m0.316s sys 0m0.004s What gives? It turns out that with the default hash size of 509, 'stack' and 'substr' hash to the same bucket, and as the pushdef depth of stack increases, the lookup time for 'substr' gets progressively worse. With -H517, the two names hash to different buckets and no longer interfere. I'm working on a patch to separate the pushdef chain from the hash bucket chain, so that hash collisions only have to visit one instance of a colliding name, not the entire stack. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org