On 4/16/15 12:48 PM, Eduardo A. Bustamante López wrote: > On Thu, Apr 16, 2015 at 11:07:34AM -0400, Chet Ramey wrote: > [...] >> I knew that rang a bell somewhere. mt_hash is a function in the bash >> malloc library that keeps track of all allocations and deallocations in >> a table. It's part of the debugging that is enabled when you build from >> the devel code. It's been well-known for a long time that the debugging >> code in malloc slows bash down considerably; that's why it's not enabled >> as part of bash releases. > > Actually, this is the post that motivated me to look into this: > (yes, the conclusion is idiotic, but I guess the rest of the post is pretty > okay).
I don't know, he kind of lost me with the first sentence. Then again, the Internet has always been the home of hyperbole. > http://spencertipping.com/posts/2013.0814.bash-is-irrecoverably-broken.html > > Now, there is some truth to what he says: > > dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++<1000)); do > declare a$RANDOM$RANDOM=1; done' > ./bash -c 'i=0; while ((i++<1000)); do declare a$RANDOM$RANDOM=1; done' > 0.01s user 0.06s system 93% cpu 0.077 total > dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++<10000)); do > declare a$RANDOM$RANDOM=1; done' > ./bash -c 'i=0; while ((i++<10000)); do declare a$RANDOM$RANDOM=1; done' > 0.16s user 0.48s system 98% cpu 0.643 total > dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++<100000)); do > declare a$RANDOM$RANDOM=1; done' > ./bash -c 'i=0; while ((i++<100000)); do declare a$RANDOM$RANDOM=1; done' > 15.44s user 6.51s system 99% cpu 21.959 total Sure, some. He apparently doesn't think that any hashing technique other than open addressing is legitimate hashing, and that kind of diminishes his arguments. He's correct about the hashing implementation: bash hash tables, which are used all over the place for variable and other storage, use separate chaining to do collision resolution. That means you're always going to have some cost associated with searching the linked list attached to each hash bucket. The bash-4.3 implementation uses 64 hash buckets, and I changed that to 128 a while back in the devel branch. That seems to be the right balance of space and lookup speed for the majority of bash scripts. I don't think it would be unreasonable to double that to something like 256 hash buckets, and it wouldn't cost any memory to speak of, but tuning bash for an unrepresentative example that doesn't actually occur in real use doesn't seem like a useful thing to do. The same thing goes for resizing the hash table, which would, of course, require creating an entirely new one and rehashing each element of the old one into the new. That's easy enough to write, and I might do it when I have a few minutes, but is the problem that is designed to solve worth even the implementation effort? How many scripts create 64K variables? > I built bash like this: > > CFLAGS='-pg -g -O0' ./configure --silent && make -sj4 DEBUG= MALLOC_DEBUG= > > To make sure the malloc debugging code doesn't interfere. > > I got a gprof profile with that last run, which gave: > > Each sample counts as 0.01 seconds. > % cumulative self self total > time seconds seconds calls s/call s/call name > 71.42 12.07 12.07 1100104 0.00 0.00 hash_search > 21.18 15.65 3.58 275435 0.00 0.00 morecore > 1.63 15.93 0.28 6800525 0.00 0.00 internal_malloc > 0.71 16.05 0.12 6200116 0.00 0.00 internal_free > 0.59 16.15 0.10 300001 0.00 0.00 expand_word_internal > 0.24 16.19 0.04 6800474 0.00 0.00 sh_xmalloc > 0.18 16.22 0.03 7203779 0.00 0.00 is_basic > 0.18 16.25 0.03 1932530 0.00 0.00 is_basic > 0.18 16.28 0.03 200002 0.00 0.00 subexpr > 0.18 16.31 0.03 100008 0.00 0.00 find_special_var > 0.15 16.33 0.03 pagealign I'm not surprised that these functions dominate the run time. You're basically doing nothing but assigning variables. You have to allocate memory for the variables, and you have to search the variable hash table to see whether or not the variable already exists. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRU c...@case.edu http://cnswww.cns.cwru.edu/~chet/