On Sun, May 29, 2016 at 10:27:25PM +0200, lemonboy wrote:
> Hello,
> While investigating ticket #1293 [1] I noticed that the hashing
> procedures stop hashing all vectors at the 4th element. While that's
> a valuable way to speed up the hashing process, it causes some
> problems in the wild.
Hi Lemonboy,

Sorry for getting back to this mail so late.

> If you use a record with N > 4 slots as a key and then set! one of the
> slots after the 4th one, the equal?-hash procedure would return the
> same hash where equal? definitely fails.

This is not really a problem, is it? If the hashes are identical, these two objects are simply put in the same bucket of the hash table, but it wouldn't confuse the two on lookup or insertion AFAICT; that's what the "test" procedure is for. Upping the limit simply trades one cost (lookup/insertion) for another (hashing). Removing it altogether removes the upper bound on hashing time, which doesn't seem like a great idea. The limit was probably put in there for a reason. Maybe Felix can enlighten us here?

On the other hand, capping the hash calculation at an arbitrarily low value like we do now seems broken, too. However, if we fix this, we should also fix it for vectors; I don't really see why vectors should be treated fundamentally differently.

> The attached patch fixes this problem by lifting the limit when hashing
> tagged vectors, aka records. The rationale behind this decision is that
> the number of slots a record has is usually very small, and it's more
> likely that a record is used as a key than a huge vector.

The record wouldn't typically be very huge, but it *could* be. And a vector can be of any size, too, and has the same problem if it is 5 or 6 slots big with a distinction only in the later slots. Wouldn't it make more sense to just increase the limit to 8, 16 or some other arbitrarily chosen value that's a little better for larger objects?

> While inspecting this I also came across another issue: eq?-hash
> crunches some plain values by just xor-ing them with a random seed,
> while it calls hash-equal? for complex objects like records/vectors.
> That's IMO quite a convoluted way to implement it (and it might also
> be wrong, as eq?
> performs a shallow comparison only). It could be replaced by
> stirring [2] the raw C_word value, as that's faster (and incidentally
> also what Guile does [3]).
> (Rinse and repeat for the eqv? version)

I don't think that would work in general, because the raw C_word can be a pointer, which gets changed when the target object is moved around by the GC. But I'm sure you know this. We could do this for immediates, of course. But wouldn't the typical structure of these words (with some key bits set for all values of a given type) interfere with this "stirring"?

In any case, we can all agree that the srfi-69 module is pretty (almost hopelessly) broken. That's one reason we made it into an egg for CHICKEN 5. Might you be interested in taking over maintainership of this egg? It sounds like you have some interesting ideas, and it makes more sense to iterate quickly on an egg than to keep tweaking this code in core, just before what will (hopefully) be the final release of a CHICKEN in the 4 series.

What do you think?

Cheers,
Peter
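To make the bucket argument above concrete, here is a minimal sketch (in Python, with hypothetical names; the real equal?-hash works differently) of why a depth-limited hash only produces collisions, never wrong answers, as long as lookups fall back on the full equality test:

```python
HASH_LIMIT = 4  # hypothetical cut-off, mirroring the limit discussed above

def truncated_hash(vec, limit=HASH_LIMIT):
    """Hash only the first `limit` slots, like a depth-limited equal?-hash."""
    h = 0
    for x in vec[:limit]:
        h = (h * 31 + hash(x)) & 0xFFFFFFFF
    return h

class ChainedTable:
    """Toy chained hash table: buckets hold (key, value) pairs."""
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def set(self, key, value):
        bucket = self.buckets[truncated_hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:            # full equality: the "test" procedure
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self.buckets[truncated_hash(key) % len(self.buckets)]
        for k, v in bucket:
            if k == key:            # full equality again on lookup
                return v
        raise KeyError(key)

a = (1, 2, 3, 4, 5, 6)
b = (1, 2, 3, 4, 5, 99)            # differs only after the 4th slot
assert truncated_hash(a) == truncated_hash(b)  # same hash, same bucket

t = ChainedTable()
t.set(a, "first")
t.set(b, "second")
assert t.get(a) == "first" and t.get(b) == "second"  # lookups stay correct
```

The collision merely makes that one bucket longer, which is exactly the cost trade-off described above: a lower limit means cheaper hashing but longer chains for keys that agree on their early slots.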
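On the stirring question, a toy model may help (Python on 64-bit words; the fixnum tagging below is a simplifying assumption for illustration, not CHICKEN's exact layout). It contrasts xor-with-a-seed, which preserves the word's structure, with a multiplicative stir of the raw word:

```python
WORD_MASK = (1 << 64) - 1

def c_fix(n):
    # Fixnum-style tagging: value shifted left, lowest bit set.
    # An assumption for illustration only.
    return ((n << 1) | 1) & WORD_MASK

def xor_seed_hash(word, seed):
    # What the mail describes: xor the raw word with a random seed.
    return (word ^ seed) & WORD_MASK

def stirred_hash(word):
    # Multiplicative "stir" with an odd 64-bit constant (Fibonacci
    # hashing); only safe for immediates, since pointers move under
    # a copying GC.
    return (word * 0x9E3779B97F4A7C15) & WORD_MASK

SEED = 0x0123456789ABCDEF
small = [c_fix(n) for n in range(256)]

# xor leaves the word's structure intact: small fixnums differ only in
# their low bits, so the high half of every xor hash is just the seed's.
hi_xor = {xor_seed_hash(w, SEED) >> 32 for w in small}
assert len(hi_xor) == 1

# The multiplication propagates low-bit differences into the high bits,
# so bucket indices taken from the top of the word spread out.
hi_stir = {stirred_hash(w) >> 32 for w in small}
assert len(hi_stir) > 200
```

The worry about constant tag bits is visible here too: xor passes them straight through to the hash, while multiplication by an odd constant is a bijection on 64-bit words, so constant tag bits only shift which hash values occur, not how well they spread.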
_______________________________________________
Chicken-hackers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
