I am quite happy to read a discussion about hashing in Guile as this
hash been one of the few real pain points in my experience: the hash is
not behaving as one would expect from a generic hash function. What
follows is not directly related to the question asked, so I hope this
message is not misplaced.
> Two different lists, same hash values. Weird. But it's true:
>
> scheme@(guile-user)> (= (hash '(1 2 3 (4)) most-positive-fixnum)
> (hash '(1 2 3 (4 5)) most-positive-fixnum))
> => #t
>
> and it gets worse than that:
>
> scheme@(guile-user)> (= (hash '(1 2 3 4 5) most-positive-fixnum)
> (hash '(1 2 3 4 5 6) most-positive-fixnum))
> => #t
As noted before, there is the issue with the recursion being limited to
a rather low level, but hashing is also invariant across permutation of
its inputs:
```
scheme@(guile-user)> (= (hash '(1 2) most-positive-fixnum)
(hash '(2 1) most-positive-fixnum))
$7 = #t
```
And for some data types, it is simply broken:
```
scheme@(guile-user)> (= (hash (make-u8vector 3 0) most-positive-fixnum)
(hash (make-u8vector 3 1) most-positive-fixnum))
$6 = #t
```
The same holds for `f32vector` and `f64vector` which are implemented on
top of `u8vector`. Those two specific cases caused very-hard-to-debug
issues in two of my projects...
> One of the design goals for ‘hash’ is that it must be constant-time.
> Thus, by definition, it cannot traverse entire lists or vectors.
I don't know what is the story behind this function. I don't know why it
is required to be constant-time. What I do know is that nowadays,
non-cryptographic hash functions are so fast that hashing is rarely an
issue [1], and that many languages have converged to using SipHash for
hash-tables since it is fast and protects against hash flooding (if
hash-table based on a poorly designed hash is exposed through a public
API, an attacker may be able to forge malicious inputs that all hash to
the same target bucket).
> I think what we probably want is a strong collision resistant hash. A
> strong crypto hash would do the job for this. So the current core hash
> would not be impacted.
Cryptographic hash functions have properties that are not required for
generic hash functions (like pre-image resistance) which makes them much
slower. SipHash is formally a MAC since it requires a key to be secure,
but it is really fast (see [1]).
I do not think going for a cryptographic hash function as default hash
is a good idea. I would use SipHash for hash-tables and maybe another
one like CityHash or XXH for internal use like generating identifiers.
> I had some discussions related to this, i.e. crypto hash, in core guile,
> with Ludovic and Rob. I think that having core support for some
> the crypto hashing makes sense, especially if the compiler needs it.
That said, I think having curated cryptographic primitives in the core
language like Go does is a great step toward a more secure ecosystem.
Even if the compiler does not need it in the end.
[1] https://xxhash.com/