leerho commented on issue #34:
URL: 
https://github.com/apache/datasketches-rust/issues/34#issuecomment-3678303224

   I started this hash function journey back in 2011-2012, when I needed a fast 
open-source hash function for my very early versions of the Theta sketch. 
MurmurHash3 fit the bill at the time.  I chose the [128-bit 
version](https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp#L255)
 because I foresaw the possible need for more than 64 bits.  Alas, if you require 
backward compatibility, once chosen, you are pretty much stuck with it.  And we 
have used the same hash function for a number of the sketches.  
   
   I have refactored Austin Appleby's code several times:
   
   - To accommodate different Java primitive arrays (C++ doesn't have this 
issue).
   - To accommodate common containers like ByteBuffer and, more recently, 
MemorySegment.
   - For a while I had a version using Unsafe for added speed.
   - And now I have a version that leverages FFM (Java's Foreign Function & 
Memory API).
   
   They all produce the same bits.  (I have tested them thoroughly at several 
different times and have not found problems.)
   
   The only difference between my implementations and Austin's is that he 
restricted the size of the seed to a 32-bit unsigned integer.  Java doesn't have 
unsigned ints, so I allow Java's signed long as a seed.  So as long as you use 
seeds greater than 0 and less than 1L << 32, you should be OK. 
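   
   As a rough illustration of that seed constraint, here is a minimal Rust 
sketch.  The function name `portable_seed` is hypothetical and not part of any 
DataSketches API; it simply assumes a port receives a Java-style signed 64-bit 
seed and wants to know whether it also fits the 32-bit unsigned seed of the 
reference C++ code, using the range stated above.

```rust
/// Hypothetical helper, not part of any DataSketches library.
/// Takes a seed as Java would hold it (signed 64-bit) and returns it as a
/// u32 only if it lies in the range described above: 0 < seed < 2^32.
fn portable_seed(java_seed: i64) -> Option<u32> {
    // try_from rejects negative values and anything >= 2^32;
    // the filter then excludes 0, following the range given in the comment.
    u32::try_from(java_seed).ok().filter(|&s| s > 0)
}
```

   Under these assumptions, a seed such as 9001 maps to `Some(9001)`, while 0, 
negative values, and anything at or above 1 << 32 yield `None`.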
   
   Back then, since sketching was still so new and not well understood (hmm, 
has that changed?), I wanted the hash function to be fixed and not a choice for 
the user.  The main reason for this is that once a sketch is created and 
serialized, who knows how long it will be stored for?  (At one time Yahoo had 
over 10 years of stored sketches.)  Changing the hash function (or its seed), if 
done carelessly, could destroy the ability to ever read those old stored 
sketches (since the original data is long gone!).
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

