leerho commented on issue #34: URL: https://github.com/apache/datasketches-rust/issues/34#issuecomment-3678303224
I started this hash function journey back in 2011-2012, when I needed a fast open-source hash function for my very early versions of the Theta sketch, and MurmurHash3 fit the bill at the time. I chose the [128 bit version](https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp#L255) as I foresaw the possible need for more than 64 bits. Alas, if you require backward compatibility, once chosen, you are pretty much stuck with it. And we have used the same hash function for a number of the sketches.

I have refactored Austin Appleby's code several times:
- To accommodate different Java primitive arrays (C++ doesn't have this issue).
- To accommodate common containers like ByteBuffer and, more recently, MemorySegments.
- For a while I had a version using Unsafe for added speed.
- And now I have a version that leverages FFM.

They all produce the same bits. (I have tested them thoroughly at several different times and have not found problems.)

The only difference between my implementations and Austin's is that he restricted the size of the seed to a 32-bit unsigned integer. Java doesn't have unsigned ints, so I allow Java's signed long as a seed. So as long as you use seeds less than 1L << 32 and > 0, you should be ok.

Back then, since sketching was still so new and not well understood (hmm, has that changed?), I wanted the hash function to be fixed and not a choice for the user. The main reason for this is that once a sketch is created and serialized, who knows how long it will be stored for? (At one time Yahoo had over 10 years of stored sketches.) Changing the hash function (or its seed), if done carelessly, could destroy the ability to ever read those old stored sketches (since the original data is long gone!).
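
For illustration only, the seed constraint described above could be checked in Rust roughly as follows. This is a minimal sketch, not code from the comment or from any DataSketches library: the helper name `seed_as_u32` is hypothetical, and the seed values in `main` are arbitrary examples. It assumes the caller holds a Java-style signed 64-bit seed and wants to know whether it falls in the range the comment calls safe.

```rust
/// Hypothetical helper (not part of any DataSketches API): returns the seed as
/// a 32-bit unsigned value if it lies in the range the comment describes as
/// safe, i.e. greater than 0 and less than 1 << 32.
fn seed_as_u32(seed: i64) -> Option<u32> {
    if seed > 0 && seed < (1_i64 << 32) {
        // The range check guarantees the cast cannot truncate.
        Some(seed as u32)
    } else {
        None
    }
}

fn main() {
    // Arbitrary example seeds: one inside the range, two outside it.
    for seed in [9001_i64, -1, 1_i64 << 32] {
        match seed_as_u32(seed) {
            Some(s) => println!("{}: fits a 32-bit unsigned seed ({})", seed, s),
            None => println!("{}: outside the range described above", seed),
        }
    }
}
```

A check like this only says whether a seed fits the 32-bit range; whether two implementations then produce identical bits for that seed still has to be verified by testing.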
