On Mon, Nov 23, 2015 at 2:16 PM, Radim Vansa <[email protected]> wrote:
> On 11/23/2015 01:07 PM, Sanne Grinovero wrote:
>> +1
>>
>> See also https://issues.jboss.org/browse/ISPN-3905 ; although I was
>> mostly concerned about it allocating on a (very) hot path and didn't look
>> at it in terms of compatibility modes.
>
> Yes, due to compatibility we cannot remove the UTF-8 encoding from
> MurmurHash3, since compatibility with clients (in other languages)
> depends on it as well. We could, though, theoretically merge encoding
> and hashing into one function - the UTF-8 encoder implementation looks
> quite simple (60 lines of code?) - which could be worth it even if used
> only for the server. However, my proposal was to remove that computation
> from the hot code path completely.
>
>> Rather than xorring with "magic numbers" don't you think the
>> Equivalence implementation should be able to rule on that?
>
> We shouldn't require the user to provide a pair of hashCode functions; I
> don't think that would work well in practice. Though, we could make the
> second function a Java 8 default method (with return hashCode() ^
> 0xWHATEVER), still allowing it to be overridden.
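For reference, the default-method idea quoted above could look roughly like the sketch below. This is only an illustration: KeyHashing, segmentHashCode and SEGMENT_MIX are hypothetical names, not part of Infinispan's Equivalence API, and the constant merely stands in for the "0xWHATEVER" placeholder.

```java
import java.util.Objects;

// Hypothetical interface (NOT Infinispan's Equivalence): it only illustrates
// exposing a second hash function as an overridable Java 8 default method.
interface KeyHashing<T> {
    // Arbitrary stand-in for the "0xWHATEVER" constant from the proposal.
    int SEGMENT_MIX = 0x5bd1e995;

    // Hash supplied by the user, e.g. for use inside the data container.
    int hashCode(T obj);

    // Second hash, e.g. for picking a segment; implementations may override it.
    default int segmentHashCode(T obj) {
        return hashCode(obj) ^ SEGMENT_MIX;
    }
}

class DefaultMethodSketch {
    public static void main(String[] args) {
        // Uses the default second hash.
        KeyHashing<String> plain = Objects::hashCode;

        // Overrides the second hash with a different (still hypothetical) remix.
        KeyHashing<String> custom = new KeyHashing<String>() {
            @Override
            public int hashCode(String obj) {
                return Objects.hashCode(obj);
            }

            @Override
            public int segmentHashCode(String obj) {
                return Integer.rotateLeft(hashCode(obj), 16);
            }
        };

        System.out.println(plain.segmentHashCode("key-1"));
        System.out.println(custom.segmentHashCode("key-1"));
    }
}
```

With this shape the user still writes a single hash function, and only the rare implementation that cares about segment placement overrides the second one.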
The JDK team learned long ago to use a spreader on top of user-supplied
hashCode() implementations, as user-supplied hash codes are usually very
clustered. In the case of strings, a common prefix often makes up most of
the key, so the hash codes of the keys are clustered as well. An XOR with a
magic value would definitely not help with the clustering issue, which is
why java.util.HashMap doesn't use one.

Note that our consistent hashes map keys with adjacent hash codes to the
same segment: we effectively use hash / segmentSize (contiguous ranges of
the hash space), whereas HashMap uses hash % buckets. So we require a
better spread across the hash space than HashMap does, and because of that
I think we really need MurmurHash3. Still, we could change it to work on
the result of Equivalence.hashCode(Object) instead of dealing with the
contents of byte[] and String directly, but maintaining compatibility with
old clients may not be possible.

Regarding client-server divergences, I think we require compatibility mode
to be enabled in order to access a cache both via HotRod and with the
embedded API (because the server casts keys and values to byte[]). That
means the distribution interceptor sees only the unmarshalled key, and
getting the same hash code from the marshalled byte[] (on the client) and
the unmarshalled Object (in the distribution interceptor) is going to be
quite complex - either with a custom Object.hashCode() implementation, or
with a custom Equivalence.hashCode(). I think the only way around this
would be to change compatibility mode to store keys and values as byte[].

Cheers
Dan

>
> Radim
>
>>
>> On 23 November 2015 at 10:26, Radim Vansa <[email protected]> wrote:
>>> Hi guys,
>>>
>>> I have noticed that even in library mode we use MurmurHash3 to find out
>>> the segment for a particular key. For strings, this involves encoding into
>>> UTF-8 and computation of hashCode, instead of just reading the cached
>>> value in the string. Common objects just remix the bits of hashCode.
>>> When the user provides a custom Equivalence with a non-default hashCode,
>>> it is not used to determine the segment.
>>>
>>> I think that in library mode we should rather use Equivalence.hashCode,
>>> maybe XORed with some magic number so that there are fewer collisions in
>>> DataContainer.
>>>
>>> If we simply replaced the function in CH, we would break the case when
>>> the user starts a HR server on top of library mode, as the clients expect
>>> key location based on MurmurHash3. ATM the user only has to set
>>> AnyServerEquivalence for keys in DC; we would need to detect
>>> configuration with server equivalence and set the CH function to MH3, and
>>> probably also log some warning if the equivalence is set to an unknown
>>> class and the CH function is not specified.
>>>
>>> WDYT?
>>>
>>> Radim
>>>
>>> --
>>> Radim Vansa <[email protected]>
>>> JBoss Performance Team
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [email protected]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
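To make the range-versus-modulo point from the reply above concrete, here is a small self-contained sketch. Everything in it is an assumption for illustration: the segment count, the "user-NNN" keys, the XOR constant, and the use of the MurmurHash3 32-bit finalizer (fmix32) as a stand-in spreader. It is not Infinispan's actual consistent hash or its MurmurHash3 over marshalled bytes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

class SegmentSpreadDemo {
    // Hypothetical segment count; the non-negative int space is split evenly.
    static final int NUM_SEGMENTS = 8;
    static final int SEGMENT_SIZE = (int) ((1L << 31) / NUM_SEGMENTS);

    // Range partitioning: nearby hash codes land in the same segment.
    static int rangeSegment(int hash) {
        return (hash & Integer.MAX_VALUE) / SEGMENT_SIZE;
    }

    // HashMap-style bucketing: nearby hash codes land in different buckets.
    static int moduloBucket(int hash) {
        return Math.floorMod(hash, NUM_SEGMENTS);
    }

    // MurmurHash3 32-bit finalizer, used here only as an example spreader.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Counts how many distinct segments/buckets 1000 prefixed keys occupy.
    static int distinct(IntUnaryOperator hashFn, IntUnaryOperator mapper) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            String key = String.format("user-%03d", i);
            used.add(mapper.applyAsInt(hashFn.applyAsInt(key.hashCode())));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("modulo, raw hashCode: "
                + distinct(h -> h, SegmentSpreadDemo::moduloBucket));
        System.out.println("range,  raw hashCode: "
                + distinct(h -> h, SegmentSpreadDemo::rangeSegment));
        System.out.println("range,  XOR constant: "
                + distinct(h -> h ^ 0x5bd1e995, SegmentSpreadDemo::rangeSegment));
        System.out.println("range,  fmix32:       "
                + distinct(SegmentSpreadDemo::fmix32, SegmentSpreadDemo::rangeSegment));
    }
}
```

The String hash codes of these keys sit within a few thousand of each other, so range partitioning puts them in one or two segments. XORing with a constant flips the same high bits for every key and leaves that clustering in place, while a real mixing step spreads the keys over the hash space.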
