On Mon, Nov 23, 2015 at 4:59 PM, Radim Vansa <[email protected]> wrote: > On 11/23/2015 04:11 PM, Dan Berindei wrote: >> On Mon, Nov 23, 2015 at 2:16 PM, Radim Vansa <[email protected]> wrote: >>> On 11/23/2015 01:07 PM, Sanne Grinovero wrote: >>>> +1 >>>> >>>> See also https://issues.jboss.org/browse/ISPN-3905 ; although I was >>>> mostly concerned on it allocating on a (very) hot path and didn't look >>>> at it in terms of compatibility modes. >>> Yes, due to compatibility we cannot remove the UTF-8 encoding from >>> MurmurHash3 since compatibility with clients (in other languages) >>> depends on this as well, though, we could theoretically merge encoding >>> and hashing into one function - UTF-8 encoder implementation looks quite >>> simple (60 lines of code?) - could be worth it even if used only for >>> server. However, my proposal was to remove that computation from >>> hot-code path completely. >>> >>>> Rather than xorring with "magic numbers" don't you think we >>>> Equivalence implementation should be able to rule on that? >>> We shouldn't require user to provide a pair of hashCode functions, I >>> don't think that would work well in practice. Though, we could make the >>> second function Java 8-default method (with return hashCode() ^ >>> 0xWHATEVER), still allowing it to be overridable. >> The JDK team learned long ago to use a spreader on top of >> user-supplied hashCode() implementations, as user-supplied hash codes >> are usually very clustered. In the case of strings, many times a >> common prefix makes up most of the key, and the hash codes of the keys >> are again clustered. A XOR with a magic value would definitely not >> help with the clustering issue, that's why java.util.HashMap doesn't >> use it. >> >> Note that our consistent hashes map adjacent keys to the same segment: >> we use hash / buckets, whereas HashMap uses hash % buckets. So we >> require a better spread across the hash space than HashMap does, and >> because of that I think we really need MurmurHash3. Still, we could >> change it to work on the result of Equivalence.hashCode(Object), >> instead of dealing with the contents of byte[] and String directly, >> but maintaining compatibility with old clients may not be possible. > > Okay, I don't mind using the int32-spreader, that's just a better > version of XOR with a couple more bitshifts. But do we need two > spreaders, one for CH and another for DC, or is it sufficient that CH > will use division and DC modulo?
I don't know, I've never tried it :) > >> >> Regarding client-server divergences, I think we require compatibility >> mode to be enabled in order to access a cache both via HotRod and with >> the embedded API (because the server casts keys and values to byte[]). >> That means the distribution interceptor sees only the unmarshalled >> key, and getting the same hash code from the marshalled byte[] (on the >> client) and the unmarshalled Object (in the distribution interceptor) >> is going to be quite complex - either with a custom Object.hashCode() >> implementation, or with a custom Equivalence.hash(). I think the only >> way around this would be to change compatibility mode to store keys >> and values as byte[]. > > I was not talking about combined HR + embedded access, but HR-only > access but without the server, therefore configuring & starting cache > manager + HR server on their own. Just curious, what's the use case? > And I've not realized that in remote case the server deals only with > byte[]s, so in non-compatibility mode we would just use > > o instanceof byte[] ? MH3.hash((byte[]) o) : > MH3.hash(equivalence.hashCode(o)) But if we say we don't support non-byte[] keys in a server without compatibility mode enabled (and I believe we do), we don't need the second part at all. > > (there is no point in letting Equivalence compute the hashcode from > byte[] and then just spread it). In compatibility case, we need to keep > the existing behaviour, because they expect us to do the unmarshall -> > encode to UTF-8 -> MH3. But that's already a configuration switch, so no > heuristics are needed. Looks like targetting in compatibility mode is already broken [1], so keeping the existing behaviour may not be necessary after all. [1] https://issues.jboss.org/browse/ISPN-5981 Dan _______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
