On 11/23/2015 04:11 PM, Dan Berindei wrote: > On Mon, Nov 23, 2015 at 2:16 PM, Radim Vansa <[email protected]> wrote: >> On 11/23/2015 01:07 PM, Sanne Grinovero wrote: >>> +1 >>> >>> See also https://issues.jboss.org/browse/ISPN-3905 ; although I was >>> mostly concerned on it allocating on a (very) hot path and didn't look >>> at it in terms of compatibility modes. >> Yes, due to compatibility we cannot remove the UTF-8 encoding from >> MurmurHash3 since compatibility with clients (in other languages) >> depends on this as well, though, we could theoretically merge encoding >> and hashing into one function - UTF-8 encoder implementation looks quite >> simple (60 lines of code?) - could be worth it even if used only for >> server. However, my proposal was to remove that computation from >> hot-code path completely. >> >>> Rather than xorring with "magic numbers" don't you think we >>> Equivalence implementation should be able to rule on that? >> We shouldn't require user to provide a pair of hashCode functions, I >> don't think that would work well in practice. Though, we could make the >> second function Java 8-default method (with return hashCode() ^ >> 0xWHATEVER), still allowing it to be overridable. > The JDK team learned long ago to use a spreader on top of > user-supplied hashCode() implementations, as user-supplied hash codes > are usually very clustered. In the case of strings, many times a > common prefix makes up most of the key, and the hash codes of the keys > are again clustered. A XOR with a magic value would definitely not > help with the clustering issue, that's why java.util.HashMap doesn't > use it. > > Note that our consistent hashes map adjacent keys to the same segment: > we use hash / buckets, whereas HashMap uses hash % buckets. So we > require a better spread across the hash space than HashMap does, and > because of that I think we really need MurmurHash3. Still, we could > change it to work on the result of Equivalence.hashCode(Object), > instead of dealing with the contents of byte[] and String directly, > but maintaining compatibility with old clients may not be possible.
Okay, I don't mind using the int32-spreader, that's just a better version of XOR with a couple more bitshifts. But do we need two spreaders, one for CH and another for DC, or is it sufficient that CH will use division and DC modulo? > > Regarding client-server divergences, I think we require compatibility > mode to be enabled in order to access a cache both via HotRod and with > the embedded API (because the server casts keys and values to byte[]). > That means the distribution interceptor sees only the unmarshalled > key, and getting the same hash code from the marshalled byte[] (on the > client) and the unmarshalled Object (in the distribution interceptor) > is going to be quite complex - either with a custom Object.hashCode() > implementation, or with a custom Equivalence.hash(). I think the only > way around this would be to change compatibility mode to store keys > and values as byte[]. I was not talking about combined HR + embedded access, but HR-only access but without the server, therefore configuring & starting cache manager + HR server on their own. And I've not realized that in remote case the server deals only with byte[]s, so in non-compatibility mode we would just use o instanceof byte[] ? MH3.hash((byte[]) o) : MH3.hash(equivalence.hashCode(o)) (there is no point in letting Equivalence compute the hashcode from byte[] and then just spread it). In compatibility case, we need to keep the existing behaviour, because they expect us to do the unmarshall -> encode to UTF-8 -> MH3. But that's already a configuration switch, so no heuristics are needed. Radim > > Cheers > Dan > >> Radim >> >>> On 23 November 2015 at 10:26, Radim Vansa <[email protected]> wrote: >>>> Hi guys, >>>> >>>> I have noticed that even in library mode we use MurmurHash3 to find out >>>> the segment for particular key. For strings, this involves encoding into >>>> UTF-8 and computation of hashCode, instead of just reading the cached >>>> value in string. Common objects just remix the bits of hashCode. When >>>> user provides custom Equivalence with non-default hashCode, it is not >>>> used to determine the segment. >>>> >>>> I think that in library mode we should rather use Equivalence.hashCode, >>>> maybe XORed with some magic number so that there are less collisions in >>>> DataContainer. >>>> >>>> If we simply replaced the function in CH, we would break the case when >>>> user starts HR server on top of library mode, as the clients expect key >>>> location based on MurmurHash3. ATM user only has to set >>>> AnyServerEquivalence for keys in DC; we would need to detect >>>> configuration with server equivalence and set CH function to MH3, and >>>> probably also log some warning if the equivalence is set to unknown >>>> class and CH function is not specified. >>>> >>>> WDYT? >>>> >>>> Radim >>>> >>>> -- >>>> Radim Vansa <[email protected]> >>>> JBoss Performance Team >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> [email protected] >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> [email protected] >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> -- >> Radim Vansa <[email protected]> >> JBoss Performance Team >> >> _______________________________________________ >> infinispan-dev mailing list >> [email protected] >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > [email protected] > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa <[email protected]> JBoss Performance Team _______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
