[ https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Griggs reassigned IGNITE-4828: -------------------------------------- Assignee: Michael Griggs (was: Michael Griggs) > Improve the distribution of keys within partitions > -------------------------------------------------- > > Key: IGNITE-4828 > URL: https://issues.apache.org/jira/browse/IGNITE-4828 > Project: Ignite > Issue Type: Improvement > Affects Versions: 1.9 > Reporter: Michael Griggs > Assignee: Michael Griggs > Fix For: 2.0 > > > An issue has been found when inserting several million string keys in to a > cache. Each string key was approximately 22-characters in length. When I > exported the partition counts (via GG Visor) I was able to see an unusual > periodicity in the number of keys allocated to partitions. I charted this in > Excel (1). > After further investigation, it appears that there is a relationship > between the number of keys being inserted, the number of partitions > assigned to the cache and amount of apparent periodicity: a small number > ofpartitions will cause periodicity to appear with a lower number of keys. > The {{RendezvousAffinityFunction#partition}} function performs a simple > calculation of key hashcode modulo partition-count: > {{U.safeAbs(key.hashCode() % parts)}} > Digging further I was led to the fact that this is how the Java HashMap > *used* to behave (2), but was upgraded around Java 1.4 to perform the > following: > {{key.hashCode() & (parts - 1)}} > which performs more efficiently. It was then updated further to do the > following: > {{(h = key.hashCode()) ^ (h >>> 16);}} > with the bit-shift performed to > bq. incorporate impact of the highest bits that would otherwise never be used > in index calculations because of table bounds > When using this function, rather than our > {{RendezvousAffinityFunction#partition}} implementation, I also saw a > significant decrease in the periodicity and a better distribution of keys > amongst partitions (3). > (1): https://i.imgur.com/0FtCZ2A.png > (2): > https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings > (3): https://i.imgur.com/8ZuCSA3.png -- This message was sent by Atlassian JIRA (v6.3.15#6346)