[
https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Griggs updated IGNITE-4828:
-----------------------------------
Description:
An issue has been found when inserting several million string keys in to a
cache. Each string key was approximately 22-characters in length. When I
exported the partition counts (via GG Visor) I was able to see an unusual
periodicity in the number of keys allocated to partitions. I charted this in
Excel (1).
After further investigation, it appears that there is a relationship
between the number of keys being inserted, the number of partitions
assigned to the cache and amount of apparent periodicity: a small number of
partitions will cause periodicity to appear with a lower number of keys.
The {{RendezvousAffinityFunction#partition}} function performs a simple
calculation of key hashcode modulo partition-count:
{{U.safeAbs(key.hashCode() % parts)}}
Digging further I was led to the fact that this is how the Java HashMap
*used* to behave (2), but was upgraded around Java 1.4 to perform the
following:
{{key.hashCode() & (parts - 1)}}
which performs more efficiently. It was then updated further to do the
following:
{{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}}
with the bit-shift performed to
bq. incorporate impact of the highest bits that would otherwise never be used
in index calculations because of table bounds
When using this function, rather than our
{{RendezvousAffinityFunction#partition}} implementation, I also saw a
significant decrease in the periodicity and a better distribution of keys
amongst partitions (3).
(1): https://i.imgur.com/0FtCZ2A.png
(2):
https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
(3): https://i.imgur.com/8ZuCSA3.png
was:
An issue has been found when inserting several million string keys in to a
cache. Each string key was approximately 22-characters in length. When I
exported the partition counts (via GG Visor) I was able to see an unusual
periodicity in the number of keys allocated to partitions. I charted this in
Excel (1).
After further investigation, it appears that there is a relationship
between the number of keys being inserted, the number of partitions
assigned to the cache and amount of apparent periodicity: a small number
ofpartitions will cause periodicity to appear with a lower number of keys.
The {{RendezvousAffinityFunction#partition}} function performs a simple
calculation of key hashcode modulo partition-count:
{{U.safeAbs(key.hashCode() % parts)}}
Digging further I was led to the fact that this is how the Java HashMap
*used* to behave (2), but was upgraded around Java 1.4 to perform the
following:
{{key.hashCode() & (parts - 1)}}
which performs more efficiently. It was then updated further to do the
following:
{{(h = key.hashCode()) ^ (h >>> 16);}}
with the bit-shift performed to
bq. incorporate impact of the highest bits that would otherwise never be used
in index calculations because of table bounds
When using this function, rather than our
{{RendezvousAffinityFunction#partition}} implementation, I also saw a
significant decrease in the periodicity and a better distribution of keys
amongst partitions (3).
(1): https://i.imgur.com/0FtCZ2A.png
(2):
https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
(3): https://i.imgur.com/8ZuCSA3.png
> Improve the distribution of keys within partitions
> --------------------------------------------------
>
> Key: IGNITE-4828
> URL: https://issues.apache.org/jira/browse/IGNITE-4828
> Project: Ignite
> Issue Type: Improvement
> Affects Versions: 1.9
> Reporter: Michael Griggs
> Assignee: Michael Griggs
> Fix For: 2.0
>
>
> An issue has been found when inserting several million string keys in to a
> cache. Each string key was approximately 22-characters in length. When I
> exported the partition counts (via GG Visor) I was able to see an unusual
> periodicity in the number of keys allocated to partitions. I charted this in
> Excel (1).
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number of
> partitions will cause periodicity to appear with a lower number of keys.
> The {{RendezvousAffinityFunction#partition}} function performs a simple
> calculation of key hashcode modulo partition-count:
> {{U.safeAbs(key.hashCode() % parts)}}
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave (2), but was upgraded around Java 1.4 to perform the
> following:
> {{key.hashCode() & (parts - 1)}}
> which performs more efficiently. It was then updated further to do the
> following:
> {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}}
> with the bit-shift performed to
> bq. incorporate impact of the highest bits that would otherwise never be used
> in index calculations because of table bounds
> When using this function, rather than our
> {{RendezvousAffinityFunction#partition}} implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions (3).
> (1): https://i.imgur.com/0FtCZ2A.png
> (2):
> https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> (3): https://i.imgur.com/8ZuCSA3.png
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)