[
https://issues.apache.org/jira/browse/FLINK-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120452#comment-17120452
]
Lisheng Sun (leosun08) edited comment on FLINK-12693 at 5/31/20, 11:53 AM:
----------------------------------------------------------------
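Hi [~banmoy],
According to the test results, hash calculation in CopyOnWriteStateMap is much slower than in the JDK HashMap: with the updated hash method, valueGet throughput is about 15% higher (4851.071 vs. 4209.536 ops/ms).
flink-benchmark results with 500,000 ("50w") records: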
||Variant||valueGet (read throughput)||
|Native CopyOnWriteStateMap|*4209.536* ± 336.956 ops/ms|
|Update hash method CopyOnWriteStateMap|*4851.071* ± 158.335 ops/ms|
Could you tell me what the new hash algorithm is for? Is it meant to reduce hash collisions?
Thank you.
Hash computation path in CopyOnWriteStateMap: computeHashForOperationAndDoIncrementalRehash -> compositeHash -> bitMix
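{code:java}
// MurmurHash3 32-bit finalizer (fmix32): designed so that flipping any
// input bit flips about half of the output bits (avalanche).
public static int bitMix(int in) {
    in ^= in >>> 16;   // fold the high half into the low half
    in *= 0x85ebca6b;  // odd multiplier spreads low bits upward
    in ^= in >>> 13;
    in *= 0xc2b2ae35;
    in ^= in >>> 16;
    return in;
}
{code}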
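To make the collision question concrete: the shifts and constants in bitMix are those of the MurmurHash3 32-bit finalizer (fmix32), whose goal is avalanche, i.e. flipping a single input bit should flip about half of the 32 output bits. The small standalone program below measures that property. The class name BitMixAvalanche, the helper averageFlippedBits and the 10,000-input sample are illustrative choices for this comment, not Flink code.
{code:java}
// Illustrative only: measures the avalanche behaviour of the bitMix function quoted above.
class BitMixAvalanche {

    // Copy of the quoted bitMix (MurmurHash3 fmix32 finalizer).
    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    // Average number of output bits that change when exactly one input bit is flipped,
    // taken over inputs 0..samples-1 and all 32 bit positions.
    static double averageFlippedBits(int samples) {
        long total = 0;
        for (int x = 0; x < samples; x++) {
            int base = bitMix(x);
            for (int bit = 0; bit < 32; bit++) {
                total += Integer.bitCount(base ^ bitMix(x ^ (1 << bit)));
            }
        }
        return (double) total / (samples * 32L);
    }

    public static void main(String[] args) {
        System.out.println("bitMix: average flipped output bits = " + averageFlippedBits(10_000));
    }
}
{code}
For these inputs the average should come out close to 16 bits. The two multiplications are what produce this mixing, and they are also the extra work compared with a single shift-and-XOR.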
HashMap#hash
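{code:java}
// Folds the upper 16 bits of hashCode() into the lower 16 bits with one XOR,
// because HashMap selects the bucket from the low bits: (n - 1) & hash.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
{code}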
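By contrast, HashMap#hash only XORs the upper half onto the lower half. That is cheap, but hash codes that differ only in bits which never reach the low index bits still collide. The sketch below assumes a power-of-two table indexed with hash & (length - 1), as HashMap does, and feeds it 4,096 hypothetical skewed hash codes of the form i << 20 (only the top 12 bits vary). BucketSpread, hashMapSpread and distinctBuckets are names made up for this illustration.
{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

// Illustrative only: compares how two hash-spreading functions distribute skewed hash codes.
class BucketSpread {

    // HashMap-style spreading, same arithmetic as HashMap#hash applied to an int hash code.
    static int hashMapSpread(int h) {
        return h ^ (h >>> 16);
    }

    // Copy of the quoted bitMix (MurmurHash3 fmix32 finalizer).
    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    // Number of distinct buckets hit when `count` hash codes of the form i << 20 are mapped
    // into a power-of-two table with `buckets` slots via mix(hash) & (buckets - 1).
    static int distinctBuckets(IntUnaryOperator mix, int count, int buckets) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < count; i++) {
            int hashCode = i << 20; // only bits 20..31 vary
            used.add(mix.applyAsInt(hashCode) & (buckets - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("HashMap spread buckets used: "
                + distinctBuckets(BucketSpread::hashMapSpread, 4096, 16));
        System.out.println("bitMix buckets used:         "
                + distinctBuckets(BucketSpread::bitMix, 4096, 16));
    }
}
{code}
With HashMap-style spreading every one of these codes lands in bucket 0 of the 16-slot table, because bits 0-3 of h ^ (h >>> 16) come from bits 0-3 and 16-19 of h, which are all zero here; bitMix spreads the same codes across the table. Whether such skewed key/namespace hash codes are common enough to justify the extra multiplications measured in the benchmark is the trade-off in question.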
> Store state per key-group in CopyOnWriteStateTable
> --------------------------------------------------
>
> Key: FLINK-12693
> URL: https://issues.apache.org/jira/browse/FLINK-12693
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / State Backends
> Reporter: Yu Li
> Assignee: PengFei Li
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Since we propose to use the KeyGroup as the unit of spilling/loading, the first
> step is to store state per key group. {{NestedMapsStateTable}} already supports
> this natively, so we only need to refine {{CopyOnWriteStateTable}}.
> The main effort required here is to extract the customized hash map out of
> {{CopyOnWriteStateTable}} and then use such a hash map as the state holder for
> each KeyGroup. After that, we could extract some common logic into
> {{StateTable}}.
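A minimal sketch of the layout the description proposes: one hash map per key group, so that a key group can be spilled or loaded as a unit. It is hypothetical throughout: PerKeyGroupStateTable, keyGroupOf and stateForKeyGroup are invented names, a plain java.util.HashMap stands in for the copy-on-write map (so there is no snapshot support), and the assignment floorMod(bitMix(key.hashCode()), numberOfKeyGroups) is a simplified stand-in for Flink's own key-group assignment, not its exact function.
{code:java}
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Flink's StateTable: state is held in one map per key group.
class PerKeyGroupStateTable<K, N, S> {

    private final int numberOfKeyGroups;
    // maps.get(g) holds every (key, namespace) -> state entry of key group g.
    private final List<Map<SimpleImmutableEntry<K, N>, S>> maps;

    PerKeyGroupStateTable(int numberOfKeyGroups) {
        this.numberOfKeyGroups = numberOfKeyGroups;
        this.maps = new ArrayList<>(numberOfKeyGroups);
        for (int i = 0; i < numberOfKeyGroups; i++) {
            maps.add(new HashMap<>());
        }
    }

    // Simplified key-group assignment (assumption); keys must be non-null.
    int keyGroupOf(K key) {
        return Math.floorMod(bitMix(key.hashCode()), numberOfKeyGroups);
    }

    void put(K key, N namespace, S state) {
        maps.get(keyGroupOf(key)).put(new SimpleImmutableEntry<>(key, namespace), state);
    }

    S get(K key, N namespace) {
        return maps.get(keyGroupOf(key)).get(new SimpleImmutableEntry<>(key, namespace));
    }

    // Read-only view of one key group: the unit a spill/load step would operate on.
    Map<SimpleImmutableEntry<K, N>, S> stateForKeyGroup(int keyGroup) {
        return Collections.unmodifiableMap(maps.get(keyGroup));
    }

    // Same mixing function as the quoted bitMix.
    private static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }
}
{code}
Each access first resolves the key group and then works on that group's map only, so everything a spill or load step needs for one key group is reachable through a single map.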
--
This message was sent by Atlassian Jira
(v8.3.4#803005)