[
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grigory Domozhirov updated IGNITE-20610:
----------------------------------------
Description:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0
method.) is clear it seems to work not as expected if {code:java}allowOverwrite
== true{code} and same keys are added to `DataStreamer`.
With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created
for the key object (
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
) and is added to `GridConcurrentHashSet` wrapped in a
`DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with
identity check it ends up with `activeKeys` containing multiple objects with
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set.
1) Is that OK in general?
2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode,
the more often keys are repeated the lower is performance due to hash
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736
was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0
method.) is clear it seems to work not as expected if allowOverwrite == true
and same keys are added to `DataStreamer`.
With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created
for the key object (
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
) and is added to `GridConcurrentHashSet` wrapped in a
`DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with
identity check it ends up with `activeKeys` containing multiple objects with
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set.
1) Is that OK in general?
2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode,
the more often keys are repeated the lower is performance due to hash
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736
> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --------------------------------------------------------------------------
>
> Key: IGNITE-20610
> URL: https://issues.apache.org/jira/browse/IGNITE-20610
> Project: Ignite
> Issue Type: Task
> Components: streaming
> Affects Versions: 2.15
> Reporter: Grigory Domozhirov
> Priority: Minor
>
> While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data
> streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0
> method.) is clear it seems to work not as expected if
> {code:java}allowOverwrite == true{code} and same keys are added to
> `DataStreamer`.
> With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is
> created for the key object (
> [https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
> ) and is added to `GridConcurrentHashSet` wrapped in a
> `DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with
> identity check it ends up with `activeKeys` containing multiple objects with
> equal `UserKeyCacheObjectImpl`s and thus barely acts is a set.
>
> 1) Is that OK in general?
> 2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's
> hashCode, the more often keys are repeated the lower is performance due to
> hash collisions of non-equal objects. Here is an example:
> {code:java}
> try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
> try (IgniteCache<Integer, Long> cache = ignite.createCache("test");
> IgniteDataStreamer<Integer, String> dataStreamer =
> ignite.dataStreamer(cache.getName())
> ) {
> dataStreamer.allowOverwrite(true); // doesn't matter
> long start = System.currentTimeMillis();
> for (int i = 0; i < 2_000_000; i++) {
> dataStreamer.addData(i, ""); //unique keys
> }
> long elapsed = System.currentTimeMillis() - start;
> System.out.println(elapsed);
> }
> } {code}
> runs in 3970 ms.
> {code:java}
> try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
> try (IgniteCache<Integer, Long> cache = ignite.createCache("test");
> IgniteDataStreamer<Integer, String> dataStreamer =
> ignite.dataStreamer(cache.getName())
> ) {
> dataStreamer.allowOverwrite(true); // doesn't matter
> long start = System.currentTimeMillis();
> for (int i = 0; i < 2_000_000; i++) {
> dataStreamer.addData(0, ""); //equal key
> }
> long elapsed = System.currentTimeMillis() - start;
> System.out.println(elapsed);
> }
> } {code}
> runs in 12736
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)