Grigory Domozhirov created IGNITE-20610:
-------------------------------------------
Summary: DataStreamerImpl.KeyCacheObjectWrapper low performance
for non-unique keys
Key: IGNITE-20610
URL: https://issues.apache.org/jira/browse/IGNITE-20610
Project: Ignite
Issue Type: Task
Components: streaming
Affects Versions: 2.15
Reporter: Grigory Domozhirov
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0
method.) is clear, it does not seem to work as expected if
`allowOverwrite == true` and the same keys are added to `DataStreamer`
repeatedly.
With each `DataStreamer.addData()` call, a `new UserKeyCacheObjectImpl()` is
created for the key object (
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
) and added to a `GridConcurrentHashSet`, wrapped in a
`DataStreamerImpl.KeyCacheObjectWrapper`. Since the wrapper's `equals` is
overridden with an identity check, `activeKeys` ends up containing multiple
wrappers with equal `UserKeyCacheObjectImpl`s and thus barely acts as a set.
1) Is that OK in general?
2) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's
hashCode, the more often keys are repeated, the lower the performance, because
non-equal wrappers collide on the same hash. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    // Value type matches the streamer below (String).
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer =
             ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    // Value type matches the streamer below (String).
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer =
             ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // same key every time
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.
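The effect can also be reproduced without Ignite. The following is only a minimal sketch under stated assumptions: `IdentityWrapper` is a hypothetical stand-in for `DataStreamerImpl.KeyCacheObjectWrapper` (identity-based `equals`, `hashCode` delegated to the key), `ConcurrentHashMap.newKeySet()` stands in for `GridConcurrentHashSet`, and `new String(...)` stands in for the fresh `UserKeyCacheObjectImpl` created on every `addData()` call. It is not the actual Ignite code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdentityWrapperDemo {
    /** Hypothetical stand-in for DataStreamerImpl.KeyCacheObjectWrapper. */
    static final class IdentityWrapper {
        private final Object key;

        IdentityWrapper(Object key) {
            this.key = key;
        }

        /** Hash is taken from the wrapped key, so equal keys share one hash. */
        @Override public int hashCode() {
            return key.hashCode();
        }

        /** Identity check: equal only if both wrappers hold the same key instance. */
        @Override public boolean equals(Object o) {
            return o instanceof IdentityWrapper && ((IdentityWrapper) o).key == key;
        }
    }

    /** Adds {@code count} wrapped keys and returns the resulting set size. */
    public static int fillWrapped(int count, boolean uniqueKeys) {
        Set<IdentityWrapper> activeKeys = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < count; i++) {
            // new String(...) forces a fresh key instance on every iteration.
            String key = new String(uniqueKeys ? "k" + i : "k0");
            activeKeys.add(new IdentityWrapper(key));
        }
        return activeKeys.size();
    }

    /** Same loop with unwrapped keys, for comparison. */
    public static int fillPlain(int count, boolean uniqueKeys) {
        Set<String> keys = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < count; i++) {
            keys.add(new String(uniqueKeys ? "k" + i : "k0"));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Equal keys, wrapped: every entry is kept and all share one hash.
        System.out.println(fillWrapped(5_000, false));
        // Equal keys, unwrapped: the set collapses them into one entry.
        System.out.println(fillPlain(5_000, false));
    }
}
```

In this sketch the wrapped set grows by one entry per call even when all keys are equal, and all of those entries land in the same hash bucket, which is consistent with the slowdown measured above.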