[
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grigory Domozhirov updated IGNITE-20610:
----------------------------------------
Description:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0
method.) is clear it seems to work not as expected if allowOverwrite == true
and same keys are added to a DataStreamer.
With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for
the key object (
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
) and is added to GridConcurrentHashSet wrapped in a
DataStreamerImpl.KeyCacheObjectWrapper (
https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L728C12-L728C12
). Since its equals is overridden with identity check it ends up with
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s
and thus barely acts is a set.
1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode,
the more often keys are repeated the lower performance is due to hash
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.
was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0
method.) is clear it seems to work not as expected if allowOverwrite == true
and same keys are added to a DataStreamer.
With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for
the key object (
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
) and is added to GridConcurrentHashSet wrapped in a
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with
identity check it ends up with `activeKeys` containing multiple objects with
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set.
1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode,
the more often keys are repeated the lower performance is due to hash
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
IgniteCache<Integer, Long> cache = ignite.createCache("test");
IgniteDataStreamer<Integer, String> dataStreamer =
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.
> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --------------------------------------------------------------------------
>
> Key: IGNITE-20610
> URL: https://issues.apache.org/jira/browse/IGNITE-20610
> Project: Ignite
> Issue Type: Task
> Components: streaming
> Affects Versions: 2.15
> Reporter: Grigory Domozhirov
> Priority: Minor
>
> While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data
> streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0
> method.) is clear it seems to work not as expected if allowOverwrite == true
> and same keys are added to a DataStreamer.
> With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created
> for the key object (
> [https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
> ) and is added to GridConcurrentHashSet wrapped in a
> DataStreamerImpl.KeyCacheObjectWrapper (
> https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L728C12-L728C12
> ). Since its equals is overridden with identity check it ends up with
> `activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s
> and thus barely acts is a set.
>
> 1) Is that OK in general?
> 2) If yes, then does using GridConcurrentHashSet for activeKeys make any
> sense as all its entries are always non-equal?
> 3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's
> hashCode, the more often keys are repeated the lower performance is due to
> hash collisions of non-equal objects. Here is a corner case:
> {code:java}
> try (Ignite ignite = Ignition.start(new IgniteConfiguration());
> IgniteCache<Integer, Long> cache = ignite.createCache("test");
> IgniteDataStreamer<Integer, String> dataStreamer =
> ignite.dataStreamer(cache.getName())
> ) {
> dataStreamer.allowOverwrite(true); // doesn't matter
> long start = System.currentTimeMillis();
> for (int i = 0; i < 5_000_000; i++) {
> dataStreamer.addData(i, ""); //unique keys
> }
> System.out.println(System.currentTimeMillis() - start);
> }{code}
> runs in 6029 ms.
> {code:java}
> try (Ignite ignite = Ignition.start(new IgniteConfiguration());
> IgniteCache<Integer, Long> cache = ignite.createCache("test");
> IgniteDataStreamer<Integer, String> dataStreamer =
> ignite.dataStreamer(cache.getName())
> ) {
> dataStreamer.allowOverwrite(true); // doesn't matter
> long start = System.currentTimeMillis();
> for (int i = 0; i < 5_000_000; i++) {
> dataStreamer.addData(0, ""); //equal key
> }
> System.out.println(System.currentTimeMillis() - start);
> }{code}
> runs in 29025 ms.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)