[ 
https://issues.apache.org/jira/browse/IGNITE-20691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Artiukhov updated IGNITE-20691:
------------------------------------
    Description: 
Ignite 3, rev. c962aa1bbb94e73a4d8ce2403aad3d629dd55666

Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends 
requests via JDBC. The code of the benchmark: 
[https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
 

Steps:
 * Start an Ignite 3 node with the attached bootstrap config 
[^ignite-config.json]
 * Start a single instance of the benchmark with the following config: 
[^tpch_2023-10-17_08-51-49.config.xml]

*Actual result:*

The node fails with exceptions like the following:
{noformat}
2023-10-17 08:36:48:269 +0300 
[WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
 Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl 
[binaryRowMessages=ArrayList [BinaryRowMessageImpl 
[binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99 cap=99], schemaVersion=1]], 
commitPartitionId=TablePartitionIdMessageImpl [partitionId=20, tableId=26], 
full=false, groupId=26_part_21, requestType=RW_INSERT_ALL, skipDelayedAck=true, 
term=111248573589356783, timestampLong=111248730707264072, 
transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
java.util.concurrent.CompletionException: 
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has 
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
currentLeaseholder=null]
    at 
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
    at 
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
    at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
    at 
java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
    at 
org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
    at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
    at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
    at 
java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
    at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: 
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has 
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
currentLeaseholder=null]
    at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
    at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
    ... 15 more
{noformat}
The benchmark shows 0 completed requests. See 
[^tpch_2023-10-17_08-51-49.summary.json]

[https://gceasy.io/] reports that ~20% of the time the node was in GC pause. 
See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]

Logs from the node (including GC log): [^node_logs.zip]

  was:
Ignite 3, rev. 

c962aa1bbb94e73a4d8ce2403aad3d629dd55666

 

Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends 
requests via JDBC. The code of the benchmark: 
[https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
 

 

Steps:
 * Start an Ignite 3 node with the attached bootstrap config 
[^ignite-config.json]
 * Start a single instance of the benchmark with the following config: 
[^tpch_2023-10-17_08-51-49.config.xml]

*Actual result:*

The node fails with exceptions like the following:
{noformat}
2023-10-17 08:36:48:269 +0300 
[WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
 Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl 
[binaryRowMessages=ArrayList [BinaryRowMessageImpl 
[binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99 cap=99], schemaVersion=1]], 
commitPartitionId=TablePartitionIdMessageImpl [partitionId=20, tableId=26], 
full=false, groupId=26_part_21, requestType=RW_INSERT_ALL, skipDelayedAck=true, 
term=111248573589356783, timestampLong=111248730707264072, 
transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
java.util.concurrent.CompletionException: 
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has 
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
currentLeaseholder=null]
    at 
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
    at 
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
    at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
    at 
java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
    at 
org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
    at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
    at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
    at 
java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
    at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: 
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has 
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
currentLeaseholder=null]
    at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
    at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
    ... 15 more
{noformat}
 The benchmark shows 0 completed requests. See 
[^tpch_2023-10-17_08-51-49.summary.json]

[https://gceasy.io/] reports that ~20% of the time the node was in GC pause. 
See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]

Logs from the node (including GC log): [^node_logs.zip]


> Excessive heap utilization under TPC-H benchmark
> ------------------------------------------------
>
>                 Key: IGNITE-20691
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20691
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>            Reporter: Ivan Artiukhov
>            Priority: Major
>              Labels: ignite-3
>         Attachments: GCeasy-report-gc-poc-tester-SERVER-192.pdf, 
> ignite-config.json, node_logs.zip, tpch_2023-10-17_08-51-49.config.xml, 
> tpch_2023-10-17_08-51-49.summary.json
>
>
> Ignite 3, rev. c962aa1bbb94e73a4d8ce2403aad3d629dd55666
> Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends 
> requests via JDBC. The code of the benchmark: 
> [https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
>  
> Steps:
>  * Start an Ignite 3 node with the attached bootstrap config 
> [^ignite-config.json]
>  * Start a single instance of the benchmark with the following config: 
> [^tpch_2023-10-17_08-51-49.config.xml]
> *Actual result:*
> The node fails with exceptions like the following:
> {noformat}
> 2023-10-17 08:36:48:269 +0300 
> [WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
>  Failed to process replica request 
> [request=ReadWriteMultiRowReplicaRequestImpl [binaryRowMessages=ArrayList 
> [BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99 
> cap=99], schemaVersion=1]], commitPartitionId=TablePartitionIdMessageImpl 
> [partitionId=20, tableId=26], full=false, groupId=26_part_21, 
> requestType=RW_INSERT_ALL, skipDelayedAck=true, term=111248573589356783, 
> timestampLong=111248730707264072, 
> transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
> java.util.concurrent.CompletionException: 
> org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
> IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica 
> has changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
> currentLeaseholder=null]
>     at 
> java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
>     at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>     at 
> org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
>     at 
> java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
>     at 
> org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
>     at 
> org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
>     at 
> org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
>     at 
> org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
>     at 
> org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
>     at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
>     at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: 
> org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
> IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica 
> has changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
> currentLeaseholder=null]
>     at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
>     ... 15 more
> {noformat}
> The benchmark shows 0 completed requests. See 
> [^tpch_2023-10-17_08-51-49.summary.json]
> [https://gceasy.io/] reports that ~20% of the time the node was in GC pause. 
> See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]
> Logs from the node (including GC log): [^node_logs.zip]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to