[
https://issues.apache.org/jira/browse/IGNITE-20691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Artiukhov updated IGNITE-20691:
------------------------------------
Description:
Ignite 3, rev. c962aa1bbb94e73a4d8ce2403aad3d629dd55666
Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends
requests via JDBC. The code of the benchmark:
[https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
Steps:
* Start an Ignite 3 node with the attached bootstrap config
[^ignite-config.json]
* Start a single instance of the benchmark with the following config:
[^tpch_2023-10-17_08-51-49.config.xml]
*Actual result:*
The node fails with exceptions like the following:
{noformat}
2023-10-17 08:36:48:269 +0300
[WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl
[binaryRowMessages=ArrayList [BinaryRowMessageImpl
[binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99 cap=99], schemaVersion=1]],
commitPartitionId=TablePartitionIdMessageImpl [partitionId=20, tableId=26],
full=false, groupId=26_part_21, requestType=RW_INSERT_ALL, skipDelayedAck=true,
term=111248573589356783, timestampLong=111248730707264072,
transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
java.util.concurrent.CompletionException:
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
currentLeaseholder=null]
at
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
at
java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
at
org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
at
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
at
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
at
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
at
java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
at
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by:
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
currentLeaseholder=null]
at
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
at
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
... 15 more
{noformat}
The benchmark shows 0 completed requests. See
[^tpch_2023-10-17_08-51-49.summary.json]
[https://gceasy.io/] reports that ~20% of the time the node was in GC pause.
See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]
Logs from the node (including GC log): [^node_logs.zip]
was:
Ignite 3, rev.
c962aa1bbb94e73a4d8ce2403aad3d629dd55666
Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends
requests via JDBC. The code of the benchmark:
[https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
Steps:
* Start an Ignite 3 node with the attached bootstrap config
[^ignite-config.json]
* Start a single instance of the benchmark with the following config:
[^tpch_2023-10-17_08-51-49.config.xml]
*Actual result:*
The node fails with exceptions like the following:
{noformat}
2023-10-17 08:36:48:269 +0300
[WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl
[binaryRowMessages=ArrayList [BinaryRowMessageImpl
[binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99 cap=99], schemaVersion=1]],
commitPartitionId=TablePartitionIdMessageImpl [partitionId=20, tableId=26],
full=false, groupId=26_part_21, requestType=RW_INSERT_ALL, skipDelayedAck=true,
term=111248573589356783, timestampLong=111248730707264072,
transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
java.util.concurrent.CompletionException:
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
currentLeaseholder=null]
at
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
at
java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
at
org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
at
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
at
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
at
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
at
java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
at
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by:
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
currentLeaseholder=null]
at
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
at
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
... 15 more
{noformat}
The benchmark shows 0 completed requests. See
[^tpch_2023-10-17_08-51-49.summary.json]
[https://gceasy.io/] reports that ~20% of the time the node was in GC pause.
See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]
Logs from the node (including GC log): [^node_logs.zip]
> Excessive heap utilization under TPC-H benchmark
> ------------------------------------------------
>
> Key: IGNITE-20691
> URL: https://issues.apache.org/jira/browse/IGNITE-20691
> Project: Ignite
> Issue Type: Bug
> Components: sql
> Reporter: Ivan Artiukhov
> Priority: Major
> Labels: ignite-3
> Attachments: GCeasy-report-gc-poc-tester-SERVER-192.pdf,
> ignite-config.json, node_logs.zip, tpch_2023-10-17_08-51-49.config.xml,
> tpch_2023-10-17_08-51-49.summary.json
>
>
> Ignite 3, rev. c962aa1bbb94e73a4d8ce2403aad3d629dd55666
> Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends
> requests via JDBC. The code of the benchmark:
> [https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
>
> Steps:
> * Start an Ignite 3 node with the attached bootstrap config
> [^ignite-config.json]
> * Start a single instance of the benchmark with the following config:
> [^tpch_2023-10-17_08-51-49.config.xml]
> *Actual result:*
> The node fails with exceptions like the following:
> {noformat}
> 2023-10-17 08:36:48:269 +0300
> [WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
> Failed to process replica request
> [request=ReadWriteMultiRowReplicaRequestImpl [binaryRowMessages=ArrayList
> [BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99
> cap=99], schemaVersion=1]], commitPartitionId=TablePartitionIdMessageImpl
> [partitionId=20, tableId=26], full=false, groupId=26_part_21,
> requestType=RW_INSERT_ALL, skipDelayedAck=true, term=111248573589356783,
> timestampLong=111248730707264072,
> transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
> java.util.concurrent.CompletionException:
> org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
> IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica
> has changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
> currentLeaseholder=null]
> at
> java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
> at
> java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
> at
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
> at
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> at
> org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
> at
> java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
> at
> org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
> at
> org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
> at
> org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
> at
> org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
> at
> org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
> at
> org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
> at
> java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
> at
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by:
> org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
> IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica
> has changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
> currentLeaseholder=null]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
> at
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
> ... 15 more
> {noformat}
> The benchmark shows 0 completed requests. See
> [^tpch_2023-10-17_08-51-49.summary.json]
> [https://gceasy.io/] reports that ~20% of the time the node was in GC pause.
> See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]
> Logs from the node (including GC log): [^node_logs.zip]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)