[
https://issues.apache.org/jira/browse/IGNITE-7570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538423#comment-16538423
]
Oscar Torreno commented on IGNITE-7570:
---------------------------------------
We are facing a similar situation.
Our setup:
JMeter JUnit tests
- 6 server nodes running inside Kubernetes, 4 client nodes on 2 JMeter slaves
- 1 cache: partitioned, atomic, 2 backups, read from backup set to true
- keys: always the same (10 and 12 in this test)
- operations: PUT GET
- 1 of the 6 servers (Kubernetes pods) is restarted every 2 minutes
The baseline topology is updated correctly on every node restart, with all 6
servers back online once the restart completes.
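
For reference, the cache described in the setup list above could be declared
roughly as follows. This is only a sketch, assuming ignite-core 2.x is on the
classpath. The cache name "testCache", the Integer/String key and value types
and the class name TestCacheConfig are placeholders, not taken from the actual
test.

```java
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

/** Hypothetical holder class; the real test code may be organized differently. */
final class TestCacheConfig {
    private TestCacheConfig() {}

    /** Builds a cache configuration matching the setup list above. */
    static CacheConfiguration<Integer, String> create() {
        // "testCache" and the key/value types are assumptions for illustration.
        CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("testCache");

        cfg.setCacheMode(CacheMode.PARTITIONED);          // partitioned
        cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);  // atomic
        cfg.setBackups(2);                                // 2 backups
        cfg.setReadFromBackup(true);                      // read from backup = true

        return cfg;
    }
}
```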
> Client nodes failed with "Failed to process invalid partitions response"
> during failover
> ----------------------------------------------------------------------------------------
>
> Key: IGNITE-7570
> URL: https://issues.apache.org/jira/browse/IGNITE-7570
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.4
> Reporter: Ksenia Rybakova
> Priority: Major
> Attachments: ignite-base-load-config.xml, run-load.properties,
> run-load.xml
>
>
> Some client nodes fail with "Failed to process invalid partitions response"
> during failover test:
> {noformat}
> [2018-01-30 16:27:58,610][INFO ][sys-#190][GridDhtPartitionsExchangeFuture]
> Received full message, will finish exchange
> [node=80ebd2ac-1432-4bfc-bab7-d9dbf56cdeb4, resVer=AffinityTopologyVersion
> [topVer=37, minorTopVer=0]]
> [2018-01-30 16:27:58,688][INFO ][sys-#190][GridDhtPartitionsExchangeFuture]
> Finish exchange future [startVer=AffinityTopologyVersion [topVer=37,
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=37, minorTopVer=0],
> err=null]
> <16:27:58><benchmark-worker-32><yardstick> The benchmark of random operation
> failed.
> javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException:
> Failed to process invalid partitions response (remote node reported invalid
> partitions but remote topology version does not differ from local)
> [topVer=AffinityTopologyVersion [topVer=37, minorTopVer=0],
> rmtTopVer=AffinityTopologyVersion [topVer=37, minorTopVer=0], part=204,
> nodeId=80ebd2ac-1432-4bfc-bab7-d9dbf56cdeb4]
> at
> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1294)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1673)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:852)
> at
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:676)
> at
> org.apache.ignite.yardstick.cache.load.IgniteCacheRandomOperationBenchmark.doGet(IgniteCacheRandomOperationBenchmark.java:776)
> at
> org.apache.ignite.yardstick.cache.load.IgniteCacheRandomOperationBenchmark.executeRandomOperation(IgniteCacheRandomOperationBenchmark.java:624)
> at
> org.apache.ignite.yardstick.cache.load.IgniteCacheRandomOperationBenchmark.executeOutOfTx(IgniteCacheRandomOperationBenchmark.java:602)
> at
> org.apache.ignite.yardstick.cache.load.IgniteCacheRandomOperationBenchmark.test(IgniteCacheRandomOperationBenchmark.java:207)
> at
> org.yardstickframework.impl.BenchmarkRunner$2.run(BenchmarkRunner.java:178)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to process
> invalid partitions response (remote node reported invalid partitions but
> remote topology version does not differ from local)
> [topVer=AffinityTopologyVersion [topVer=37, minorTopVer=0],
> rmtTopVer=AffinityTopologyVersion [topVer=37, minorTopVer=0], part=204,
> nodeId=80ebd2ac-1432-4bfc-bab7-d9dbf56cdeb4]
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.checkError(GridPartitionedSingleGetFuture.java:596)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.onResult(GridPartitionedSingleGetFuture.java:505)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetResponse(GridDhtCacheAdapter.java:349)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1400(GridDhtAtomicCache.java:130)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$15.apply(GridDhtAtomicCache.java:422)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$15.apply(GridDhtAtomicCache.java:417)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
> at
> org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:499)
> ... 1 more
> Finishing main test [ts=1517318878863, date=Tue Jan 30 16:27:58 MSK 2018]
> ERROR: Shutting down benchmark driver to unexpected exception.
> {noformat}
> Test config:
> CacheRandomOperationBenchmark
> - 20 server nodes, 10 client nodes at 10 hosts
> - 34 caches with different configs with and without PDS, 3 backups
> - preload amount 250
> - key range 500K
> - operations: PUT PUT_ALL GET GET_ALL INVOKE INVOKE_ALL REMOVE REMOVE_ALL
> PUT_IF_ABSENT REPLACE
> - 2 of 20 servers are being restarted every 15 minutes
> Complete yardstick configs are attached.
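
When checking client logs for this issue, it may help to confirm mechanically
that the local and remote topology versions in the exception message are
equal, as they are in the trace quoted above (topVer=37, minorTopVer=0 on both
sides). The following is a minimal sketch using only the JDK. The class
InvalidPartitionsCheck and its methods are hypothetical helpers, not part of
Ignite, and the sketch assumes the message has been joined into a single line
in the format shown in the quoted trace.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log-triage helper; not part of Ignite. */
final class InvalidPartitionsCheck {
    private InvalidPartitionsCheck() {}

    /**
     * Extracts {topVer, minorTopVer} for the named outer field,
     * e.g. "topVer" (local) or "rmtTopVer" (remote).
     */
    static long[] version(String field, String msg) {
        Pattern p = Pattern.compile("(?<![A-Za-z])" + Pattern.quote(field)
            + "=AffinityTopologyVersion\\s*\\[topVer=(\\d+),\\s*minorTopVer=(\\d+)\\]");
        Matcher m = p.matcher(msg);
        if (!m.find())
            throw new IllegalArgumentException("Field not found: " + field);
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    /** True when the local and remote topology versions in the message are equal. */
    static boolean sameTopology(String msg) {
        return Arrays.equals(version("topVer", msg), version("rmtTopVer", msg));
    }

    /** Extracts the partition number reported as "part=N". */
    static int partition(String msg) {
        Matcher m = Pattern.compile("\\bpart=(\\d+)").matcher(msg);
        if (!m.find())
            throw new IllegalArgumentException("Field not found: part");
        return Integer.parseInt(m.group(1));
    }
}
```

Calling InvalidPartitionsCheck.sameTopology(message) on each logged exception
message shows whether a failure has the same signature as the one reported
here. It only parses text and does not touch the cluster.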