[ https://issues.apache.org/jira/browse/IGNITE-12894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091560#comment-17091560 ]
Vyacheslav Daradur edited comment on IGNITE-12894 at 4/24/20, 1:16 PM:
-----------------------------------------------------------------------

Both this issue and IGNITE-12490 can be fixed by improving our deployment guarantees; see the [dev-list-thread|http://apache-ignite-developers.2346864.n4.nabble.com/Discovery-based-services-deployment-guarantees-question-td44866.html] for details.

The main idea is to allow [GridServiceProxy#randomNodeForService|https://github.com/apache/ignite/blob/8cba313c9961b16e358834216e9992310f285985/modules/core/src/main/java/org/apache/ignite/internal/processors/service/GridServiceProxy.java#L283] to wait until the service deployment has finished if the service is registered in the cluster but its deployment has not finished yet. This can be achieved in the same manner as in our ["API with a timeout" here|https://github.com/apache/ignite/blob/8dcd0f1d96dae965a0f5c479e6d0f4b4d50c6e2c/modules/core/src/main/java/org/apache/ignite/internal/processors/service/IgniteServiceProcessor.java#L821] (mentioned as a workaround in the current issue description).
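To make the waiting idea concrete, here is a minimal standalone sketch of it. It is not Ignite code: the class {{ServiceTopologyWaiter}}, its methods, and the {{maxWaitMs}} bound are hypothetical names introduced only for illustration, and the plain {{HashMap}} stands in for the processor's registered-services map. The monitor plays the role of "servicesTopsUpdateMux": a caller returns immediately for an unregistered service, and otherwise waits until the topology becomes non-empty, the service is removed (deployment failure), or the bound expires.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

/** Standalone model of the proposed wait; none of these names are Ignite APIs. */
final class ServiceTopologyWaiter {
    /** Hypothetical registry: service name -> topology snapshot (node ID -> instance count). */
    private final Map<String, Map<UUID, Integer>> registered = new HashMap<>();

    /** Monitor guarding the registry, playing the role of "servicesTopsUpdateMux". */
    private final Object mux = new Object();

    /** Registers a service whose deployment has not finished yet (empty topology). */
    void register(String name) {
        update(name, Collections.emptyMap());
    }

    /** Publishes the topology once deployment has finished. */
    void deployed(String name, Map<UUID, Integer> top) {
        update(name, top);
    }

    /** Removes the service, modelling a deployment failure. */
    void deploymentFailed(String name) {
        update(name, null);
    }

    private void update(String name, Map<UUID, Integer> top) {
        synchronized (mux) {
            if (top == null)
                registered.remove(name);
            else
                registered.put(name, top);

            mux.notifyAll(); // Wake up every waiter so it re-checks its condition.
        }
    }

    /** Returns the topology, or null if the service is unknown, failed to deploy, or the wait expired. */
    Map<UUID, Integer> awaitTopology(String name, long maxWaitMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);

        synchronized (mux) {
            while (true) {
                Map<UUID, Integer> top = registered.get(name);

                if (top == null)
                    return null; // Not registered, or removed after a failed deployment.

                if (!top.isEmpty())
                    return top; // Deployment has finished.

                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());

                if (remainingMs <= 0)
                    return null; // Gave up waiting.

                mux.wait(remainingMs); // Releases the monitor until notified or timed out.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ServiceTopologyWaiter waiter = new ServiceTopologyWaiter();
        waiter.register("IgniteTestService");

        Thread deployer = new Thread(() ->
            waiter.deployed("IgniteTestService", Collections.singletonMap(UUID.randomUUID(), 1)));
        deployer.start();

        System.out.println(waiter.awaitTopology("IgniteTestService", 5_000));
        deployer.join();
    }
}
```

The upper bound on the wait is an assumption of this sketch, added so that a waiter cannot hang forever; the proposal itself only distinguishes between registered and unregistered services.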
Some conditions need to be added, something like this:
{code:java}
IgniteUuid srvcUid = lookupRegisteredServiceId(name);

if (srvcUid == null)
    return null; // Service is not registered in the cluster: it was not present in the configuration and was not deployed through the API.

Map<UUID, Integer> top;

while (true) {
    ServiceInfo srvcDesc = registeredServices.get(srvcUid);

    if (srvcDesc == null) {
        if (timeout == 0)
            return null;

        // Wait if someone has sent the service for deployment (as in the current implementation), then re-check.
        continue;
    }

    top = srvcDesc.topologySnapshot();

    if (!top.isEmpty())
        return top;

    // Wait on "servicesTopsUpdateMux" until the service deployment has finished and the topology is not empty,
    // or until the service is removed from "registeredServices" because the deployment failed.
}
{code}

> Cannot use IgniteAtomicSequence in Ignite services
> --------------------------------------------------
>
> Key: IGNITE-12894
> URL: https://issues.apache.org/jira/browse/IGNITE-12894
> Project: Ignite
> Issue Type: Bug
> Components: compute
> Affects Versions: 2.8
> Reporter: Alexey Kukushkin
> Assignee: Mikhail Petrov
> Priority: Major
> Labels: sbcf
>
> h2. Repro Steps
> Execute the steps below in the default service deployment mode and in the discovery-based service deployment mode.
> Use the {{-DIGNITE_EVENT_DRIVEN_SERVICE_PROCESSOR_ENABLED=true}} JVM option to switch to the discovery-based service deployment mode.
> * Create a service that initializes an {{IgniteAtomicSequence}} in method {{Service#init()}} and uses the {{IgniteAtomicSequence}} in a business method.
> * Start an Ignite node with the service specified in the IgniteConfiguration
> * Invoke the service's business method on the Ignite node
> h3. Actual Result
> h4. In Default Service Deployment Mode
> Deadlock on the business method invocation
> h4. In Discovery-Based Service Deployment Mode
> The method invocation fails with {{IgniteException: Failed to find deployed service: IgniteTestService}}
> h2. Reproducer
> h3.
Test.java
> {code:java}
> public interface Test {
>     String sayHello(String name);
> }
> {code}
> h3. IgniteTestService.java
> {code:java}
> public class IgniteTestService implements Test, Service {
>     private @IgniteInstanceResource Ignite ignite;
>     private IgniteAtomicSequence seq;
>
>     @Override public void cancel(ServiceContext ctx) {
>     }
>
>     @Override public void init(ServiceContext ctx) throws InterruptedException {
>         seq = ignite.atomicSequence("TestSeq", 0, true);
>     }
>
>     @Override public void execute(ServiceContext ctx) {
>     }
>
>     @Override public String sayHello(String name) {
>         return "Hello, " + name + "! #" + seq.getAndIncrement();
>     }
> }
> {code}
> h3. Reproducer.java
> {code:java}
> public class Reproducer {
>     public static void main(String[] args) {
>         IgniteConfiguration igniteCfg = new IgniteConfiguration()
>             .setServiceConfiguration(
>                 new ServiceConfiguration()
>                     .setName(IgniteTestService.class.getSimpleName())
>                     .setMaxPerNodeCount(1)
>                     .setTotalCount(0)
>                     .setService(new IgniteTestService())
>             )
>             .setDiscoverySpi(
>                 new TcpDiscoverySpi()
>                     .setIpFinder(new TcpDiscoveryVmIpFinder().setAddresses(Collections.singleton("127.0.0.1:47500")))
>             );
>
>         try (Ignite ignite = Ignition.start(igniteCfg)) {
>             ignite.services().serviceProxy(IgniteTestService.class.getSimpleName(), Test.class, false)
>                 .sayHello("World");
>         }
>     }
> }
> {code}
> h2. Workaround
> Specifying a service wait timeout solves the problem in the discovery-based service deployment mode (but not in the default deployment mode):
> {code:java}
> ignite.services().serviceProxy(IgniteTestService.class.getSimpleName(), Test.class, false, 1_000)
>     .sayHello("World");
> {code}
> This workaround cannot be used in Ignite.NET clients since the .NET {{GetServiceProxy}} API does not support the service wait timeout, which is hard-coded to 0 on the server side.
> h2.
Full Exception in Discovery-Based Service Deployment Mode > {noformat} > [01:08:54,653][SEVERE][services-deployment-worker-#52][IgniteServiceProcessor] > Failed to initialize service (service will not be deployed): > IgniteTestService > class org.apache.ignite.IgniteInterruptedException: Got interrupted while > waiting for future to complete. > at > org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:888) > at > org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:886) > at > org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1062) > at > org.apache.ignite.internal.IgniteKernal.atomicSequence(IgniteKernal.java:3999) > at > org.apache.ignite.internal.IgniteKernal.atomicSequence(IgniteKernal.java:3985) > at Sandbox.Net.IgniteTestService.init(IgniteTestService.java:17) > at > org.apache.ignite.internal.processors.service.IgniteServiceProcessor.redeploy(IgniteServiceProcessor.java:1188) > at > org.apache.ignite.internal.processors.service.ServiceDeploymentTask.lambda$processDeploymentActions$5(ServiceDeploymentTask.java:318) > at java.base/java.util.HashMap.forEach(HashMap.java:1336) > at > org.apache.ignite.internal.processors.service.ServiceDeploymentTask.processDeploymentActions(ServiceDeploymentTask.java:302) > at > org.apache.ignite.internal.processors.service.ServiceDeploymentTask.init(ServiceDeploymentTask.java:262) > at > org.apache.ignite.internal.processors.service.ServiceDeploymentManager$ServicesDeploymentWorker.body(ServiceDeploymentManager.java:475) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > at java.base/java.lang.Thread.run(Thread.java:834) > [01:08:54,712][SEVERE][exchange-worker-#42][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (rebalancing will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, > minorTopVer=1], discoEvt=DiscoveryCustomEvent > [customMsg=DynamicCacheChangeBatch > 
[id=17576957171-7ae549c8-423a-40b4-9865-c28a2f4b9dd9, reqs=ArrayList > [DynamicCacheChangeRequest > [cacheName=ignite-sys-atomic-cache@default-ds-group, hasCfg=true, > nodeId=5fe32117-84ee-4f1f-9e19-86b85ef8c987, clientStartOnly=false, > stop=false, destroy=false, disabledAfterStartfalse]], > exchangeActions=ExchangeActions > [startCaches=[ignite-sys-atomic-cache@default-ds-group], stopCaches=null, > startGrps=[default-ds-group], stopGrps=[], resetParts=null, > stateChangeRequest=null], startCaches=false], > affTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], > super=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=5fe32117-84ee-4f1f-9e19-86b85ef8c987, > consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.1.2,192.168.56.1:47500, > addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.1.2, 192.168.56.1], > sockAddrs=HashSet [kukushal-pc/172.22.44.97:47500, /0:0:0:0:0:0:0:1:47500, > /127.0.0.1:47500, /192.168.56.1:47500, /192.168.1.2:47500], discPort=47500, > order=1, intOrder=1, lastExchangeTime=1586815734079, loc=true, > ver=2.8.0#20200226-sha1:341b01df, isClient=false], topVer=1, > nodeId8=5fe32117, msg=null, type=DISCOVERY_CUSTOM_EVT, > tstamp=1586815734517]], nodeId=5fe32117, evt=DISCOVERY_CUSTOM_EVT] > class org.apache.ignite.IgniteException: Failed to validate partitions state > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.validatePartitionsState(GridDhtPartitionsExchangeFuture.java:3886) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3577) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3485) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1610) > 
at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:891) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3172) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3021) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: class > org.apache.ignite.internal.IgniteInterruptedCheckedException: null > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11189) > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11059) > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11039) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.validatePartitionsState(GridDhtPartitionsExchangeFuture.java:3848) > ... 8 more > Caused by: java.lang.InterruptedException > at > java.base/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:418) > at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:190) > at > org.apache.ignite.internal.util.IgniteUtils$Batch.result(IgniteUtils.java:11313) > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11179) > ... 
11 more > [01:08:54,720][SEVERE][exchange-worker-#42][GridCachePartitionExchangeManager] > Failed to wait for completion of partition map exchange (preloading will not > start): GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryCustomEvent > [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], > super=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=5fe32117-84ee-4f1f-9e19-86b85ef8c987, > consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.1.2,192.168.56.1:47500, > addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.1.2, 192.168.56.1], > sockAddrs=HashSet [kukushal-pc/172.22.44.97:47500, /0:0:0:0:0:0:0:1:47500, > /127.0.0.1:47500, /192.168.56.1:47500, /192.168.1.2:47500], discPort=47500, > order=1, intOrder=1, lastExchangeTime=1586815734079, loc=true, > ver=2.8.0#20200226-sha1:341b01df, isClient=false], topVer=1, > nodeId8=5fe32117, msg=null, type=DISCOVERY_CUSTOM_EVT, > tstamp=1586815734517]], crd=TcpDiscoveryNode > [id=5fe32117-84ee-4f1f-9e19-86b85ef8c987, > consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.1.2,192.168.56.1:47500, > addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.1.2, 192.168.56.1], > sockAddrs=HashSet [kukushal-pc/172.22.44.97:47500, /0:0:0:0:0:0:0:1:47500, > /127.0.0.1:47500, /192.168.56.1:47500, /192.168.1.2:47500], discPort=47500, > order=1, intOrder=1, lastExchangeTime=1586815734079, loc=true, > ver=2.8.0#20200226-sha1:341b01df, isClient=false], > exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, > minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=null, > affTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], > super=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=5fe32117-84ee-4f1f-9e19-86b85ef8c987, > consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.1.2,192.168.56.1:47500, > addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.1.2, 192.168.56.1], > sockAddrs=HashSet [kukushal-pc/172.22.44.97:47500, /0:0:0:0:0:0:0:1:47500, > /127.0.0.1:47500, /192.168.56.1:47500, 
/192.168.1.2:47500], discPort=47500, > order=1, intOrder=1, lastExchangeTime=1586815734079, loc=true, > ver=2.8.0#20200226-sha1:341b01df, isClient=false], topVer=1, > nodeId8=5fe32117, msg=null, type=DISCOVERY_CUSTOM_EVT, > tstamp=1586815734517]], nodeId=5fe32117, evt=DISCOVERY_CUSTOM_EVT], > added=true, exchangeType=ALL, initFut=GridFutureAdapter > [ignoreInterrupts=false, state=DONE, res=true, hash=429760908], init=false, > lastVer=null, partReleaseFut=PartitionReleaseFuture > [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], > futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, > minorTopVer=1], futures=[]], AtomicUpdateReleaseFuture > [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], futures=[]], > DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, > minorTopVer=1], futures=[]], LocalTxReleaseFuture > [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], futures=[]], > AllTxReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], > futures=[RemoteTxReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, > minorTopVer=1], futures=[]]]]]], exchActions=ExchangeActions > [startCaches=[ignite-sys-atomic-cache@default-ds-group], stopCaches=null, > startGrps=[default-ds-group], stopGrps=[], resetParts=null, > stateChangeRequest=null], affChangeMsg=null, centralizedAff=false, > forceAffReassignment=false, exchangeLocE=null, > cacheChangeFailureMsgSent=false, done=true, state=CRD, > registerCachesFuture=GridFinishedFuture [resFlag=2], partitionsSent=false, > partitionsReceived=false, delayedLatestMsg=null, > afterLsnrCompleteFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, > res=null, hash=583816633], timeBag=o.a.i.i.util.TimeBag@5ac0d023, > startTime=1087079935840199, initTime=1586815734527, rebalanced=false, > evtLatch=0, remaining=HashSet [], mergedJoinExchMsgs=null, awaitMergedMsgs=0, > super=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=class > 
o.a.i.IgniteException: Failed to validate partitions state, hash=1371010775]] > class org.apache.ignite.IgniteCheckedException: Failed to validate partitions > state > at > org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7509) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:260) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:209) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:160) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3200) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3021) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > Caused by: class > org.apache.ignite.internal.IgniteInterruptedCheckedException: null > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: class org.apache.ignite.IgniteException: Failed to validate > partitions state > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.validatePartitionsState(GridDhtPartitionsExchangeFuture.java:3886) > Caused by: java.lang.InterruptedException > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3577) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3485) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1610) > at > 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:891) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3172) > ... 3 more > Caused by: class > org.apache.ignite.internal.IgniteInterruptedCheckedException: null > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11189) > Caused by: class org.apache.ignite.IgniteException: Failed to validate > partitions state > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11059) > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11039) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.validatePartitionsState(GridDhtPartitionsExchangeFuture.java:3848) > ... 8 more > Caused by: java.lang.InterruptedException > at > java.base/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:418) > at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:190) > at > org.apache.ignite.internal.util.IgniteUtils$Batch.result(IgniteUtils.java:11313) > at > org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11179) > ... 
11 more
> Caused by: class org.apache.ignite.internal.IgniteInterruptedCheckedException: null
> Caused by: java.lang.InterruptedException
> [01:08:54] Ignite node stopped OK [uptime=00:00:00.219]
> Exception in thread "main" class org.apache.ignite.IgniteException: Failed to find deployed service: IgniteTestService
>     at org.apache.ignite.internal.processors.service.GridServiceProxy.invokeMethod(GridServiceProxy.java:169)
>     at org.apache.ignite.internal.processors.service.GridServiceProxy$ProxyInvocationHandler.invoke(GridServiceProxy.java:364)
>     at com.sun.proxy.$Proxy25.sayHello(Unknown Source)
>     at Sandbox.Net.Reproducer.main(Reproducer.java:29)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)