Hello! * The message that you provided most likely means there's a prolonged deadlock or some other severe problem that made your cluster inoperational. * You are using putAll which is known to cause deadlocks if used in parallel and maps in question are not sorted. * Is the map that you're passing to putAll sorted? (e.g. TreeMap)
Regards, -- Ilya Kasnacheev 2018-05-15 19:33 GMT+03:00 mmyoussef <muhammad.ma...@teradata.com>: > The scenario is > 1- Data is loaded from cassandra to a lookup cache -> replicated cache > 2- Data is fetched from kafka input topic to a map > 3- the map is inserted - using putall method - to another cache -> > partitioned cache > 4- a complex join is done between both caches > 5- The result set is processed and inserted into layer 2 cache > 6- Then processed again to a layer 3 cache then to kafka output topic > > the jar is compiled and run on 2 different nodes. > > some records are fetched, processing runs on both nodes -results of the > first batch are not accurate- and then hangs and keeps logging this message > ?! > > 10/05/2018 16:39:48 INFO [tcp-disco-stats-printer-#4%cluster-VEON%] > TcpDiscoverySpi: Discovery SPI statistics [statistics=TcpDiscoveryStatistics > > [joinStartedTs=1525962688220, joinFinishedTs=1525962706508, > crdSinceTs=1525962706498, joinedNodesCnt=1, failedNodesCnt=0, > leftNodesCnt=0, > ackTimeoutsCnt=0, sockTimeoutsCnt=0, rcvdMsgs= > {TcpDiscoveryMetricsUpdateMessage=474, TcpDiscoveryNodeAddedMessage=1, > TcpDiscoveryNodeAddFinishedMessage=1, TcpDiscoveryJoinRequestMessage=2, > TcpDiscoveryDiscardMessage=35, TcpDiscoveryConnectionCheckMessage=706, > TcpDiscoveryCustomEventMessage=39}, procMsgs= > {TcpDiscoveryMetricsUpdateMessage=715, TcpDiscoveryNodeAddedMessage=1, > TcpDiscoveryNodeAddFinishedMessage=1, TcpDiscoveryJoinRequestMessage=2, > TcpDiscoveryCustomEventMessage=53, TcpDiscoveryDiscardMessage=76}, > avgMsgsSndTimes={TcpDiscoveryMetricsUpdateMessage=0, > TcpDiscoveryNodeAddedMessage=10, TcpDiscoveryNodeAddFinishedMessage=0, > TcpDiscoveryJoinRequestMessage=10, TcpDiscoveryConnectionCheckMessage=0, > TcpDiscoveryDiscardMessage=0, TcpDiscoveryCustomEventMessage=0}, > maxMsgsSndTimes={TcpDiscoveryMetricsUpdateMessage=10, > TcpDiscoveryNodeAddedMessage=10, TcpDiscoveryNodeAddFinishedMessage=0, > TcpDiscoveryJoinRequestMessage=10, TcpDiscoveryConnectionCheckMessage=42, > TcpDiscoveryDiscardMessage=0, TcpDiscoveryCustomEventMessage=0}, > sentMsgs= > {TcpDiscoveryMetricsUpdateMessage=474, TcpDiscoveryNodeAddedMessage=1, > TcpDiscoveryNodeAddFinishedMessage=1, TcpDiscoveryJoinRequestMessage=1, > TcpDiscoveryConnectionCheckMessage=706, TcpDiscoveryDiscardMessage=35, > TcpDiscoveryCustomEventMessage=33}, avgMsgsAckTimes= > {TcpDiscoveryMetricsUpdateMessage=0, TcpDiscoveryNodeAddedMessage=10, > TcpDiscoveryNodeAddFinishedMessage=0, TcpDiscoveryJoinRequestMessage=10, > TcpDiscoveryConnectionCheckMessage=0, TcpDiscoveryDiscardMessage=0, > TcpDiscoveryCustomEventMessage=0}, maxMsgsAckTimes= > {TcpDiscoveryMetricsUpdateMessage=10, TcpDiscoveryNodeAddedMessage=10, > TcpDiscoveryNodeAddFinishedMessage=0, TcpDiscoveryJoinRequestMessage=10, > TcpDiscoveryConnectionCheckMessage=42, TcpDiscoveryDiscardMessage=0, > TcpDiscoveryCustomEventMessage=0}, avgMsgQueueTime=0, maxMsgQueueTime=11, > ringMsgsSent=35, avgRingMsgTime=0, maxRingMsgTime=121, > maxRingTimeMsgCls=TcpDiscoveryNodeAddedMessage, avgMsgProcTime=0, > maxMsgProcTime=91, maxProcTimeMsgCls=TcpDiscoveryJoinRequestMessage, > sockReadersCreated=3, sockReadersRmv=2, avgSrvSockInitTime=3, > maxSrvSockInitTime=10, clientSockCreatedCnt=2, avgClientSockInitTime=15, > maxClientSockInitTime=21, pendingMsgsRegistered=35, > pendingMsgsDiscarded=0], > spiState=CONNECTED, coord=TcpDiscoveryNode [id=8a03f4a3-a631-42e7-bdf2- > b062e1e977b6, addrs=[0:0:0:0:0:0:0:1, 10.19.1.14, 127.0.0.1, > 172.16.44.139, > 172.23.48.1], sockAddrs=[WEGMM250132-O5F.TD.TERADATA.COM/10.19.1.14:47501, > > /172.16.44.139:47501, /172.23.48.1:47501, /0:0:0:0:0:0:0:1:47501, > /127.0.0.1:47501], discPort=47501, order=1, intOrder=1, > lastExchangeTime=1525962688210, loc=true, ver=2.0.0#20170430-sha1:d4eef3c6, > > isClient=false], next=TcpDiscoveryNode [id=ee98d2ff-40fe-4981-8a59- > 1b12b478bc11, addrs=[0:0:0:0:0:0:0:1, 10.19.1.14, 127.0.0.1, > 172.16.44.139, > 172.23.48.1], sockAddrs=[WEGMM250132-O5F.TD.TERADATA.COM/10.19.1.14:47500, > > /172.23.48.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, > /172.16.44.139:47500], discPort=47500, order=2, intOrder=2, > lastExchangeTime=1525962714266, loc=false, ver=2.0.0#20170430-sha1:d4eef3c6, > > isClient=false], intOrder=1, topSize=2, leavingNodesSize=0, > failedNodesSize=0, > joiningNodesSize=0, pendingCustomMsgs=0, msgWorker.queue.size=0, > clients=0, > clientWorkers=0, lastUpdate=05/10/2018 16:39:47, heapFree=5727M, > heapTotal=5888M] > > > > some times I get this message too : > [GridCachePartitionExchangeManager] Failed to wait for initial partition > map > exchange. Possible reasons are: > ^-- Transactions in deadlock. > ^-- Long running transactions (ignore if this is the case). > ^-- Unreleased explicit locks. > > How can I solve this in your opinions? > > > > -- > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ >