Hello all, I tried posting to the user list but I am still not seeing the email after an hour.
I have a 5-node cluster where I "lost" a node temporarily (Amazon reported a hardware error, so my Ops guys shut down the instance and brought a new one back up). I ran the same ignite.sh configuration on the new node, expecting it to join the cluster. However, I am seeing the following in the logs (see below).

In addition, I can no longer access the caches from my code: connecting to a cache via getOrCreateCache() just hangs and eventually times out. The cluster still has 4 members, so I am not quite sure what is going on.

To add to this, I can run cache -scan on the caches from Visor and all the information is still there, but it is inaccessible from my code (with client mode on or off, it doesn't matter).

I am baffled.

Thanks!
Ognen

[23:10:38,923][WARNING][grid-timeout-worker-#33%null%][GridDhtPartitionsExchangeFuture] Retrying preload partition exchange due to timeout [done=false, dummy=false, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1848, minorTopVer=0], nodeId=9e648fd3, evt=NODE_JOINED], rcvdIds=[], rmtIds=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], remaining=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], init=true, initFut=true, ready=true, replied=false, added=true, oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848, locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]
[23:10:53,699][WARNING][main][GridCachePartitionExchangeManager] Failed to wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.
[23:10:53,926][WARNING][grid-timeout-worker-#33%null%][GridDhtPartitionsExchangeFuture] Retrying preload partition exchange due to timeout [done=false, dummy=false, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1848, minorTopVer=0], nodeId=9e648fd3, evt=NODE_JOINED], rcvdIds=[], rmtIds=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], remaining=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], init=true, initFut=true, ready=true, replied=false, added=true, oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848, locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]
[23:11:08,929][WARNING][grid-timeout-worker-#33%null%][GridDhtPartitionsExchangeFuture] Retrying preload partition exchange due to timeout [done=false, dummy=false, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1848, minorTopVer=0], nodeId=9e648fd3, evt=NODE_JOINED], rcvdIds=[], rmtIds=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], remaining=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], init=true, initFut=true, ready=true, replied=false, added=true, oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848, locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]
[....]
[repeated many, many times]
[....]
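
For anyone wading through the repeated warnings, here is a minimal sketch for pulling the node ID lists out of one "Retrying preload partition exchange" line. It is plain Python and not part of Ignite. The function names parse_id_list and exchange_status are made up for illustration, and it assumes the fields keep the `name=[a, b, c]` shape seen in the log above.

```python
import re

# One warning line copied from the log above.
SAMPLE_LINE = (
    "[23:10:38,923][WARNING][grid-timeout-worker-#33%null%]"
    "[GridDhtPartitionsExchangeFuture] Retrying preload partition exchange "
    "due to timeout [done=false, dummy=false, "
    "exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion "
    "[topVer=1848, minorTopVer=0], nodeId=9e648fd3, evt=NODE_JOINED], "
    "rcvdIds=[], rmtIds=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], "
    "remaining=[e5b581b3, bd33def3, f7cc4da6, 8eda3172, efef2202], "
    "init=true, initFut=true, ready=true, replied=false, added=true, "
    "oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848, "
    "locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]"
)


def parse_id_list(line, field):
    """Return the IDs inside `field=[...]`, or [] if the field is missing or empty."""
    match = re.search(r"\b" + re.escape(field) + r"=\[([^\]]*)\]", line)
    if match is None:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def exchange_status(line):
    """Collect the three node ID lists reported by the exchange future (hypothetical helper)."""
    return {
        "remote": parse_id_list(line, "rmtIds"),
        "received": parse_id_list(line, "rcvdIds"),
        "remaining": parse_id_list(line, "remaining"),
    }


if __name__ == "__main__":
    status = exchange_status(SAMPLE_LINE)
    print("remote nodes:    ", status["remote"])
    print("replies received:", status["received"])
    print("still waiting on:", status["remaining"])
```

For the line in this post, rcvdIds is empty and remaining lists the same five IDs as rmtIds, so the joining node (9e648fd3) has not received an exchange reply from any remote node.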
