Alex & Val, I reviewed your changes and they seem good to me. Good catches! However, the TC state is not acceptable. I am afraid you touched some fundamentals of TCP discovery:

[06:06:31]W: [org.apache.ignite:ignite-core] [06:06:31,976][ERROR][tcp-disco-msg-worker-#16%atomic.GridCacheValueConsistencyAtomicSelfTest2][TcpDiscoverySpi] Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-msg-worker-#16%atomic.GridCacheValueConsistencyAtomicSelfTest2]
[06:06:31]W: [org.apache.ignite:ignite-core] java.lang.AssertionError: Topology version has not been updated: [ring=TcpDiscoveryNodesRing [locNode=TcpDiscoveryNode [id=20776610-8b3e-4bd4-92f6-8da1f618d002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=3, intOrder=3, lastExchangeTime=1442631991969, loc=true, ver=1.4.0#19700101-sha1:00000000, isClient=false], nodes=[TcpDiscoveryNode [id=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1442631951804, loc=false, ver=1.4.0#19700101-sha1:00000000, isClient=false], TcpDiscoveryNode [id=10002b17-a015-4a12-aad1-7829b80bb001, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1442631951804, loc=false, ver=1.4.0#19700101-sha1:00000000, isClient=false], TcpDiscoveryNode [id=20776610-8b3e-4bd4-92f6-8da1f618d002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=3, intOrder=3, lastExchangeTime=1442631991969, loc=true, ver=1.4.0#19700101-sha1:00000000, isClient=false], TcpDiscoveryNode [id=30a3b148-11f5-4b21-ab65-8912748b3003, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, lastExchangeTime=1442631991907, loc=false, ver=1.4.0#19700101-sha1:00000000, isClient=false]], topVer=5, nodeOrder=4], msg=TcpDiscoveryNodeAddFinishedMessage [nodeId=30a3b148-11f5-4b21-ab65-8912748b3003, super=TcpDiscoveryAbstractMessage [sndNodeId=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, id=bb9a093ef41-00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, verifierNodeId=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, topVer=4, pendingIdx=0, isClient=false]], lastMsg=TcpDiscoveryNodeLeftMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=10002b17-a015-4a12-aad1-7829b80bb001, id=b005ec3ef41-30a3b148-11f5-4b21-ab65-8912748b3003, verifierNodeId=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, topVer=5, pendingIdx=0, isClient=false]], spiState=CONNECTED]
[06:06:31]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:3371)
[06:06:31]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:1994)
[06:06:31]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5268)
[06:06:31]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

[06:11:40]W: [org.apache.ignite:ignite-core] [06:11:40,190][ERROR][tcp-disco-msg-worker-#1490%replicated.GridCacheSyncReplicatedPreloadSelfTest7][TcpDiscoverySpi] Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-msg-worker-#1490%replicated.GridCacheSyncReplicatedPreloadSelfTest7]
[06:11:40]W: [org.apache.ignite:ignite-core] java.lang.AssertionError
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:4180)
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2015)
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5268)
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[06:11:40]W: [org.apache.ignite:ignite-core] Exception in thread "tcp-disco-msg-worker-#1490%replicated.GridCacheSyncReplicatedPreloadSelfTest7" java.lang.AssertionError
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:4180)
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2015)
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5268)
[06:11:40]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

--
Yakov Zhdanov, Director R&D
*GridGain Systems*
www.gridgain.com

2015-09-19 4:07 GMT+03:00 Alexey Goncharuk <[email protected]>:

> Yakov,
>
> Valentin and I debugged the issue with ignite-1171 and I think we got to
> the bottom of it. First of all, pending messages were not reset to the
> correct collection on the joining node, which resulted in skipped custom
> event notifications. Second, the check that you added to avoid discarding
> custom messages was checking the wrong variable and the wrong type :)
> After we fixed those two issues, the test seems to pass. Please review my
> changes again.
>
> --AG
>
> 2015-09-18 14:10 GMT-07:00 Yakov Zhdanov <[email protected]>:
>
> > Igniters,
> >
> > While working on ignite-1171 we discovered a couple more issues in discovery
> > that might have threatened custom event processing under some circumstances
> > (we have continuous processes based on this logic, for example).
> >
> > Alexey Goncharuk has picked this up.
> >
> > Another critical issue discovered today -
> > https://issues.apache.org/jira/browse/IGNITE-1516 - performance drop in
> > offheap query benchmark. Semyon will be fixing it.
> >
> > https://issues.apache.org/jira/browse/IGNITE-973 - Sergi has come to the
> > conclusion that a race is still present in the cache offheap swap logic.
> > Currently this is assigned to Semyon, too.
> >
> > We need to postpone the release till the very beginning of next week.
> >
> > --Yakov
> >
> > 2015-09-18 12:01 GMT+03:00 Yakov Zhdanov <[email protected]>:
> >
> > > Alex, I think that your approach with delaying custom messages will work.
> > > As far as coordinator crash protection goes, we guarantee delivery of certain
> > > message types (including custom messages). This logic was implemented long
> > > ago and seems to work. So, the message just gets resent.
> > >
> > > Semyon, can you please take a look at Alex's changes?
> > >
> > > --Yakov
> > >
> > > 2015-09-18 3:24 GMT+03:00 Alexey Goncharuk <[email protected]>:
> > >
> > >> Yakov,
> > >>
> > >> The approach with collecting discovery data on the NodeAddFinished message
> > >> does not work because this message gets relayed to clients before it
> > >> passes the whole ring. If we make it pass the ring and relay it to
> > >> clients on the second round, we get the same race I was fixing.
> > >>
> > >> I think the correct approach here is to delay custom event messages when
> > >> a node join is in progress - basically do not allow custom messages between
> > >> NodeAddedMessage and NodeAddFinished message. I implemented a very simple
> > >> fix in ignite-1171, however I need you or someone else with good expertise in
> > >> the discovery protocol to take a look at my changes because I am sure I missed
> > >> something - e.g. I am not sure how delayed messages should be handled in
> > >> case the coordinator node crashes.
> > >>
> > >> 2015-09-17 8:52 GMT-07:00 Yakov Zhdanov <[email protected]>:
> > >>
> > >> > Alex, I think it makes sense to continue investigating this. We can discuss
> > >> > whether we include or skip the fix once the fix is ready.
> > >> >
> > >> > As far as other tickets:
> > >> >
> > >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > >> >
> > >> > IGNITE-1171 Getting affinity for topology version earlier than affinity is
> > >> > calculated - is on Alex Goncharuk.
> > >> > IGNITE-973 Failed to get value for key: 13791. at
> > >> > o.a.i.i.processors.query.h2.opt.GridH2AbstractKeyValueRow.getValue(GridH2AbstractKeyValueRow.java:223)
> > >> > - assigned to Sergi. There seems to be a problem in offheap indexing which
> > >> > can be reproduced from time to time. This is an old issue and I think it can
> > >> > be postponed if it does not fit.
> > >> >
> > >> > +1 IGFS issue
> > >> > and the rest of the vert.x issues
> > >> >
> > >> > I hope IGNITE-1171 will be fixed today so the picture becomes much cleaner.
> > >> >
> > >> > --
> > >> > Yakov Zhdanov, Director R&D
> > >> > *GridGain Systems*
> > >> > www.gridgain.com
> > >> >
> > >> > 2015-09-17 0:59 GMT+03:00 Alexey Goncharuk <[email protected]>:
> > >> >
> > >> > > Yakov, Igniters,
> > >> > >
> > >> > > I have found at least one issue related to the ignite-1171 hang: it is caused
> > >> > > by a race between a discovery custom message and the collectDiscoveryData() call
> > >> > > (updated the ticket). I remember we wanted to call collectDiscoveryData()
> > >> > > during the NodeAddFinishedMessage processing, however it was not
> > >> > > implemented - do we think that this is a correct change and do we want it
> > >> > > to be fixed in 1.4? Discovery changes are quite sensitive and I would
> > >> > > prefer them to be tested thoroughly.
> > >> > >
> > >> > > 2015-09-16 9:09 GMT-07:00 Yakov Zhdanov <[email protected]>:
> > >> > >
> > >> > > > Guys,
> > >> > > >
> > >> > > > I want to update release status.
> > >> > > >
> > >> > > > Testing has revealed some cache issues which should be fixed with the
> > >> > > > release. Moreover, it turned out that these issues block vert.x release.
> > >> > > > So, if we fix them we can consider including vert.x into 1.4 release. Which
> > >> > > > is good I think.
> > >> > > >
> > >> > > > I think that Alex Goncharuk is the best person who can look into vert.x
> > >> > > > issues. Alex, please first of all pay attention to IGNITE-1171 - Getting
> > >> > > > affinity for topology version earlier than affinity is calculated - Test
> > >> > > > reproducing the issue has been added to ignite1.4. Alex please let us know
> > >> > > > if this can be fixed.
> > >> > > >
> > >> > > > These issues are on Semyon Boikov:
> > >> > > >
> > >> > > > IGNITE-973 Failed to get value for key: 13791. at
> > >> > > > o.a.i.i.processors.query.h2.opt.GridH2AbstractKeyValueRow.getValue(GridH2AbstractKeyValueRow.java:223)
> > >> > > > - We need more time to finish with this. Some race in swap is still there.
> > >> > > > IGNITE-1452 OptimizedMarshaller.unmarshal hangs in
> > >> > > > IgniteCacheQueryNodeRestartSelfTest2 - Need to check TC and merge.
> > >> > > >
> > >> > > > Rest of tickets are vert.x related.
> > >> > > > Here is the link -
> > >> > > >
> > >> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > >> > > >
> > >> > > > Andrey Gura, please provide as much information as you can for the rest of
> > >> > > > vert.x tickets.
> > >> > > >
> > >> > > > Thanks!
> > >> > > >
> > >> > > > --Yakov
> > >> > > >
> > >> > > > 2015-09-15 19:12 GMT+03:00 Yakov Zhdanov <[email protected]>:
> > >> > > >
> > >> > > > > Raul, how is your status with the streamer? I think there is no reason for
> > >> > > > > rush. We can put it to 1.5. Please let me know what you think.
> > >> > > > >
> > >> > > > > As far as release status here are the open tickets -
> > >> > > > >
> > >> > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > >> > > > >
> > >> > > > > https://issues.apache.org/jira/browse/IGNITE-1239 - Alex Goncharuk, can
> > >> > > > > you please let us know if this will be finished today?
> > >> > > > > https://issues.apache.org/jira/browse/IGNITE-1490 - Ilya Suntsov works on
> > >> > > > > reproducing this. I suspect we may have problems with near cache evictions.
> > >> > > > > Can Val or Alex proceed with this after Ilya finishes test run? Ilya,
> > >> > > > > please respond in ticket upon your results.
> > >> > > > >
> > >> > > > > Thanks!
> > >> > > > >
> > >> > > > > --Yakov
> > >> > > > >
> > >> > > > > 2015-09-15 11:15 GMT+03:00 Raul Kripalani <[email protected]>:
> > >> > > > >
> > >> > > > >> Hi guys,
> > >> > > > >>
> > >> > > > >> The MQTT streamer I'm working on will be ready this week. Hopefully as soon
> > >> > > > >> as today or tomorrow.
> > >> > > > >>
> > >> > > > >> It's not important for the 1.4 release, but it seems like it'll make the
> > >> > > > >> timeline to get potentially merged.
> > >> > > > >>
> > >> > > > >> Regards,
> > >> > > > >> Raúl.
> > >> > > > >> On 15 Sep 2015 00:05, "Yakov Zhdanov" <[email protected]> wrote:
> > >> > > > >>
> > >> > > > >> > Guys,
> > >> > > > >> >
> > >> > > > >> > Current status is the following:
> > >> > > > >> >
> > >> > > > >> > 1. Sam needs to merge his fixes after TC is finished.
> > >> > > > >> > 2. Some minor changes pending from Denis + release notes fix pointed by
> > >> > > > >> > Dmitry.
> > >> > > > >> > 3. Several suites are still red on TC
> > >> > > > >> >
> > >> > > > >> > I have moved plenty of tickets to ignite-1.5. Here is the link to currently
> > >> > > > >> > open tickets that I want everyone (esp. assignees) to look through and tell
> > >> > > > >> > me whether ticket can be moved or should be fixed -
> > >> > > > >> >
> > >> > > > >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC
> > >> > > > >> >
> > >> > > > >> > Alex Goncharuk has 5 tickets.
> > >> > > > >> > Semyon Boikov has 5 tickets.
> > >> > > > >> > Valentin has 4
> > >> > > > >> > Sergi has 4
> > >> > > > >> > Vladimir has 3
> > >> > > > >> > Ivan V. has 3
> > >> > > > >> >
> > >> > > > >> > Guys, please look your tickets through and let us know your decision.
> > >> > > > >> >
> > >> > > > >> > --Yakov
> > >> > > > >> >
> > >> > > > >> > 2015-09-14 21:04 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
> > >> > > > >> >
> > >> > > > >> > > Yakov,
> > >> > > > >> > >
> > >> > > > >> > > I know you were managing the 1.4 release. Can you please provide an update
> > >> > > > >> > > of what goes into the release at this point and what is the overall plan?
> > >> > > > >> > >
> > >> > > > >> > > Thanks,
> > >> > > > >> > > D.
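
For anyone reading the thread later: the approach Alexey proposed (hold custom event messages back while a node join is in progress, i.e. between NodeAddedMessage and NodeAddFinished) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual ServerImpl code or the change made in ignite-1171: the class JoinAwareCustomMessageGate and all of its methods are hypothetical names, messages are modelled as plain strings, and the open question about coordinator crashes is not addressed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical sketch: delays custom messages while at least one node join
 * is in progress and releases them, in arrival order, once the join finishes.
 * Not thread-safe; assumes a single message-worker thread calls it.
 */
final class JoinAwareCustomMessageGate {
    /** Custom messages held back during a join (assumption: plain strings). */
    private final Queue<String> delayed = new ArrayDeque<>();

    /** Messages that were actually processed, in processing order. */
    private final List<String> delivered = new ArrayList<>();

    /** Number of joins that have started but not yet finished. */
    private int joinsInProgress;

    /** Called when a node-added message is seen: a join has started. */
    void onNodeAdded() {
        joinsInProgress++;
    }

    /** Processes a custom message immediately, or delays it during a join. */
    void onCustomMessage(String msg) {
        if (joinsInProgress > 0)
            delayed.add(msg);
        else
            delivered.add(msg);
    }

    /** Called when the join finishes: flush delayed messages if no join remains. */
    void onNodeAddFinished() {
        if (joinsInProgress > 0)
            joinsInProgress--;

        if (joinsInProgress == 0) {
            while (!delayed.isEmpty())
                delivered.add(delayed.poll());
        }
    }

    /** Returns a copy of the processed messages. */
    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    /** Returns how many messages are currently held back. */
    int delayedCount() {
        return delayed.size();
    }
}
```

A message sent before the join is processed right away, one sent during the join stays queued until onNodeAddFinished() is called, and ordering among delayed messages is preserved. A counter is used instead of a boolean so that overlapping joins do not release messages early; whether overlapping joins can occur in the real protocol is not stated in the thread.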
