Yakov,

The approach of collecting discovery data on the NodeAddFinished message does not work because this message gets relayed to clients before it has passed the whole ring. If we make it pass the ring first and relay it to clients on the second round, we get the same race I was fixing.
I think the correct approach here is to delay custom event messages while a node join is in progress: basically, do not allow custom messages between the NodeAddedMessage and the NodeAddFinished message. I implemented a very simple fix in ignite-1171; however, I need you or someone else with good expertise in the discovery protocol to take a look at my changes, because I am sure I missed something. For example, I am not sure how delayed messages should be handled when the coordinator node crashes.

2015-09-17 8:52 GMT-07:00 Yakov Zhdanov <[email protected]>:

> Alex, I think it makes sense to continue investigating this. We can discuss
> whether we include or skip the fix once fix is ready.
>
> As far as other tickets:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
>
> IGNITE-1171 Getting affinity for topology version earlier than affinity is
> calculated - is on Alex Goncharuk.
> IGNITE-973 Failed to get value for key: 13791. at
> o.a.i.i.processors.query.h2.opt.GridH2AbstractKeyValueRow.getValue(GridH2AbstractKeyValueRow.java:223)
> - assigned to Sergi. There seems to be a problem in offheap indexing which
> can be reproduced from time to time. This is an old issue and I think can
> be postponed if does not fit.
>
> +1 IGFS issue
> and rest ver.x issues
>
> I hope IGNITE-1171 will be fixed today so picture become much cleaner.
>
> --
> Yakov Zhdanov, Director R&D
> *GridGain Systems*
> www.gridgain.com
>
> 2015-09-17 0:59 GMT+03:00 Alexey Goncharuk <[email protected]>:
>
> > Yakov, Igniters,
> >
> > I have found at least one issue related to ignite-1171 hang, it is caused
> > by a race between discovery custom message and collectDiscoveryData() call
> > (updated the ticket).
> > I remember we wanted to call collectDiscoveryData()
> > during the NodeAddFinishedMessage processing, however it was not
> > implemented - do we think that this is a correct change and do we want it
> > to be fixed in 1.4? Discovery changes are quite sensitive and I would
> > prefer them to be tested thoroughly.
> >
> > 2015-09-16 9:09 GMT-07:00 Yakov Zhdanov <[email protected]>:
> >
> > > Guys,
> > >
> > > I want to update release status.
> > >
> > > Testing has revealed some cache issues which should be fixed with the
> > > release. Moreover, it turned out that these issues block vert.x release.
> > > So, if we fix them we can consider including vert.x into 1.4 release. Which
> > > is good I think.
> > >
> > > I think that Alex Goncharuk is the best person who can look into vert.x
> > > issues. Alex, please first of all pay attention to IGNITE-1171 - Getting
> > > affinity for topology version earlier than affinity is calculated - Test
> > > reproducing the issue has been added to ignite1.4. Alex please let us know
> > > if this can be fixed.
> > >
> > > These issues are on Semyon Boikov:
> > >
> > > IGNITE-973 Failed to get value for key: 13791. at
> > > o.a.i.i.processors.query.h2.opt.GridH2AbstractKeyValueRow.getValue(GridH2AbstractKeyValueRow.java:223)
> > > - We need more time to finish with this. Some race in swap is still there.
> > > IGNITE-1452 OptimizedMarshaller.unmarshal hangs in
> > > IgniteCacheQueryNodeRestartSelfTest2 - Need to check TC and merge.
> > >
> > > Rest of tickets are vert.x related. Here is the link -
> > >
> > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > >
> > > Andrey Gura, please provide as much information as you can for the rest of
> > > vert.x tickets.
> > >
> > > Thanks!
> > >
> > > --Yakov
> > >
> > > 2015-09-15 19:12 GMT+03:00 Yakov Zhdanov <[email protected]>:
> > >
> > > > Raul, how is your status with the streamer? I think there is no reason for
> > > > rush. We can put it to 1.5. Please let me know what you think.
> > > >
> > > > As far as release status here are the open tickets -
> > > >
> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > > >
> > > > https://issues.apache.org/jira/browse/IGNITE-1239 - Alex Goncharuk, can
> > > > you please let us know if this will be finished today?
> > > > https://issues.apache.org/jira/browse/IGNITE-1490 - Ilya Suntsov works on
> > > > reproducing this. I suspect we may have problems with near cache evictions.
> > > > Can Val or Alex proceed with this after Ilya finishes test run? Ilya,
> > > > please respond in ticket upon your results.
> > > >
> > > > Thanks!
> > > >
> > > > --Yakov
> > > >
> > > > 2015-09-15 11:15 GMT+03:00 Raul Kripalani <[email protected]>:
> > > >
> > > >> Hi guys,
> > > >>
> > > >> The MQTT streamer I'm working on will be ready this week. Hopefully as soon
> > > >> as today or tomorrow.
> > > >>
> > > >> It's not important for the 1.4 release, but it seems like it'll make the
> > > >> timeline to get potentially merged.
> > > >>
> > > >> Regards,
> > > >> Raúl.
> > > >> On 15 Sep 2015 00:05, "Yakov Zhdanov" <[email protected]> wrote:
> > > >>
> > > >> > Guys,
> > > >> >
> > > >> > Current status is the following:
> > > >> >
> > > >> > 1. Sam needs to merge his fixes after TC is finished.
> > > >> > 2. Some minor changes pending from Denis + release notes fix pointed by
> > > >> > Dmitry.
> > > >> > 3. Several suites are still red on TC
> > > >> >
> > > >> > I have moved plenty of tickets to ignite-1.5.
> > > >> > Here is the link to currently
> > > >> > open tickets that I want everyone (esp. assignees) to look through and tell
> > > >> > me whether ticket can be moved or should be fixed -
> > > >> >
> > > >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC
> > > >> >
> > > >> > Alex Goncharuk has 5 tickets.
> > > >> > Semyon Boikov has 5 tickets.
> > > >> > Valentin has 4
> > > >> > Sergi has 4
> > > >> > Vladimir has 3
> > > >> > Ivan V. has 3
> > > >> >
> > > >> > Guys, please look your tickets through and let us know your decision.
> > > >> >
> > > >> > --Yakov
> > > >> >
> > > >> > 2015-09-14 21:04 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
> > > >> >
> > > >> > > Yakov,
> > > >> > >
> > > >> > > I know you were managing the 1.4 release. Can you please provide an update
> > > >> > > of what goes into the release at this point and what is the overall plan?
> > > >> > >
> > > >> > > Thanks,
> > > >> > > D.
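
For reference, the delay idea from the top of this message (hold custom messages back while a node join is in progress) could be pictured roughly as below. This is only a minimal sketch under stated assumptions, not the change made in ignite-1171 and not an Ignite API: the class CustomMessageGate and all of its methods are hypothetical names, messages are modeled as plain strings, and the open question of what to do with delayed messages when the coordinator crashes is not addressed at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; none of these names exist in Ignite.
class CustomMessageGate {
    // Joins that have seen "node added" but not yet "node add finished".
    private int joinsInProgress;

    // Custom messages held back while at least one join is in progress.
    private final Queue<String> delayed = new ArrayDeque<>();

    // Custom messages that were allowed through, in delivery order.
    private final List<String> delivered = new ArrayList<>();

    // Called when a node-added message is observed: a join starts.
    void onNodeAdded() {
        joinsInProgress++;
    }

    // Called when the matching node-add-finished message is observed.
    void onNodeAddFinished() {
        if (joinsInProgress == 0)
            throw new IllegalStateException("No join in progress");

        joinsInProgress--;

        // Once no join is pending, release delayed messages in arrival order.
        if (joinsInProgress == 0) {
            while (!delayed.isEmpty())
                delivered.add(delayed.poll());
        }
    }

    // Custom messages pass through immediately unless a join is in progress.
    void onCustomMessage(String msg) {
        if (joinsInProgress > 0)
            delayed.add(msg);
        else
            delivered.add(msg);
    }

    // Snapshot of what has been delivered so far.
    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    // Number of messages currently held back.
    int delayedCount() {
        return delayed.size();
    }
}
```

In this sketch a custom message that arrives between onNodeAdded() and onNodeAddFinished() is queued and only delivered after the join completes, so it cannot interleave with the join. A counter is used instead of a boolean flag on the assumption that several joins may overlap.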
