Alexey,

I have been running the Vert.x cluster manager tests today. IGNITE-1171 no
longer reproduces.

But a new problem was found: https://issues.apache.org/jira/browse/IGNITE-1534

I'll try to create an Ignite test for this problem. I hope you have some ideas
about how to reproduce it reliably and fix it.

On Wed, Sep 23, 2015 at 8:37 AM, Alexey Goncharuk <
[email protected]> wrote:

> Yakov,
>
> I think I fixed the remaining issues in the branch. There was one issue
> with the pending queue - my original ordering for messages was not correct.
> The other thing was the NodeAddFinished message processing that I consulted
> you about over Skype. The TC looks green(ish). I cleaned up the code, merged
> it to the ignite-1171 (non-debug) branch, and triggered TC one more time.
>
> It would be great if you guys trigger TC a couple more times and monitor its
> state because we changed, I guess, the most sensitive part of Ignite, but it
> feels like we're pretty close to getting this issue fixed :)
>
> 2015-09-22 9:43 GMT-07:00 Yakov Zhdanov <[email protected]>:
>
> > Alex, I spent some time debugging this today.
> >
> > I noticed that we do not verify that the topology version of the custom
> > message is identical to the current ring version. After I added this
> > condition, the test started passing. However, it hangs from time to time
> > because the custom message gets discarded before it gets processed (the
> > new condition works here), which means that the topology version has
> > somehow changed while the custom message has not yet been processed.
> >
> > My changes are in ignite-1171-debug. Can you please take a further look?
> >
> > --Yakov
> >
> > 2015-09-22 5:50 GMT+03:00 Alexey Goncharuk <[email protected]>:
> >
> > > Folks,
> > >
> > > I was debugging issues with discovery today, my findings are below:
> > >
> > >    - The issue with the assertion "topology version has not been updated"
> > >    was caused by sending a discard message for custom messages. Since we
> > >    now re-arrange custom messages, discardId gets repositioned, and
> > >    messages that should have been discarded were not discarded.
> > >    - Fixed the issue above by introducing a separate pending queue for
> > >    custom messages, which gets discarded independently from other
> > >    discovery messages.
> > >    - Did not get to the bottom of the "joining nodes" assertion. From the
> > >    debug output I see that the coordinator always fires custom messages at
> > >    the right moment, when joiningNodes is empty. However, despite the fix
> > >    above for custom message discard, processed custom messages get
> > >    re-sent, which leads to this assertion.
> > >
> > > I committed my pending debug code to the ignite-1171-debug branch. If any
> > > of you guys is up to debugging this issue while I'm asleep, great; if
> > > not, I'll continue digging into it tomorrow.
> > >
> > > 2015-09-21 10:55 GMT-07:00 Yakov Zhdanov <[email protected]>:
> > >
> > > > Igniters,
> > > >
> > > > We are not ready to release today.
> > > >
> > > > Alexey Goncharuk is still working on ignite-1171. Alex, please provide
> > > > updates by the end of the day.
> > > >
> > > > https://issues.apache.org/jira/browse/IGNITE-1516 - performance of the
> > > > offheap query benchmark has not fully recovered. Semyon will be fixing
> > > > it. Sergi, can you please assist?
> > > >
> > > > https://issues.apache.org/jira/browse/IGNITE-973 - Semyon has fixed a
> > > > race in the cache logic, but the issue is still reproducible due to
> > > > possible issues in the indexing logic. Sergi, this is on you. Can you
> > > > please take a look?
> > > >
> > > > --Yakov
> > > >
> > >
> >
>



-- 
Andrey Gura
GridGain Systems, Inc.
www.gridgain.com
