Alex, I spent some time debugging this today.

I noticed that we do not verify that the topology version of a custom message
is identical to the current ring version. After I added this condition, the test
started passing. However, it still hangs from time to time because the custom
message gets discarded before it is processed (this is where the new condition
kicks in), which means that the topology version has somehow changed while the
custom message had not yet been processed.
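A minimal sketch of the check described above, assuming a hypothetical
`CustomMessage` type stamped with the topology version it was created for and
a hypothetical holder of the current ring version. These names are
illustrative only and are not Ignite's actual discovery classes.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TopologyVersionCheck {
    /** Hypothetical custom discovery message stamped with its topology version. */
    static final class CustomMessage {
        final String id;
        final long topologyVersion;

        CustomMessage(String id, long topologyVersion) {
            this.id = id;
            this.topologyVersion = topologyVersion;
        }
    }

    /** Current ring topology version; AtomicLong gives cross-thread visibility. */
    private final AtomicLong ringTopVer = new AtomicLong();

    TopologyVersionCheck(long initialVersion) {
        ringTopVer.set(initialVersion);
    }

    /** Called when the ring topology changes (for example, a node joins or leaves). */
    void onTopologyChanged(long newVersion) {
        ringTopVer.set(newVersion);
    }

    /**
     * Returns true only if the message was issued for the current ring version.
     * A mismatch means the topology changed after the message was created
     * but before it was processed.
     */
    boolean isCurrent(CustomMessage msg) {
        return msg.topologyVersion == ringTopVer.get();
    }

    public static void main(String[] args) {
        TopologyVersionCheck check = new TopologyVersionCheck(5);
        CustomMessage msg = new CustomMessage("m1", 5);
        System.out.println(check.isCurrent(msg)); // true
        check.onTopologyChanged(6);
        System.out.println(check.isCurrent(msg)); // false
    }
}
```

Here a message stamped with version 5 stops passing the check as soon as the
ring moves to version 6, which is the situation observed in the hang.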

My changes are in ignite-1171-debug. Can you please take a further look?

--Yakov

2015-09-22 5:50 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> Folks,
>
> I was debugging issues with discovery today, my findings are below:
>
>    - The "topology version has not been updated" assertion was caused by
>    sending discard messages for custom messages. Since we now re-arrange
>    custom messages, discardId gets repositioned, and messages that should
>    have been discarded were not discarded.
>    - Fixed the issue above by introducing a separate pending queue for custom
>    messages, which is discarded independently from other discovery messages.
>    - Did not get to the bottom of the "joining nodes" assertion. From the
>    debug output I see that the coordinator always fires custom messages at
>    the right moment, when joiningNodes is empty. However, despite the fix
>    above for custom message discard, already processed custom messages get
>    re-sent, which leads to this assertion.
>
> I committed my pending debug code to the ignite-1171-debug branch. If any of
> you are up for debugging this issue while I'm asleep, great; if not,
> I'll continue digging into it tomorrow.
>
> 2015-09-21 10:55 GMT-07:00 Yakov Zhdanov <yzhda...@apache.org>:
>
> > Igniters,
> >
> > We are not ready to release today.
> >
> > Alexey Goncharuk is still working on ignite-1171. Alex, please provide
> > updates by the end of the day.
> >
> > https://issues.apache.org/jira/browse/IGNITE-1516 - performance of the
> > off-heap query benchmark has not fully recovered. Semyon will be fixing
> > it. Sergi, can you please assist?
> >
> > https://issues.apache.org/jira/browse/IGNITE-973 - Semyon has fixed a race
> > in the cache logic, but the issue is still reproducible, possibly due to
> > problems in the indexing logic. Sergi, this is on you. Can you please take
> > a look?
> >
> > --Yakov
> >
>
