Re: Communication exception handling

Yakov Zhdanov Sat, 28 Nov 2015 05:45:08 -0800

The cache processor has not received the stop signal because the stopping
thread is trapped in the job processor, waiting for all jobs to finish.
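
For illustration, here is a minimal, self-contained sketch of the hang discussed in this thread. It is not Ignite code: StopHangSketch, nodeStopping, onSendFailure and jobHangs are hypothetical names, and CompletableFuture only stands in for the cache future. It assumes a job blocks on a future whose only completion path is the send-failure handler, which drops the error once the stopping flag is set.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StopHangSketch {
    /** Hypothetical stand-in for the system-wide flag read via kernalContext().isStopping(). */
    static volatile boolean nodeStopping;

    /** Mimics the quoted catch block: the error reaches the future only if the node is not stopping. */
    static void onSendFailure(CompletableFuture<Void> fut, Exception e, boolean checkStoppingFlag) {
        if (!checkStoppingFlag || !nodeStopping)
            fut.completeExceptionally(e);
        // Otherwise the error is dropped and nothing ever completes the future.
    }

    /** Returns true if a job waiting on the future is still blocked after a short timeout. */
    static boolean jobHangs(boolean checkStoppingFlag) {
        nodeStopping = true; // Graceful stop has begun while the job is still running.

        CompletableFuture<Void> fut = new CompletableFuture<>();

        onSendFailure(fut, new Exception("send failed"), checkStoppingFlag);

        try {
            fut.get(200, TimeUnit.MILLISECONDS);

            return false; // Completed normally.
        }
        catch (TimeoutException e) {
            return true; // Nobody completed the future: the job would wait forever.
        }
        catch (ExecutionException e) {
            return false; // Completed with the error: the job can finish.
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stopping flag checked: hangs=" + jobHangs(true));
        System.out.println("stopping flag ignored: hangs=" + jobHangs(false));
    }
}
```

The second call only shows that delivering the error to the future unblocks the waiting job. It is not the change made in the pull request, which touches the Communication SPI's own stopping flag.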


--Yakov

2015-11-28 15:57 GMT+03:00 Semyon Boikov <[email protected]>:

> Yakov,
>
> When a node is stopped, all cache futures are completed with an error. Where
> did you see the hang?
>
>
> On Sat, Nov 28, 2015 at 3:37 PM, Yakov Zhdanov <[email protected]>
> wrote:
>
> > Guys,
> >
> > I see the following code
> >
> > (org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxPrepareFuture.java:1129):
> >
> >                     try {
> >                         cctx.io().send(n, req, tx.ioPolicy());
> >                     }
> >                     catch (ClusterTopologyCheckedException e) {
> >                         fut.onNodeLeft(e);
> >                     }
> >                     catch (IgniteCheckedException e) {
> >                         if (!cctx.kernalContext().isStopping())
> >                             fut.onResult(e);
> >                     }
> >
> >
> > This means that if a node has just started its stop procedure, all cache
> > operations may potentially hang. If cache.put() is called from a job and
> > the node is stopping gracefully, the stop process hangs with 100%
> > probability.
> >
> > This issue does not affect failure detection or node crash cases, since
> > those are handled by separate logic.
> >
> > I fixed the Communication SPI to use its internal stopping flag instead
> > of the system-wide one, and this seems to fix the issue with graceful stop.
> >
> > Semyon, can you please check whether this may cause any other issues of
> > this kind?
> >
> > My changes are here - https://github.com/apache/ignite/pull/278
> >
> > --Yakov
> >
>
