Re: Communication exception handling

Semyon Boikov Sat, 28 Nov 2015 05:56:38 -0800

The fix looks good, but it could still be dangerous to merge at the last
minute before the release.


On Sat, Nov 28, 2015 at 4:44 PM, Yakov Zhdanov <[email protected]> wrote:

> The cache processor has not received the stop signal because the stopping
> thread is trapped in the job processor, waiting for all jobs to finish.
>
> --Yakov
>
> 2015-11-28 15:57 GMT+03:00 Semyon Boikov <[email protected]>:
>
> > Yakov,
> >
> > When a node is stopped, all cache futures are completed with an error.
> > Where did you see the hang?
> >
> >
> > On Sat, Nov 28, 2015 at 3:37 PM, Yakov Zhdanov <[email protected]>
> > wrote:
> >
> > > Guys,
> > >
> > > I see the following code
> > >
> > > (org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxPrepareFuture.java:1129):
> > >
> > >                     try {
> > >                         cctx.io().send(n, req, tx.ioPolicy());
> > >                     }
> > >                     catch (ClusterTopologyCheckedException e) {
> > >                         fut.onNodeLeft(e);
> > >                     }
> > >                     catch (IgniteCheckedException e) {
> > >                         if (!cctx.kernalContext().isStopping())
> > >                             fut.onResult(e);
> > >                     }
> > >
> > >
> > > This means that if a node has just started its stop procedure, all cache
> > > operations may potentially hang. If cache.put() is called from a job and
> > > the node is stopping gracefully, the stop process hangs with 100%
> > > probability.
> > >
> > > This issue does not affect failure detection or node crash cases, since
> > > those are handled by separate logic.
> > >
> > > I fixed the Communication SPI to use its internal stopping flag instead of
> > > the system-wide one, and this seems to fix the issue with graceful stop.
> > >
> > > Semyon, can you please check whether this may cause any other issue of
> > > this kind?
> > >
> > > My changes are here - https://github.com/apache/ignite/pull/278
> > >
> > > --Yakov
> > >
> >
>

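The hang described in the thread can be illustrated outside Ignite. The following is a minimal sketch, not the change in the pull request and not Ignite code: it uses a plain CompletableFuture, and every name in it (StopHangSketch, SendFailedException, send, sendDroppingErrorWhenStopping, sendAlwaysReporting, completesWithin) is hypothetical. It only shows why a catch block that drops the error while a stopping flag is set leaves the waiting caller with a future that never completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StopHangSketch {
    // Hypothetical stand-in for a failed send; not an Ignite class.
    static final class SendFailedException extends Exception {
        SendFailedException(String msg) { super(msg); }
    }

    // Hypothetical send that always fails, as it might while a node shuts down.
    static void send() throws SendFailedException {
        throw new SendFailedException("connection closed");
    }

    // Same shape as the quoted catch block: the error is dropped when stopping.
    // On success the future would be completed later by a response (not modeled).
    static CompletableFuture<Void> sendDroppingErrorWhenStopping(boolean stopping) {
        CompletableFuture<Void> fut = new CompletableFuture<>();
        try {
            send();
        } catch (SendFailedException e) {
            if (!stopping)
                fut.completeExceptionally(e);
        }
        return fut;
    }

    // Variant that always reports the failure, so a waiter is always released.
    static CompletableFuture<Void> sendAlwaysReporting(boolean stopping) {
        CompletableFuture<Void> fut = new CompletableFuture<>();
        try {
            send();
        } catch (SendFailedException e) {
            fut.completeExceptionally(e);
        }
        return fut;
    }

    // Returns true if the future finishes (normally or with an error) in time.
    static boolean completesWithin(CompletableFuture<?> fut, long millis) {
        try {
            fut.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            return true; // completed exceptionally, so the waiter is released
        } catch (TimeoutException e) {
            return false; // still pending: a real caller would keep waiting
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("dropped while stopping, completed: "
            + completesWithin(sendDroppingErrorWhenStopping(true), 100));
        System.out.println("always reported, completed: "
            + completesWithin(sendAlwaysReporting(true), 100));
    }
}
```

In this sketch a job blocked on the first future during a graceful stop would keep the stopping thread waiting for that job, which matches the circular wait described in the thread.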