Maxim,

I looked into the code and checked how do we setup uncaught exception
handlers for internal executors. Indeed it looks not good (OOME leads
to failure handling, other exceptions are silently ignored). Logging
sounds good.

Have doubts about failure handling (e.g. terminating a node). I
remember at least one related discussion [1]. And here are some
statements which causes my doubts:
1. Not only assertion errors are a programming errors evidence, e.g. a
simple NPE often signals the same.
2. Assertions might be disabled in production.
3. Sometimes it seems that assertions are often overused in the Ignite
code. Intermittent concurrency-related assertion errors are not so
uncommon. And it seems impossible to figure out how critical a
particular error is.

[1] 
http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html

2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>:
> Ivan,
>
> Not sure that I've got your point. Let me give some example:
>
> IgniteEx ignite = startGrid();
> ignite.context().getSystemExecutorService().execute(() -> { assert false;
> });
> ignite.context().getSystemExecutorService().submit(() -> { assert false;
> });
>
> In both of these cases there are no exceptions logged and the node not
> stopped (I assume that failing assertion means a code contract is
> violated). Of course, I'll get an assertion error if fut.get() will be
> called but there a lot of cases where it's not. So, I think we must
> log errors here or fail the node.
>
> On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <vololo...@gmail.com> wrote:
>>
>> Hi Maxim,
>>
>> I recently had similar troubles related to ExecutorService and
>> exception handling. It was an unexpected behavior for me that
>> exceptions of any kind* thrown from Runnable in a following code:
>> executor.submit(() -> throw new RuntimeException())
>> were silently ignored.
>>
>> I seemed to me really unexpected for me at first. But later on I
>> figured out the cause. There is a subtle detail which I forgot.
>> ExecutorService.submit returns a Future and leaves a responsibility to
>> handle exceptions of any kind to a caller of Future.get. And it was
>> really subtle for me, because I did not use a returned Future at all.
>> Actually I supposed ExecutorService.execute when used
>> ExecutorService.submit.
>>
>> Is it related to the subject? Can ExecutorService.execute help here?
>>
>> * In my case there were exceptions caused by missing classes on a
>> classpath.
>>
>> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>:
>> > Folks,
>> >
>> > I've created an issue.
>> > https://issues.apache.org/jira/browse/IGNITE-13055
>> >
>> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <nizhi...@apache.org>
>> > wrote:
>> >>
>> >> Hello, Maxim.
>> >>
>> >> I can confirm this itching issue.
>> >> It also happens when some custom Security plugin throws an exception
>> >> while
>> >> processing a communication message.
>> >>
>> >> ```
>> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
>> >>
>> >> try (OperationSecurityContext s =
>> >> ctx.security().withContext(newSecSubjId)) {
>> >>     lsnr.onMessage(nodeId, msg, plc);
>> >> }
>> >> finally {
>> >>     if (change)
>> >>         CUR_PLC.set(oldPlc);
>> >> }
>> >> ```
>> >>
>> >> If an exception thrown from `withContext` it is hidden from the user.
>> >>
>> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <mmu...@apache.org>
>> >> > написал(а):
>> >> >
>> >> > Igniters,
>> >> >
>> >> >
>> >> > I've spent a couple of days in debug mode and found that some of the
>> >> > configured ExecutorServices of an Ignite instance completely hide
>> >> > assertion errors without any logging and node killing. This may
>> >> > violate some internal guarantees and hide serious errors.
>> >> >
>> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1].
>> >> > It
>> >> > has three places of submitting some Runnable on system executor
>> >> > service. If an assertion error (or even any uncached exception)
>> >> > occurs
>> >> > in the code block below it will be swallowed without logging,
>> >> > exchange
>> >> > future completion or node stoping.
>> >> >
>> >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable()
>> >> > {
>> >> >    @Override public void run() {
>> >> >        sendPartitions(newCrd);
>> >> >    }
>> >> > });
>> >> >
>> >> > I've checked that these executor services and most of them
>> >> > configured
>> >> > to catch only OutOfMemoryError.
>> >> >
>> >> > Should we consider catching AssertionErrors as well and treat them
>> >> > as
>> >> > CRITICAL_ERRORS for the Failure Handler?
>> >> > Should we log uncached errors on each of them?
>> >> >
>> >> >
>> >> > The most important list of executor services configured with OOM
>> >> > handler
>> >> > only:
>> >> > execSvc
>> >> > svcExecSvc
>> >> > sysExecSvc
>> >> > p2pExecSvc
>> >> > restExecSvc
>> >> > utilityCacheExecSvc
>> >> > affExecSvc
>> >> > qryExecSvc
>> >> >
>> >> > [1]
>> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
>> >>
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>


-- 

Best regards,
Ivan Pavlukhin

Reply via email to