Maxim,

> Assertions must be disabled in production :-)

Since production deployments are out of our control we should think
about ones with enabled assertions as well (and they do occur in
practice).

> A contributor who makes a new patch must be able to see failed tests 
> immediately, he must not check all the logs with logged assertions. The 
> assertion is an error (not to be confused with an exception), all the errors 
> must fail the node.

I am not sure how practical it is today. I might have missed some
recent activities but I recall an effort to enable a node terminating
failure handler in tests. AFAIR, it was not successful and had lead to
troubles with TC build results interpretation.

2020-05-25 14:27 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>:
> Ivan,
>
>> 2. Assertions might be disabled in production.
> Assertions must be disabled in production :-)
>
> Here is another side of the same coin.
> A contributor who makes a new patch must be able to see failed tests
> immediately, he must not check all the logs with logged assertions.
> The assertion is an error (not to be confused with an exception), all
> the errors must fail the node.
>
> If we not enabling catching assertions for ExecutorServices this patch
> for all the cases, at least it must be done for all tests running the
> TeamCity environment. But I think we should enable fail node if an
> error occurs (assume assertions disabled in production).
>
> On Mon, 25 May 2020 at 12:36, Ivan Pavlukhin <vololo...@gmail.com> wrote:
>>
>> Maxim,
>>
>> I looked into the code and checked how do we setup uncaught exception
>> handlers for internal executors. Indeed it looks not good (OOME leads
>> to failure handling, other exceptions are silently ignored). Logging
>> sounds good.
>>
>> Have doubts about failure handling (e.g. terminating a node). I
>> remember at least one related discussion [1]. And here are some
>> statements which causes my doubts:
>> 1. Not only assertion errors are a programming errors evidence, e.g. a
>> simple NPE often signals the same.
>> 2. Assertions might be disabled in production.
>> 3. Sometimes it seems that assertions are often overused in the Ignite
>> code. Intermittent concurrency-related assertion errors are not so
>> uncommon. And it seems impossible to figure out how critical a
>> particular error is.
>>
>> [1]
>> http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html
>>
>> 2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>:
>> > Ivan,
>> >
>> > Not sure that I've got your point. Let me give some example:
>> >
>> > IgniteEx ignite = startGrid();
>> > ignite.context().getSystemExecutorService().execute(() -> { assert
>> > false;
>> > });
>> > ignite.context().getSystemExecutorService().submit(() -> { assert
>> > false;
>> > });
>> >
>> > In both of these cases there are no exceptions logged and the node not
>> > stopped (I assume that failing assertion means a code contract is
>> > violated). Of course, I'll get an assertion error if fut.get() will be
>> > called but there a lot of cases where it's not. So, I think we must
>> > log errors here or fail the node.
>> >
>> > On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <vololo...@gmail.com>
>> > wrote:
>> >>
>> >> Hi Maxim,
>> >>
>> >> I recently had similar troubles related to ExecutorService and
>> >> exception handling. It was an unexpected behavior for me that
>> >> exceptions of any kind* thrown from Runnable in a following code:
>> >> executor.submit(() -> throw new RuntimeException())
>> >> were silently ignored.
>> >>
>> >> I seemed to me really unexpected for me at first. But later on I
>> >> figured out the cause. There is a subtle detail which I forgot.
>> >> ExecutorService.submit returns a Future and leaves a responsibility to
>> >> handle exceptions of any kind to a caller of Future.get. And it was
>> >> really subtle for me, because I did not use a returned Future at all.
>> >> Actually I supposed ExecutorService.execute when used
>> >> ExecutorService.submit.
>> >>
>> >> Is it related to the subject? Can ExecutorService.execute help here?
>> >>
>> >> * In my case there were exceptions caused by missing classes on a
>> >> classpath.
>> >>
>> >> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>:
>> >> > Folks,
>> >> >
>> >> > I've created an issue.
>> >> > https://issues.apache.org/jira/browse/IGNITE-13055
>> >> >
>> >> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <nizhi...@apache.org>
>> >> > wrote:
>> >> >>
>> >> >> Hello, Maxim.
>> >> >>
>> >> >> I can confirm this itching issue.
>> >> >> It also happens when some custom Security plugin throws an
>> >> >> exception
>> >> >> while
>> >> >> processing a communication message.
>> >> >>
>> >> >> ```
>> >> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
>> >> >>
>> >> >> try (OperationSecurityContext s =
>> >> >> ctx.security().withContext(newSecSubjId)) {
>> >> >>     lsnr.onMessage(nodeId, msg, plc);
>> >> >> }
>> >> >> finally {
>> >> >>     if (change)
>> >> >>         CUR_PLC.set(oldPlc);
>> >> >> }
>> >> >> ```
>> >> >>
>> >> >> If an exception thrown from `withContext` it is hidden from the
>> >> >> user.
>> >> >>
>> >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <mmu...@apache.org>
>> >> >> > написал(а):
>> >> >> >
>> >> >> > Igniters,
>> >> >> >
>> >> >> >
>> >> >> > I've spent a couple of days in debug mode and found that some of
>> >> >> > the
>> >> >> > configured ExecutorServices of an Ignite instance completely hide
>> >> >> > assertion errors without any logging and node killing. This may
>> >> >> > violate some internal guarantees and hide serious errors.
>> >> >> >
>> >> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture
>> >> >> > [1].
>> >> >> > It
>> >> >> > has three places of submitting some Runnable on system executor
>> >> >> > service. If an assertion error (or even any uncached exception)
>> >> >> > occurs
>> >> >> > in the code block below it will be swallowed without logging,
>> >> >> > exchange
>> >> >> > future completion or node stoping.
>> >> >> >
>> >> >> > cctx.kernalContext().getSystemExecutorService().submit(new
>> >> >> > Runnable()
>> >> >> > {
>> >> >> >    @Override public void run() {
>> >> >> >        sendPartitions(newCrd);
>> >> >> >    }
>> >> >> > });
>> >> >> >
>> >> >> > I've checked that these executor services and most of them
>> >> >> > configured
>> >> >> > to catch only OutOfMemoryError.
>> >> >> >
>> >> >> > Should we consider catching AssertionErrors as well and treat
>> >> >> > them
>> >> >> > as
>> >> >> > CRITICAL_ERRORS for the Failure Handler?
>> >> >> > Should we log uncached errors on each of them?
>> >> >> >
>> >> >> >
>> >> >> > The most important list of executor services configured with OOM
>> >> >> > handler
>> >> >> > only:
>> >> >> > execSvc
>> >> >> > svcExecSvc
>> >> >> > sysExecSvc
>> >> >> > p2pExecSvc
>> >> >> > restExecSvc
>> >> >> > utilityCacheExecSvc
>> >> >> > affExecSvc
>> >> >> > qryExecSvc
>> >> >> >
>> >> >> > [1]
>> >> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
>> >> >>
>> >> >
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Ivan Pavlukhin
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>


-- 

Best regards,
Ivan Pavlukhin

Reply via email to