Maxim, > Assertions must be disabled in production :-)
Since production deployments are out of our control we should think about ones with enabled assertions as well (and they do occur in practice). > A contributor who makes a new patch must be able to see failed tests > immediately, he must not check all the logs with logged assertions. The > assertion is an error (not to be confused with an exception), all the errors > must fail the node. I am not sure how practical it is today. I might have missed some recent activities but I recall an effort to enable a node terminating failure handler in tests. AFAIR, it was not successful and had lead to troubles with TC build results interpretation. 2020-05-25 14:27 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>: > Ivan, > >> 2. Assertions might be disabled in production. > Assertions must be disabled in production :-) > > Here is another side of the same coin. > A contributor who makes a new patch must be able to see failed tests > immediately, he must not check all the logs with logged assertions. > The assertion is an error (not to be confused with an exception), all > the errors must fail the node. > > If we not enabling catching assertions for ExecutorServices this patch > for all the cases, at least it must be done for all tests running the > TeamCity environment. But I think we should enable fail node if an > error occurs (assume assertions disabled in production). > > On Mon, 25 May 2020 at 12:36, Ivan Pavlukhin <vololo...@gmail.com> wrote: >> >> Maxim, >> >> I looked into the code and checked how do we setup uncaught exception >> handlers for internal executors. Indeed it looks not good (OOME leads >> to failure handling, other exceptions are silently ignored). Logging >> sounds good. >> >> Have doubts about failure handling (e.g. terminating a node). I >> remember at least one related discussion [1]. And here are some >> statements which causes my doubts: >> 1. Not only assertion errors are a programming errors evidence, e.g. a >> simple NPE often signals the same. >> 2. Assertions might be disabled in production. >> 3. Sometimes it seems that assertions are often overused in the Ignite >> code. Intermittent concurrency-related assertion errors are not so >> uncommon. And it seems impossible to figure out how critical a >> particular error is. >> >> [1] >> http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html >> >> 2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>: >> > Ivan, >> > >> > Not sure that I've got your point. Let me give some example: >> > >> > IgniteEx ignite = startGrid(); >> > ignite.context().getSystemExecutorService().execute(() -> { assert >> > false; >> > }); >> > ignite.context().getSystemExecutorService().submit(() -> { assert >> > false; >> > }); >> > >> > In both of these cases there are no exceptions logged and the node not >> > stopped (I assume that failing assertion means a code contract is >> > violated). Of course, I'll get an assertion error if fut.get() will be >> > called but there a lot of cases where it's not. So, I think we must >> > log errors here or fail the node. >> > >> > On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <vololo...@gmail.com> >> > wrote: >> >> >> >> Hi Maxim, >> >> >> >> I recently had similar troubles related to ExecutorService and >> >> exception handling. It was an unexpected behavior for me that >> >> exceptions of any kind* thrown from Runnable in a following code: >> >> executor.submit(() -> throw new RuntimeException()) >> >> were silently ignored. >> >> >> >> I seemed to me really unexpected for me at first. But later on I >> >> figured out the cause. There is a subtle detail which I forgot. >> >> ExecutorService.submit returns a Future and leaves a responsibility to >> >> handle exceptions of any kind to a caller of Future.get. And it was >> >> really subtle for me, because I did not use a returned Future at all. >> >> Actually I supposed ExecutorService.execute when used >> >> ExecutorService.submit. >> >> >> >> Is it related to the subject? Can ExecutorService.execute help here? >> >> >> >> * In my case there were exceptions caused by missing classes on a >> >> classpath. >> >> >> >> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <mmu...@apache.org>: >> >> > Folks, >> >> > >> >> > I've created an issue. >> >> > https://issues.apache.org/jira/browse/IGNITE-13055 >> >> > >> >> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <nizhi...@apache.org> >> >> > wrote: >> >> >> >> >> >> Hello, Maxim. >> >> >> >> >> >> I can confirm this itching issue. >> >> >> It also happens when some custom Security plugin throws an >> >> >> exception >> >> >> while >> >> >> processing a communication message. >> >> >> >> >> >> ``` >> >> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; >> >> >> >> >> >> try (OperationSecurityContext s = >> >> >> ctx.security().withContext(newSecSubjId)) { >> >> >> lsnr.onMessage(nodeId, msg, plc); >> >> >> } >> >> >> finally { >> >> >> if (change) >> >> >> CUR_PLC.set(oldPlc); >> >> >> } >> >> >> ``` >> >> >> >> >> >> If an exception thrown from `withContext` it is hidden from the >> >> >> user. >> >> >> >> >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <mmu...@apache.org> >> >> >> > написал(а): >> >> >> > >> >> >> > Igniters, >> >> >> > >> >> >> > >> >> >> > I've spent a couple of days in debug mode and found that some of >> >> >> > the >> >> >> > configured ExecutorServices of an Ignite instance completely hide >> >> >> > assertion errors without any logging and node killing. This may >> >> >> > violate some internal guarantees and hide serious errors. >> >> >> > >> >> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture >> >> >> > [1]. >> >> >> > It >> >> >> > has three places of submitting some Runnable on system executor >> >> >> > service. If an assertion error (or even any uncached exception) >> >> >> > occurs >> >> >> > in the code block below it will be swallowed without logging, >> >> >> > exchange >> >> >> > future completion or node stoping. >> >> >> > >> >> >> > cctx.kernalContext().getSystemExecutorService().submit(new >> >> >> > Runnable() >> >> >> > { >> >> >> > @Override public void run() { >> >> >> > sendPartitions(newCrd); >> >> >> > } >> >> >> > }); >> >> >> > >> >> >> > I've checked that these executor services and most of them >> >> >> > configured >> >> >> > to catch only OutOfMemoryError. >> >> >> > >> >> >> > Should we consider catching AssertionErrors as well and treat >> >> >> > them >> >> >> > as >> >> >> > CRITICAL_ERRORS for the Failure Handler? >> >> >> > Should we log uncached errors on each of them? >> >> >> > >> >> >> > >> >> >> > The most important list of executor services configured with OOM >> >> >> > handler >> >> >> > only: >> >> >> > execSvc >> >> >> > svcExecSvc >> >> >> > sysExecSvc >> >> >> > p2pExecSvc >> >> >> > restExecSvc >> >> >> > utilityCacheExecSvc >> >> >> > affExecSvc >> >> >> > qryExecSvc >> >> >> > >> >> >> > [1] >> >> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 >> >> >> >> >> > >> >> >> >> >> >> -- >> >> >> >> Best regards, >> >> Ivan Pavlukhin >> > >> >> >> -- >> >> Best regards, >> Ivan Pavlukhin > -- Best regards, Ivan Pavlukhin