Hi Igniters,

Currently Ignite treats the "not enough data region capacity" case as a
critical failure and does not allow configuring any of the default critical
failure handlers to ignore that error.

In our company, several teams use Apache Ignite, and none of them wants
the default "stop server" or "restart server" handler applied when this
problem occurs. We would rather report the problem to DevOps and the end
users.

We developed a custom failure handler to deal with the problem, but the
solution is really clumsy. More importantly, we think that treating this
problem as a critical failure is not what most users want.

What do you think about enhancing Ignite not to treat the "not enough data
region capacity" case as a critical failure?

We opened IGNITE-16272 <https://issues.apache.org/jira/browse/IGNITE-16272> for
this discussion with the description below:

The Problem
Ignite raises the IgniteOutOfMemoryException
<https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/internal/mem/IgniteOutOfMemoryException.java>
if a data region size is exceeded when trying to add more data to a cache.
Ignite considers the IgniteOutOfMemoryException a critical failure. With
the default failure handler, this shuts down the Ignite server.

However, reaching the data region capacity does not seem to be a problem
critical enough to require a server shutdown or restart. For example, in
our application we just want to report this problem back to the users and
notify DevOps without applying the critical failure handler. To achieve
that, we had to define a custom FailureHandler that detects and ignores the
IgniteOutOfMemoryException and all the exceptions caused by the
IgniteOutOfMemoryException, allowing the final exception to reach the
application. This solution is clumsy and unreliable: it depends on the
internal IgniteOutOfMemoryException definition, and it has to search a
complex secondary exception structure to find the
IgniteOutOfMemoryException among the suppressed exceptions and causes.
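
To illustrate the kind of traversal described above, here is a minimal
sketch, not the handler itself. The class name CauseChain and the method
contains are made up for this example. It matches by class name string so
that it does not have to import the internal exception class, and it only
matches the exact class, not subclasses.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper: searches a throwable graph for a given exception class name. */
public final class CauseChain {
    private CauseChain() {
    }

    /**
     * Returns true if root, or any throwable reachable from it through
     * getCause() or getSuppressed(), has exactly the given class name.
     */
    public static boolean contains(Throwable root, String className) {
        if (root == null || className == null)
            return false;

        // Identity-based set guards against cycles in the cause graph.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Throwable> pending = new ArrayDeque<>();
        pending.push(root);

        while (!pending.isEmpty()) {
            Throwable t = pending.pop();

            if (!seen.add(t))
                continue; // Already visited.

            if (t.getClass().getName().equals(className))
                return true;

            if (t.getCause() != null)
                pending.push(t.getCause());

            for (Throwable s : t.getSuppressed())
                pending.push(s);
        }

        return false;
    }
}
```

A custom FailureHandler could call something like
CauseChain.contains(failureCtx.error(),
"org.apache.ignite.internal.mem.IgniteOutOfMemoryException") in its
onFailure method and return false on a match, assuming that returning false
is enough to keep the node running. The hard-coded internal class name is
exactly the fragile part.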

Ignite's out-of-the-box failure handlers have the ignoredFailureTypes
property, which allows filtering out some kinds of failures. However, the
"data region capacity exceeded" condition is not among the FailureType
<https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/failure/FailureType.java>
values that can be ignored.

The Proposal

   1. Does anyone really want to treat the "data region capacity exceeded"
   problem as a critical failure and stop or restart the server?
      - Consider never treating this condition as a critical failure. This
      change is not backward compatible.
      - Or add another item to the FailureType enumeration to optionally
      allow the users not to have that treated as a critical failure. This is
      backward-compatible.
   2. Make the IgniteOutOfMemoryException a public API (it is currently in
   an internal package).
   3. Consider renaming IgniteOutOfMemoryException (for example, to
   something like NotEnoughStorageException). The current name resembles
   Java's OutOfMemoryError, which is really critical and usually
   unrecoverable, although the IgniteOutOfMemoryException is not that
   critical.
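
To make the backward-compatible option in item 1 more concrete, here is a
rough sketch. It uses a local stand-in enum rather than Ignite's real
FailureType. The DATA_REGION_CAPACITY_EXCEEDED constant and the isHandled
method are hypothetical names invented for this example. The sketch assumes
that a handler simply skips any failure type listed in its ignored set.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustration only: a local model of "ignore by failure type", not Ignite code. */
public final class FailureTypeSketch {
    private FailureTypeSketch() {
    }

    /** Stand-in for FailureType with one hypothetical extra constant. */
    public enum SketchFailureType {
        SEGMENTATION,
        SYSTEM_WORKER_TERMINATION,
        SYSTEM_WORKER_BLOCKED,
        CRITICAL_ERROR,
        SYSTEM_CRITICAL_OPERATION_TIMEOUT,
        /** Hypothetical new type proposed in item 1. */
        DATA_REGION_CAPACITY_EXCEEDED
    }

    /** Returns true if the handler should act on the failure, false if the type is ignored. */
    public static boolean isHandled(SketchFailureType type, Set<SketchFailureType> ignored) {
        return !ignored.contains(type);
    }

    public static void main(String[] args) {
        // An ignored set that does not mention the new type: behavior is unchanged.
        Set<SketchFailureType> dflt = EnumSet.of(
            SketchFailureType.SYSTEM_WORKER_BLOCKED,
            SketchFailureType.SYSTEM_CRITICAL_OPERATION_TIMEOUT);

        // A user opts in to ignoring the new type.
        Set<SketchFailureType> optIn = EnumSet.copyOf(dflt);
        optIn.add(SketchFailureType.DATA_REGION_CAPACITY_EXCEEDED);

        System.out.println(isHandled(SketchFailureType.DATA_REGION_CAPACITY_EXCEEDED, dflt));
        System.out.println(isHandled(SketchFailureType.DATA_REGION_CAPACITY_EXCEEDED, optIn));
    }
}
```

With an ignored set that does not include the new type, the condition is
still handled as a failure, which is why this variant would not change
behavior for existing users.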

--
Best regards,
Alexey
