Hello,
I have been working with Apache Ignite for the past couple of months, mainly
2.6, and just recently upgraded our framework to use 2.7. However, I have
encountered some concerning issues on grid node termination, and was wondering
if anyone else had similar experiences or could reference available solutions.
While using 2.6, our grid always terminated nodes successfully, printing out
'Ignite ver. 2.6.0#(...) stopped OK'.
However, since upgrading to 2.7.0 I have not had a single successful node
termination when the grid is under load (namely during a rolling restart).
At first I thought 2.7.0 might simply have improved error reporting, and that
these errors had been present before but were not being caught. After looking
into it more, it seems that each component is throwing its own variant of
'Ignite___Exception: Node is stopping'. My confidence dropped even further when
I noticed one of the 'errors':
"
[ERROR] [Thread-27] IgniteKernal - Failed to stop component (ignoring): GridProcessorAdapter []
java.lang.UnsupportedOperationException: null
    at org.jsr166.ConcurrentLinkedHashMap.clear(ConcurrentLinkedHashMap.java:1551) ~[bdp-ignite-core-2.7.0.jar:2.7.0]
    at org.apache.ignite.internal.processors.job.GridJobProcessor.stop(GridJobProcessor.java:264) ~[bdp-ignite-core-2.7.0.jar:2.7.0]
    at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2356) [bdp-ignite-core-2.7.0.jar:2.7.0]
    at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2228) [bdp-ignite-core-2.7.0.jar:2.7.0]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2612) [bdp-ignite-core-2.7.0.jar:2.7.0]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2575) [bdp-ignite-core-2.7.0.jar:2.7.0]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$6.run(IgnitionEx.java:2100) [bdp-ignite-core-2.7.0.jar:2.7.0]
"
The relevant lines of code are:
GridJobProcessor.stop():
https://github.com/apache/ignite/blob/2.7.0/modules/core/src/main/java/org/apache/ignite/internal/processors/job/GridJobProcessor.java#L264
ConcurrentLinkedHashMap.clear():
https://github.com/apache/ignite/blob/2.7.0/modules/core/src/main/java/org/jsr166/ConcurrentLinkedHashMap.java#L1550
It seems ConcurrentLinkedHashMap was recently changed to make clear() an
unsupported operation, while GridJobProcessor.stop() still calls it. That
produces what are arguably false errors on node termination. Has anyone else
encountered issues on node termination in 2.7? Or is the mistake on my end,
and am I missing something critical?
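To make the pattern I think I am seeing concrete, here is a minimal standalone
sketch. It is not Ignite code: StopDemo, NoClearMap, JobComponent and
stopQuietly are names I made up for illustration, and the map is a plain
ConcurrentHashMap subclass standing in for a map whose clear() is unsupported.
It only shows how a stop routine that clears such a map ends up logging a
"failed to stop" error even though nothing else went wrong.

```java
import java.util.concurrent.ConcurrentHashMap;

public class StopDemo {
    /** Hypothetical stand-in for a map whose clear() is an unsupported operation. */
    static class NoClearMap<K, V> extends ConcurrentHashMap<K, V> {
        @Override
        public void clear() {
            throw new UnsupportedOperationException();
        }
    }

    /** Hypothetical component that clears its bookkeeping map when it is stopped. */
    static class JobComponent {
        private final ConcurrentHashMap<String, String> finishedJobs;

        JobComponent(ConcurrentHashMap<String, String> finishedJobs) {
            this.finishedJobs = finishedJobs;
        }

        void stop() {
            // Throws if the underlying map does not support clear().
            finishedJobs.clear();
        }
    }

    /** Stops the component, logging and ignoring any failure; returns true on a clean stop. */
    static boolean stopQuietly(JobComponent component) {
        try {
            component.stop();
            return true;
        } catch (RuntimeException e) {
            System.err.println("Failed to stop component (ignoring): " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // An ordinary map stops cleanly.
        System.out.println(stopQuietly(new JobComponent(new ConcurrentHashMap<>())));
        // A map with an unsupported clear() turns a normal shutdown into a logged error.
        System.out.println(stopQuietly(new JobComponent(new NoClearMap<>())));
    }
}
```

In this sketch the second call is reported as a failure purely because of the
clear() call, which is why I suspect the error in my log is noise rather than a
real shutdown problem.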
Any insight on this matter would be greatly appreciated!
Sincerely,
Gabriel