Denis Chudov created IGNITE-14474:
-------------------------------------
Summary: Improve error message in case rebalance fails
Key: IGNITE-14474
URL: https://issues.apache.org/jira/browse/IGNITE-14474
Project: Ignite
Issue Type: Improvement
Reporter: Denis Chudov
Currently, when rebalance fails with an exception, we can get messages like the
following (examples are from Ignite 2.5; the log messages have changed in newer
versions, but the problem still exists):
{code:java}
2019-11-27 13:41:14,504[WARN ][utility-#79%xxx%][GridDhtPartitionDemander]
Rebalancing from node cancelled [grp=ignite-sys-cache,
topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1],
supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topic=0]. Supply message
couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to
unmarshal object with optimized marshaller
2019-11-27 13:41:14,504[INFO ][utility-#79%xxx%][GridDhtPartitionDemander]
Cancelled rebalancing [grp=ignite-sys-cache,
supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topVer=AffinityTopologyVersion
[topVer=1932, minorTopVer=1], time=88 ms]
2019-11-27 13:41:14,508[WARN ][utility-#76%xxx%][GridDhtPartitionDemander]
Rebalancing from node cancelled [grp=ignite-sys-cache,
topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1],
supplier=dfa5ee06-48c9-4458-ae55-48cc6ceda998, topic=0]. Supply message
couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to
unmarshal object with optimized marshaller
{code}
In the case above, a marshalling exception leads to a rebalance failure that
will never be resolved, i.e. the cluster enters an erroneous state.
We should report issues like this at ERROR level. The message should explain
that the rebalance has failed, that data for the cache was not fully copied to
the node, that the backup factor has not been restored, and that the cluster
may not work correctly.
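
As a rough illustration of the kind of message proposed above, here is a minimal
sketch. It is not Ignite code: the class RebalanceFailureMessage and its methods
build and report are hypothetical names, the wording is only a suggestion, and
java.util.logging is used purely to keep the example self-contained (its SEVERE
level stands in for ERROR).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper for illustration only; not part of the Ignite code base. */
final class RebalanceFailureMessage {
    private RebalanceFailureMessage() {
        // Static helper, no instances.
    }

    /** Builds a message that states the consequences of the failure, not only its cause. */
    public static String build(String grp, String topVer, String supplier, Throwable cause) {
        return "Rebalancing failed [grp=" + grp
            + ", topVer=" + topVer
            + ", supplier=" + supplier + "]. "
            + "Data for the cache group was not fully copied to this node, "
            + "the backup factor has not been restored and the cluster may not work correctly. "
            + "Cause: " + cause;
    }

    /** Logs the failure at the highest level (SEVERE here, standing in for ERROR). */
    public static void report(Logger log, String grp, String topVer, String supplier, Throwable cause) {
        log.log(Level.SEVERE, build(grp, topVer, supplier, cause), cause);
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("GridDhtPartitionDemander");

        // Values taken from the log excerpt in this issue; the exception type is an assumption.
        report(log,
            "ignite-sys-cache",
            "AffinityTopologyVersion [topVer=1932, minorTopVer=1]",
            "f014f30a-77f2-4459-aa5b-6c12907a7449",
            new IllegalStateException("Failed to unmarshal object with optimized marshaller"));
    }
}
```

Passing the exception to the logger as well as embedding it in the text keeps the
stack trace available, which the current WARN message omits.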