Benjamin BONNET created CAMEL-17157:
---------------------------------------
Summary: AggregateProcessor, TimeoutMap Restoration and Cluster
Key: CAMEL-17157
URL: https://issues.apache.org/jira/browse/CAMEL-17157
Project: Camel
Issue Type: Bug
Components: camel-core
Affects Versions: 3.12.0, 3.11.3
Reporter: Benjamin BONNET
Hi,
Consider an aggregator that has a completion timeout and is backed by a
persistent repository (e.g. JdbcAggregationRepository). When the route starts,
restoreTimeoutMapFromAggregationRepository() is invoked
(AggregateProcessor, line 877). That method works in two steps:
1. get the keys of all pending aggregations (i.e. aggregations that were not
yet completed when the route stopped);
2. iterate over the keys, load each row, and put its timeout into the timeout map.
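
The two steps above can be sketched roughly as follows. This is only an illustration, not the actual AggregateProcessor code: SketchRepository and RestoreSketch are hypothetical names, the repository is a plain in-memory map, and each row is reduced to its timeout in milliseconds.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a persistent aggregation repository: key -> timeout (ms). */
class SketchRepository {
    private final Map<String, Long> rows = new ConcurrentHashMap<>();

    void put(String key, long timeoutMillis) { rows.put(key, timeoutMillis); }

    /** Step 1: a snapshot of the keys of all pending aggregations. */
    Set<String> getKeys() { return new LinkedHashSet<>(rows.keySet()); }

    /** Step 2 helper: loads one row, or returns null if the row does not exist. */
    Long get(String key) { return rows.get(key); }
}

class RestoreSketch {
    /** Rebuilds the timeout map; assumes the repository does not change while it runs. */
    static Map<String, Long> restore(SketchRepository repo) {
        Map<String, Long> timeoutMap = new ConcurrentHashMap<>();
        Set<String> keys = repo.getKeys();      // step 1: read all pending keys
        for (String key : keys) {               // step 2: load each row by key
            long timeout = repo.get(key);       // relies on the row still being there
            timeoutMap.put(key, timeout);
        }
        return timeoutMap;
    }
}
```

With a single instance, nothing touches the repository between getKeys() and the later get(key) calls, so the assumption in restore() holds.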
That works fine when there is only one instance, but in a clustered deployment
things may go wrong.
If one instance is warming up while another is modifying the repository, the
warm-up may fail with a NullPointerException. That happens when a row is
deleted (because a running instance completed the aggregation) between
steps 1 and 2.
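
To make the interleaving concrete, here is a small deterministic replay of it. It is a sketch under assumptions: DeletionRaceSketch is a hypothetical name, the repository is a plain map of key to timeout, and the skipMissing flag is only one possible mitigation, not necessarily what Camel does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DeletionRaceSketch {
    /**
     * Replays the race in a fixed order: the keys are read, "another instance"
     * deletes one row, and only then are the rows loaded.
     */
    static Map<String, Long> restoreWithInterleavedDelete(
            Map<String, Long> repo, String deletedKey, boolean skipMissing) {
        Map<String, Long> timeoutMap = new ConcurrentHashMap<>();
        List<String> keys = new ArrayList<>(repo.keySet()); // step 1: key snapshot
        repo.remove(deletedKey);                             // completion on another instance
        for (String key : keys) {                            // step 2: load each row
            Long timeout = repo.get(key);                    // null for the deleted row
            if (timeout == null && skipMissing) {
                continue;                                    // tolerate the vanished row
            }
            timeoutMap.put(key, timeout.longValue());        // NullPointerException if null
        }
        return timeoutMap;
    }
}
```

Calling the method with skipMissing set to false reproduces the failure; with true, the deleted row is simply skipped, which is harmless because that aggregation has already completed.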
A quieter failure is also possible: a running instance creates a row between
steps 1 and 2. The warm-up then does not complain, but the new row is not
added to the timeout map. That becomes a problem if the instance that inserted
the row into the repository is stopped before the aggregation completes,
because the timeout will then go undetected.
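
The silent variant can be replayed the same way. Again this is only a sketch: MissedRowSketch and its methods are hypothetical, and the repository is modelled as a plain map of key to timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MissedRowSketch {
    /** Replays: the keys are read, "another instance" inserts a row, then rows are loaded. */
    static Map<String, Long> restoreWithInterleavedInsert(
            Map<String, Long> repo, String newKey, long newTimeout) {
        List<String> keys = new ArrayList<>(repo.keySet()); // step 1: key snapshot
        repo.put(newKey, newTimeout);                        // insert on another instance
        Map<String, Long> timeoutMap = new HashMap<>();
        for (String key : keys) {                            // step 2: only snapshot keys
            timeoutMap.put(key, repo.get(key));
        }
        return timeoutMap;
    }

    /** Keys that exist in the repository but are not tracked by the timeout map. */
    static Set<String> untracked(Map<String, Long> repo, Map<String, Long> timeoutMap) {
        Set<String> missing = new TreeSet<>(repo.keySet());
        missing.removeAll(timeoutMap.keySet());
        return missing;
    }
}
```

No exception is thrown here; the late row only shows up when the repository keys are compared with the timeout map afterwards, as untracked() does.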