[
https://issues.apache.org/jira/browse/UNOMI-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jerome Blanchard updated UNOMI-908:
-----------------------------------
Description:
h3. Context
Currently, some Unomi entities (rules, segments, propertyTypes, ...) are
polled in Elasticsearch every second by a scheduled job, to ensure that any
modification made by another Unomi node is refreshed locally. It is easy to
see that this approach, while functional, is very inefficient when the entity
is updated infrequently.
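For illustration, a minimal sketch of the kind of per-second refresh job
described above (class and method names are hypothetical, not the actual
Unomi code):
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the polling pattern described above: every node
// re-reads the whole rule set from Elasticsearch every second, whether or
// not anything has changed.
public class RuleRefreshJob {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable refreshRulesFromElasticsearch) {
        // If this thread dies silently, the node keeps serving stale rules.
        scheduler.scheduleAtFixedRate(refreshRulesFromElasticsearch, 0, 1, TimeUnit.SECONDS);
    }
}
{code}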
Moreover, without a robust scheduler engine with watchdog capabilities, a
naive scheduler implementation can leave jobs (threads) dead. We have already
faced production issues because of this, ending up with nodes holding
different rule sets.
A second point: with the removal of Karaf Cellar (the Karaf cluster bundle
and config propagation feature) in Unomi 3, a simple cluster topology
monitoring mechanism has been introduced.
This implementation relies on a dedicated entity, ClusterNode, stored in a
dedicated Elasticsearch index.
Every 10 seconds (also via a scheduled job), the ClusterNode document is
updated by setting its heartbeat field to the current timestamp. At the same
time, the other ClusterNode documents are checked to decide whether their
latest heartbeat is fresh enough to keep them.
This topology management is very resource-intensive and does not follow the
architectural state of the art. Topology information does not need to be
persisted except for audit purposes (which is not the case here); it should
be managed in memory using proven algorithms.
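As a rough sketch of the heartbeat mechanism described above (field, store
and threshold names are assumptions; the real logic lives in
ClusterServiceImpl):
{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Sketch of the current ClusterNode heartbeat (hypothetical names): every
// 10 seconds each node persists its own heartbeat to a dedicated
// Elasticsearch index, then scans all nodes and evicts stale members.
public class ClusterHeartbeatSketch {

    static final Duration MAX_AGE = Duration.ofSeconds(30); // assumed threshold

    record ClusterNode(String id, Instant heartbeat) {}

    interface ClusterNodeStore {      // stands in for the Elasticsearch index
        void save(ClusterNode node);
        List<ClusterNode> findAll();
        void delete(String nodeId);
    }

    void tick(ClusterNodeStore store, String selfId) {
        store.save(new ClusterNode(selfId, Instant.now())); // one write per node per tick
        for (ClusterNode node : store.findAll()) {          // plus a full scan per node per tick
            if (!node.id().equals(selfId)
                    && node.heartbeat().isBefore(Instant.now().minus(MAX_AGE))) {
                store.delete(node.id());                    // evict stale member
            }
        }
    }
}
{code}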
h3. Proposal
The goal is to address both problems by relying on an external, proven and
widely used solution: distributed caching with Infinispan.
Because distributed caching libraries need an internal cluster topology
manager, the same tool can be used both to manage entity caches without
polling AND to discover and monitor the cluster topology.
We propose to use a generic caching feature already available for Karaf:
Infinispan. It will be packaged in a dedicated generic caching service based
on annotated methods, directly inspired by the current Unomi entities cache.
The underlying JGroups library used by Infinispan will also be exposed in
order to refactor the Unomi ClusterService instead of relying on a persisted
entity.
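The exact shape of the annotated caching service remains to be designed; as a
first sketch of the underlying mechanics, using the public embedded
Infinispan API (service, cache and method names are assumptions):
{code:java}
import java.util.function.Function;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// Sketch of a generic, cluster-aware entity cache built on embedded
// Infinispan (service, cache and method names are illustrative only).
public class UnomiEntityCache {

    private final DefaultCacheManager cacheManager;

    public UnomiEntityCache() {
        // Clustered cache manager: Infinispan uses JGroups underneath for
        // node discovery, so no Elasticsearch index is involved.
        cacheManager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());
        // Replicated synchronous cache: every put is propagated to all nodes.
        cacheManager.defineConfiguration("rules", new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.REPL_SYNC).build());
    }

    // Read-through access: hit storage only on a cache miss.
    public <T> T get(String cacheName, String key, Function<String, T> loader) {
        Cache<String, T> cache = cacheManager.getCache(cacheName);
        return cache.computeIfAbsent(key, loader);
    }

    // Called by update operations (e.g. updateRule): the replicated put
    // makes the change visible on every node without polling latency.
    public <T> void put(String cacheName, String key, T value) {
        cacheManager.<String, T>getCache(cacheName).put(key, value);
    }
}
{code}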
By externalizing caching into a dedicated, widely used and proven solution,
the Unomi code base will become lighter and more robust when managing
cluster-oriented operations on entities.
Using a distributed cache in front of persistent entities is a pattern that
has been in wide use for decades and has long been integrated into
enterprise-level frameworks (EJB, Spring, ...). It is proven technology with
very strong implementations and support, and Infinispan is one of the best
references in that domain (used in WildFly, Hibernate, Apache Camel, ...).
h3. Tasks
* Package a Unomi cache feature that relies on the existing Karaf Infinispan
feature.
* Refactor ClusterServiceImpl to take advantage of the Infinispan cluster
manager, or simply store ClusterNode in the distributed cache instead of
Elasticsearch (see the sketch after this list).
* Remove Elasticsearch-based persistence logic for ClusterNode.
* Ensure heartbeat updates are managed via the distributed cache; if not,
rely on the cluster management underlying the distributed cache (JGroups for
Infinispan) to manage ClusterNode entities.
* Remove the entity polling feature and use the distributed caching strategy
for the operations that load entities from storage.
** The current listRules() is refactored to simply load entities from ES, but
through the distributed cache.
** The updateRule() operation will also propagate the update to the
distributed cache, avoiding any polling latency.
* Update documentation to reflect the new architecture.
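For the ClusterService refactoring, Infinispan already exposes cluster
membership through its JGroups transport; a possible sketch (class name and
wiring are assumptions, not the final design):
{code:java}
import java.util.List;

import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachemanagerlistener.annotation.ViewChanged;
import org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent;
import org.infinispan.remoting.transport.Address;

// Sketch of a ClusterService deriving topology from Infinispan's
// JGroups-based cluster view instead of persisted ClusterNode documents.
@Listener
public class InfinispanClusterServiceSketch {

    private final EmbeddedCacheManager cacheManager;

    public InfinispanClusterServiceSketch(EmbeddedCacheManager cacheManager) {
        this.cacheManager = cacheManager;
        cacheManager.addListener(this); // get notified on membership changes
    }

    // Current members come straight from the JGroups view: no heartbeat
    // document, no Elasticsearch index, no scheduled job.
    public List<Address> getClusterNodes() {
        return cacheManager.getMembers();
    }

    @ViewChanged
    public void onViewChanged(ViewChangedEvent event) {
        // JGroups failure detection already removed dead members from the view.
        System.out.println("Cluster topology changed, members: " + event.getNewMembers());
    }
}
{code}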
h3. Definition of Done
* ClusterNode information is available and updated without Elasticsearch.
* No additional Elasticsearch index is created for cluster nodes.
* Heartbeat mechanism works reliably.
* All 'cacheable' entities rely on the dedicated cluster-aware cache feature
based on the Infinispan Karaf feature.
* All polling jobs are removed.
* A test for entity update propagation across the cluster is set up.
* All relevant documentation is updated.
* Integration tests confirm correct cluster node management and heartbeat
updates.
* No regression in cluster management functionality.
was:
h3. Context
Currently, the ClusterService implementation in Unomi stores each ClusterNode
as a document in Elasticsearch. A scheduled job updates a heartbeat timestamp
every 10 seconds.
h3. Problem
This approach is resource-intensive and not well-suited to deployments on
Elastic Cloud, where operational costs are directly impacted by the number of
indexes. The ClusterNode implementation creates an additional index,
unnecessarily increasing costs.
h3. Proposal
Replace the current persistence-based storage of ClusterNode objects with an
in-memory cache (also supporting distributed caching for clustered instances).
Infinispan could be a good candidate as it is available as a Karaf feature and
can be easily integrated.
This change will:
* Reduce operational costs by removing the need for a dedicated Elasticsearch
index for cluster nodes.
* Improve performance and scalability for cluster node management.
* Align with other planned features, such as a distributed entity cache to
avoid intensive polling, which will also leverage the Infinispan Karaf feature.
h3. Tasks
* Refactor ClusterServiceImpl to use distributed cache for storing and updating
ClusterNode information.
* Remove Elasticsearch-based persistence logic for ClusterNode.
* Ensure heartbeat updates are managed via the distributed cache; if not,
rely on the cluster management underlying the distributed cache (JGroups for
Infinispan) to manage ClusterNode entities.
* Update documentation to reflect the new architecture.
* Validate compatibility with existing and planned features using Infinispan.
h3. Definition of Done
* ClusterNode information is available and updated without Elasticsearch.
* No additional Elasticsearch index is created for cluster nodes.
* Heartbeat mechanism works reliably.
* All relevant documentation is updated.
* Integration tests confirm correct cluster node management and heartbeat
updates.
* No regression in cluster management functionality.
> Introduce Distributed Cache to avoid intensive polling
> ------------------------------------------------------
>
> Key: UNOMI-908
> URL: https://issues.apache.org/jira/browse/UNOMI-908
> Project: Apache Unomi
> Issue Type: Improvement
> Components: unomi(-core)
> Affects Versions: unomi-3.0.0
> Reporter: Jerome Blanchard
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)