[ 
https://issues.apache.org/jira/browse/UNOMI-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Blanchard updated UNOMI-908:
-----------------------------------
    Description: 
h3. Context

Currently, some Unomi entities (like rules, segments, propertyTypes...) are 
polled in Elasticsearch every second using a scheduled job, to ensure that if 
another Unomi node has modified them, they are refreshed locally.
This approach, while functional, is clearly inefficient when the entity is 
updated at a low frequency.
Moreover, without a strong scheduler engine capable of acting as a watchdog, a 
naive scheduler implementation can have jobs (threads) that die. We have 
already faced production issues because of that, ending up with nodes holding 
different rule sets.

A second point: with the removal of Karaf Cellar (the Karaf cluster bundle and 
config propagation feature) in Unomi 3, a simple cluster topology monitoring 
mechanism was introduced.
This implementation relies on a dedicated entity, ClusterNode, stored in a 
dedicated Elasticsearch index.
Every 10 seconds (also using a scheduled job), the ClusterNode document is 
updated by setting its heartbeat field to the current timestamp. At the same 
time, the other ClusterNode documents are checked to see whether their latest 
heartbeat is fresh enough to keep them.
This topology management is very resource-intensive and does not follow the 
state of the art in terms of architecture.
Topology information is not something that needs to be persisted, except for 
audit purposes (which is not the case here), and should be managed in memory 
using proven algorithms.
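For reference, the heartbeat-and-eviction logic described above boils down to a few lines. The sketch below is a simplified in-memory model (a plain map stands in for the Elasticsearch index; the class, field and threshold names are illustrative, not Unomi's actual code):

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the current heartbeat logic: every node periodically
// writes its own timestamp, then prunes peers whose heartbeat is too old.
public class HeartbeatSketch {
    static final long MAX_AGE_MS = 30_000; // e.g. three missed 10s heartbeats

    final Map<String, Long> heartbeats = new ConcurrentHashMap<>();

    // In the real implementation this runs every 10 seconds in a scheduled
    // job and each write is an Elasticsearch document update.
    void beat(String nodeId, long nowMs) {
        heartbeats.put(nodeId, nowMs);
        // Evict nodes whose last heartbeat is older than the threshold.
        heartbeats.values().removeIf(ts -> nowMs - ts > MAX_AGE_MS);
    }

    public static void main(String[] args) {
        HeartbeatSketch s = new HeartbeatSketch();
        long now = Instant.now().toEpochMilli();
        s.beat("node-1", now - 60_000); // stale entry
        s.beat("node-2", now);          // only node-2 survives the eviction
        System.out.println(s.heartbeats.keySet());
    }
}
```

This makes the cost visible: two index round-trips per node every 10 seconds, purely to maintain state that never needs to outlive the process.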
h3. Proposal

The goal here is to propose a solution that addresses both problems by relying 
on an external, proven and widely used solution: distributed caching with 
Infinispan.

Because distributed caching libraries need an internal cluster topology 
manager, we can use the same tooling to manage entity caches without polling 
AND to discover and monitor the cluster topology.

We propose to use a generic caching solution already available as a Karaf 
feature: Infinispan. It will be packaged in a dedicated generic caching service 
based on annotated methods and directly inspired by the current Unomi entities 
cache.
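To make the "annotated methods" idea concrete, here is a minimal sketch of what such a service could look like. Neither the @Cacheable annotation nor UnomiCacheService exists in Unomi today; both names are hypothetical, and a plain ConcurrentHashMap stands in for what would really be a replicated Infinispan cache:

```java
import java.lang.annotation.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical marker annotation a proxy/interceptor could honor.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Cacheable {
    String cacheName();
}

// Sketch of the proposed generic caching service. In the real proposal the
// backing map would be an org.infinispan.Cache obtained from a replicated
// CacheManager, so that put() is propagated to all cluster members.
class UnomiCacheService {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    // Serve from the cache; on a miss, call the loader (e.g. an
    // Elasticsearch query) exactly once and remember its result.
    @SuppressWarnings("unchecked")
    <T> T getOrLoad(String cacheName, Object key, Supplier<T> loader) {
        Map<Object, Object> cache =
            caches.computeIfAbsent(cacheName, n -> new ConcurrentHashMap<>());
        return (T) cache.computeIfAbsent(key, k -> loader.get());
    }

    // Write-through update; with a replicated cache this replaces polling.
    void put(String cacheName, Object key, Object value) {
        caches.computeIfAbsent(cacheName, n -> new ConcurrentHashMap<>())
              .put(key, value);
    }
}
```

The point of the abstraction is that services never poll: they read through getOrLoad() and the cluster-wide propagation is the cache's job, not theirs.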

The underlying JGroups library used by Infinispan will then also be exposed, in 
order to refactor the Unomi ClusterService instead of relying on a persisted 
entity.
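What the group-membership layer gives us is essentially a live view of the cluster plus change notifications, with no persistence at all. The sketch below models that contract in plain Java (Infinispan exposes the equivalent through its cluster manager, with JGroups underneath; the ClusterView name is illustrative):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Plain-Java model of a group-membership "view": the current member list
// plus view-change callbacks. No heartbeat documents, no index.
class ClusterView {
    private final List<String> members = new CopyOnWriteArrayList<>();
    private final List<Consumer<List<String>>> listeners = new CopyOnWriteArrayList<>();

    // A refactored ClusterService would register here instead of scanning
    // ClusterNode documents for stale heartbeats.
    void onViewChanged(Consumer<List<String>> listener) {
        listeners.add(listener);
    }

    // Called by the membership layer when a node joins or leaves; every
    // listener receives an immutable snapshot of the new view.
    void installView(List<String> newMembers) {
        members.clear();
        members.addAll(newMembers);
        List<String> snapshot = List.copyOf(members);
        listeners.forEach(l -> l.accept(snapshot));
    }

    List<String> getMembers() {
        return List.copyOf(members);
    }
}
```

With this shape, node failure detection is push-based and immediate, rather than a 10-second polling loop over persisted documents.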

By externalizing caching into a dedicated, widely used and proven solution, the 
Unomi code will become lighter and more robust for cluster-oriented operations 
on entities.

Using a distributed cache for persistent entities has been common practice for 
decades and is integrated in all enterprise-level frameworks (EJB, Spring, 
...). This is proven technology with very strong implementations and support, 
and Infinispan is one of the best references in that domain (used in WildFly, 
Hibernate, Apache Camel, ...).
h3. Tasks
 * Package a Unomi cache feature that relies on the existing Karaf Infinispan 
feature.
 * Refactor ClusterServiceImpl to take advantage of the Infinispan cluster 
manager, or simply store ClusterNode in the distributed cache instead of in 
Elasticsearch.
 * Remove the Elasticsearch-based persistence logic for ClusterNode.
 * Ensure heartbeat updates are managed via the distributed cache; if not, rely 
on the distributed cache's underlying cluster management to manage ClusterNode 
entities (JGroups for Infinispan).
 * Remove the entity polling feature and use the distributed caching strategy 
for the operations that load entities from storage.
 ** The current listRules() is refactored to simply load entities from 
Elasticsearch, but through the distributed cache.
 ** The updateRule() operation will also propagate updates to the distributed 
cache, avoiding any polling latency.
 * Update documentation to reflect the new architecture.
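The listRules()/updateRule() tasks above can be sketched as a load-through read path plus a write-through update path. This is a hypothetical simplification: RuleCacheSketch and RuleStore are illustrative names, the map stands in for a replicated Infinispan cache, and RuleStore stands in for the Elasticsearch persistence layer:

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RuleCacheSketch {
    // Stand-in for the persistence layer (Elasticsearch in Unomi).
    interface RuleStore {
        Map<String, String> loadAllRules();
        void saveRule(String id, String rule);
    }

    private final RuleStore store;
    // Would be a replicated Infinispan cache in the actual proposal.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private volatile boolean loaded = false;

    RuleCacheSketch(RuleStore store) {
        this.store = store;
    }

    // listRules(): load once from storage, then serve from the cache.
    // No periodic polling job is needed to stay up to date.
    synchronized Collection<String> listRules() {
        if (!loaded) {
            cache.putAll(store.loadAllRules());
            loaded = true;
        }
        return cache.values();
    }

    // updateRule(): write-through to storage AND to the cache; with a
    // replicated cache the put() reaches every node immediately.
    void updateRule(String id, String rule) {
        store.saveRule(id, rule);
        cache.put(id, rule);
    }
}
```

Storage remains the source of truth for cold starts, while the replicated cache carries cross-node propagation, which is exactly what the one-second polling job does today.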

h3. Definition of Done
 * ClusterNode information is available and updated without Elasticsearch.
 * No additional Elasticsearch index is created for cluster nodes.
 * Heartbeat mechanism works reliably.
 * All 'cacheable' entities rely on the dedicated cluster-aware cache feature 
based on the Infinispan Karaf feature.
 * All polling jobs are removed.
 * A test for entity update propagation across the cluster is set up.
 * All relevant documentation is updated.
 * Integration tests confirm correct cluster node management and heartbeat 
updates.
 * No regression in cluster management functionality.

  was:
h3. Context

Currently, the ClusterService implementation in Unomi stores each ClusterNode 
as a document in Elasticsearch. A scheduled job updates a heartbeat timestamp 
every 10 seconds.

h3. Problem

This approach is resource-intensive and not well-suited for the use case of 
relying on ElasticCloud, where operational costs are directly impacted by the 
number of indexes. The ClusterNode implementation creates an additional index, 
unnecessarily increasing costs.

h3. Proposal

Replace the current persistence-based storage of ClusterNode objects with an 
in-memory cache (also supporting distributed caching for clustered instances). 
Infinispan could be a good candidate, as it is available as a Karaf feature and 
can be easily integrated.

This change will:
* Reduce operational costs by removing the need for a dedicated Elasticsearch 
index for cluster nodes.
* Improve performance and scalability for cluster node management.
* Align with other planned features, such as a distributed entity cache to 
avoid intensive polling, which will also leverage the Infinispan Karaf feature.

h3. Tasks

* Refactor ClusterServiceImpl to use distributed cache for storing and updating 
ClusterNode information.
* Remove Elasticsearch-based persistence logic for ClusterNode.
* Ensure heartbeat updates are managed via the distributed cache; if not, rely 
on the distributed cache's underlying cluster management to manage ClusterNode 
entities (JGroups for Infinispan)
* Update documentation to reflect the new architecture.
* Validate compatibility with existing and planned features using Infinispan.

h3. Definition of Done

* ClusterNode information is available and updated without Elasticsearch.
* No additional Elasticsearch index is created for cluster nodes.
* Heartbeat mechanism works reliably.
* All relevant documentation is updated.
* Integration tests confirm correct cluster node management and heartbeat 
updates.
* No regression in cluster management functionality.


> Introduce Distributed Cache to avoid intensive polling
> ------------------------------------------------------
>
>                 Key: UNOMI-908
>                 URL: https://issues.apache.org/jira/browse/UNOMI-908
>             Project: Apache Unomi
>          Issue Type: Improvement
>          Components: unomi(-core)
>    Affects Versions: unomi-3.0.0
>            Reporter: Jerome Blanchard
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
