[jira] [Updated] (UNOMI-670) Unomi purge system is not working

Francois Gerthoffert (Jira) Wed, 12 Oct 2022 06:17:12 -0700


     [ 
https://issues.apache.org/jira/browse/UNOMI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Francois Gerthoffert updated UNOMI-670:
---------------------------------------
    Description: 
The purge system was recently fixed in version 1.6.0: 
[https://jira.jahia.org/browse/UNOMI-504] 

That fix only corrected the query used to identify the profiles to be purged.

We realized rather late that the purge system has more issues than that.

Here is a non-exhaustive list of the issues identified so far. They will need 
fixes or answers, and some of them need further reflection and discussion.
h2. Bug 1: Disabling the profile purge causes all profiles to be purged every day

Consider this configuration, meant to enable only the event/session purge:
{code:java}
org.apache.unomi.profile.purge.existTime = -1
org.apache.unomi.profile.purge.inactiveTime = -1
org.apache.unomi.monthly.index.purge.existTime= 12 {code}
It has a bad side effect: all profiles in the system are removed every day.
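
A minimal sketch of the kind of guard that would avoid this, assuming that the two profile settings are expressed in days and that any value of zero or below (such as -1) means "disabled". The class and method names are hypothetical and are not taken from the Unomi code base; the ticket does not show how the real purge task computes its cutoff.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper, not part of Unomi: interprets purge settings given in days. */
final class PurgeSettings {
    private PurgeSettings() {}

    /** Assumption: zero or a negative value (such as -1) disables the purge. */
    static boolean isEnabled(int days) {
        return days > 0;
    }

    /** Returns the cutoff instant for a setting, or empty when the setting is disabled. */
    static Optional<Instant> cutoff(int days, Instant now) {
        if (!isEnabled(days)) {
            return Optional.empty();
        }
        return Optional.of(now.minus(Duration.ofDays(days)));
    }

    /** The profile purge should run only if at least one of the two settings is enabled. */
    static boolean shouldPurgeProfiles(int existTimeDays, int inactiveTimeDays) {
        return isEnabled(existTimeDays) || isEnabled(inactiveTimeDays);
    }
}
```

With the configuration above, `PurgeSettings.shouldPurgeProfiles(-1, -1)` returns `false`, so no profile query would be built at all, while the monthly index purge stays unaffected.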
h2. Bug 2: OOM issue during profile purge

The persistence service implements this method: 
{code:java}
/**
 * Deletes items with the specified Item subclass matching the specified {@link Condition}.
 *
 * @param <T>   the type of the Item subclass we want to delete
 * @param query a {@link Condition} identifying which elements we want to delete
 * @param clazz the {@link Item} subclass of the items we want to delete
 * @return {@code true} if the deletion was successful, {@code false} otherwise
 */
<T extends Item> boolean removeByQuery(Condition query, Class<T> clazz); {code}
The issue with the implementation is that it does the following:
 * queries the profiles to get the document IDs
 * builds a batch request with one delete request per profile ID
 * sends the batch

So when many profiles have to be purged at once, the batch can grow very large 
and consume all the available memory.

We should not do it this way. Instead, we should:
 * use the ES delete-by-query API: 
[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html]
 directly, with the query that identifies the profiles
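
As a rough illustration of the suggested approach, the sketch below builds a single `POST /<index>/_delete_by_query` request with the JDK HTTP client types instead of one delete request per document. It is not Unomi code: the class name, the base URL, the index name and the example query are all assumptions made for illustration, and it does not address the cache invalidation discussed in Bug 3.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch, not Unomi code: one delete-by-query call instead of a large batch. */
final class DeleteByQuerySketch {
    private DeleteByQuerySketch() {}

    /** Wraps an already serialized Elasticsearch query clause in a request body. */
    static String body(String queryJson) {
        return "{\"query\":" + queryJson + "}";
    }

    /** Builds (but does not send) the request; the client never loads the document IDs. */
    static HttpRequest request(String esBaseUrl, String index, String queryJson) {
        URI uri = URI.create(esBaseUrl + "/" + index + "/_delete_by_query");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(queryJson)))
                .build();
    }
}
```

For example, `DeleteByQuerySketch.request("http://localhost:9200", "context-profile", "{\"match_all\":{}}")` produces one small request regardless of how many documents match, so the memory used on the Unomi side no longer depends on the number of profiles to purge.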

h2. Bug 3: What is the cache system implemented in the persistence service?

It seems to come from: https://issues.apache.org/jira/browse/UNOMI-166 

It is wired into the persistence service, and it explains why the removeByQuery 
implementation is bad.

The reason *removeByQuery* queries all the document IDs is to invalidate the 
cache.

Strangely, though, the cache is not enabled for any item class:
{code:java}
<cm:property name="itemClassesToCache" value="" /> {code}
So I would say we should remove this cache system, because:
 * it is not tested
 * nobody knows what the effect would be if it were enabled
 * it cannot be activated through environment variables or the global Unomi 
configuration
 * it is not documented
 * it affects implementations such as removeByQuery, even though the cache 
system is not used
 * it is mostly dead code inside the core persistence of Unomi, which is 
dangerous to keep

Update during team planning: This caching system should be removed.
h2. Bug 4: The purge job never seems to be unregistered

We register the purge job when the profile service starts, but when the profile 
service is stopped (by stopping the bundle, for example), the job is not 
unregistered.
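
A minimal sketch of a symmetric start/stop lifecycle, assuming the job is scheduled through a `ScheduledExecutorService`. The class and its methods are hypothetical; the ticket does not describe how Unomi actually schedules the purge job, so this only illustrates keeping a handle on the job at registration time and cancelling it on stop.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not Unomi code: registers the purge job on start, cancels it on stop. */
final class PurgeJobLifecycle {
    private final ScheduledExecutorService scheduler;
    private ScheduledFuture<?> purgeJob; // handle kept so the job can be cancelled later

    PurgeJobLifecycle(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    /** Called when the service starts; registers the job at most once. */
    synchronized void start(Runnable purge, long periodDays) {
        if (purgeJob == null) {
            purgeJob = scheduler.scheduleAtFixedRate(purge, 1, periodDays, TimeUnit.DAYS);
        }
    }

    /** Called when the service (or its bundle) stops; unregisters the job. */
    synchronized void stop() {
        if (purgeJob != null) {
            purgeJob.cancel(false); // let a purge that is already running finish
            purgeJob = null;
        }
    }

    synchronized boolean isRegistered() {
        return purgeJob != null && !purgeJob.isCancelled();
    }
}
```

The key design point is that `stop()` mirrors `start()`: whatever the service registers when it starts is released when it stops, so restarting the bundle does not leave an orphaned job behind.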
h2. Bug 5: In a Unomi cluster, should all nodes perform the purge?

In a Unomi cluster, every node may try to trigger the purge. I don't think it 
is an issue, but it looks like something could be done to avoid performing 
unnecessary operations twice or more.
h2. Final: We should also add integration tests to cover this feature.


> Unomi purge system is not working
> ---------------------------------
>
>                 Key: UNOMI-670
>                 URL: https://issues.apache.org/jira/browse/UNOMI-670
>             Project: Apache Unomi
>          Issue Type: Bug
>          Components: unomi(-core)
>    Affects Versions: unomi-2.0.0, unomi-1.6.1
>            Reporter: Kevan Jahanshahi
>            Priority: Major
>             Fix For: unomi-2.0.1, unomi-1.6.2
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
