[jira] [Updated] (UNOMI-861) Adapt migration job to use asynchronous mode on deletion to avoid timeout and connection lost

Jira Wed, 23 Oct 2024 09:18:40 -0700


     [ 
https://issues.apache.org/jira/browse/UNOMI-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jonathan Sinovassin-Naïk updated UNOMI-861:
-------------------------------------------
    Description: 
h2. Explanation of the issue:

When running Unomi against Elastic Cloud, the migration from unomi-1.x to 
unomi-2.x sometimes fails because of the _delete_by_query requests.

A timeout closes any _delete_by_query request that takes more than 5 minutes. 
Depending on the size of the index, the deletion can take quite a long time, 
in which case the connection to Elasticsearch is closed.
{color:#00875A}Note that the _delete_by_query task keeps running in the 
background; only the connection between Unomi and Elasticsearch is 
closed.{color}

The migration scripts rely on synchronous behaviour: each script waits for the 
_delete_by_query to finish before going to the next step.

Here is a deletion which can cause the issue:
https://github.com/apache/unomi/blob/d4f4ccdeb03acfb0493228559dc4d203e1ef7319/tools/shell-commands/src/main/resources/META-INF/cxs/migration/migrate-2.0.0-15-eventsReindex.groovy#L28

h2. Solutions to fix:

Change the _delete_by_query request to use the parameter 
wait_for_completion=false. Elasticsearch will then execute the 
_delete_by_query operation asynchronously and immediately return a response 
containing the task information, instead of waiting for the operation to 
complete.
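
As a rough illustration of the first half of this change, here is a minimal Python sketch (not the Groovy used by the actual migration scripts). It only builds the request URL and reads the task id from the response body. The helper names, the base URL and the index name are hypothetical; the one thing taken from Elasticsearch itself is that an asynchronous request answers with a JSON body of the form {"task": "<node_id>:<task_number>"}.

```python
import json
from urllib.parse import quote, urlencode


def build_async_delete_url(es_base_url: str, index: str) -> str:
    """Return the _delete_by_query URL with wait_for_completion=false."""
    query = urlencode({"wait_for_completion": "false"})
    return f"{es_base_url.rstrip('/')}/{quote(index, safe='')}/_delete_by_query?{query}"


def extract_task_id(response_body: str) -> str:
    """Read the task id from the JSON body returned by an asynchronous request."""
    payload = json.loads(response_body)
    task_id = payload.get("task")
    if not task_id:
        # Without a task id there is nothing to poll, so fail loudly.
        raise ValueError(f"no task id in response: {response_body!r}")
    return task_id
```

The HTTP call itself and the query body are left out on purpose, since they depend on how the migration tooling talks to Elasticsearch.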

With the returned task id, we can call the _tasks endpoint as follows:
{code:java}
GET _tasks/<task_id> 
{code}
and wait until the task is reported as completed before going to the next 
step.
*Note: we need to handle each possible outcome (success, failure, etc.)*

This way the synchronous behaviour will be implemented directly in the scripts.
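
The waiting step could look roughly like the following Python sketch, which polls GET _tasks/<task_id> until the task is done. This is an illustration under assumptions, not the implementation chosen for the ticket: fetch_task is a hypothetical callable that performs the HTTP GET and returns the decoded JSON, and the poll interval and maximum wait are arbitrary values. It assumes the task document carries a "completed" flag plus either an "error" object or a "response" object whose "failures" list is empty on success.

```python
import time
from typing import Callable


class TaskFailed(Exception):
    """Raised when Elasticsearch reports that the task did not succeed."""


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    poll_interval_s: float = 5.0,
    max_wait_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the task until it completes, then return its "response" object.

    fetch_task is a hypothetical callable: it performs GET _tasks/<task_id>
    and returns the decoded JSON body as a dict.
    """
    waited = 0.0
    while True:
        status = fetch_task(task_id)
        if status.get("completed"):
            # A completed task can still have failed: check both places.
            if "error" in status:
                raise TaskFailed(f"task {task_id} failed: {status['error']}")
            response = status.get("response", {})
            if response.get("failures"):
                raise TaskFailed(
                    f"task {task_id} reported failures: {response['failures']}"
                )
            return response
        if waited >= max_wait_s:
            # Give up instead of blocking the migration forever.
            raise TimeoutError(f"task {task_id} still running after {max_wait_s}s")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Each poll is a short request, so no single connection stays open long enough to hit the timeout, while the script still blocks until the deletion is finished.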

*Any other solutions are welcome*


  was:
h2. Explanation of the issue:

When running Unomi against Elastic Cloud, the migration from unomi-1.x to 
unomi-2.x sometimes fails because of the _reindex requests.

A timeout closes any _reindex request that takes more than 2 minutes. This 
timeout cannot be changed in Elastic Cloud.
Depending on the size of the index, the reindex can take quite a long time, 
in which case the connection to Elasticsearch is closed.
{color:#00875A}Note that the reindex task keeps running in the background; 
only the connection between Unomi and Elasticsearch is closed.{color}

The migration scripts rely on synchronous behaviour: each script waits for the 
_reindex to finish before going to the next step.

Here is a reindexing which can cause the issue:
https://github.com/apache/unomi/blob/d4f4ccdeb03acfb0493228559dc4d203e1ef7319/tools/shell-commands/src/main/resources/META-INF/cxs/migration/migrate-2.0.0-15-eventsReindex.groovy#L37

h2. Solutions to fix:

Change the _reindex request to use the parameter wait_for_completion=false. 
Elasticsearch will then execute the reindex operation asynchronously and 
immediately return a response containing the task information, instead of 
waiting for the operation to complete.

With the returned task id, we can call the _tasks endpoint as follows:
{code:java}
GET _tasks/<task_id> 
{code}
and wait until the task is reported as completed before going to the next 
step.
*Note: we need to handle each possible outcome (success, failure, etc.)*

This way the synchronous behaviour will be implemented directly in the scripts.

*Any other solutions are welcome*



> Adapt migration job to use asynchronous mode on deletion to avoid timeout and 
> connection lost
> ---------------------------------------------------------------------------------------------
>
>                 Key: UNOMI-861
>                 URL: https://issues.apache.org/jira/browse/UNOMI-861
>             Project: Apache Unomi
>          Issue Type: Task
>            Reporter: Jonathan Sinovassin-Naïk
>            Assignee: Jonathan Sinovassin-Naïk
>            Priority: Major
>             Fix For: unomi-2.6.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
