https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=36702

--- Comment #8 from Kyle M Hall (khall) <[email protected]> ---
Created attachment 200997
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=200997&action=edit
Bug 36702: Add ability to retry failed background jobs with a max tries
parameter

Currently when a background job fails it just stops with a status of 'failed'
and there is no way to have it run again. This is a problem for jobs that fail
for transient reasons, like an Elasticsearch index update when the server is
briefly overwhelmed, or a plugin job that calls an external API that is
temporarily unavailable.

This patch adds the ability to retry a failed job up to a maximum number of
times. When a job throws an error the worker leaves the failed job alone so its
messages and report are kept as history, and enqueues a new job that retries
it. The new job points back at the one it is retrying through previous_job_id,
so the whole chain of attempts can be followed.
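
As a rough illustration of how such a chain could be followed, here is a
minimal Python sketch. It is not code from the patch: the dictionary stands in
for the id and previous_job_id columns of background_jobs, and the function
name attempt_chain is hypothetical.

```python
from typing import Dict, List, Optional


def attempt_chain(previous_job_id: Dict[int, Optional[int]], job_id: int) -> List[int]:
    """Return the ids of all attempts leading to job_id, oldest first.

    previous_job_id maps a job id to the id of the job it retries
    (None for the original attempt). This mapping is an assumed
    in-memory stand-in for rows of the background_jobs table.
    """
    chain: List[int] = []
    seen = set()
    current: Optional[int] = job_id
    while current is not None:
        if current in seen:
            # Defensive guard: a well-formed chain never loops.
            raise ValueError(f"cycle detected at job {current}")
        seen.add(current)
        chain.append(current)
        current = previous_job_id.get(current)
    chain.reverse()
    return chain


if __name__ == "__main__":
    # Job 101 failed, 102 retried it, 103 retried 102.
    jobs = {101: None, 102: 101, 103: 102}
    print(attempt_chain(jobs, 103))
```

Walking backwards from the newest job and reversing keeps the lookup simple,
since each row only knows about the attempt before it.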

The maximum number of retries comes from, in order:
* an explicit max_retries passed to enqueue()
* the job type's default_max_retries()
* BackgroundJobsDefaultMaxRetries system preference ( default 3 ).
A value of 0 disables retries. Jobs that aren't safe to re-run with their
original arguments ( batch modifications, imports, e-holdings creation, SUSHI
harvesting and statistics pseudonymization ) override default_max_retries to 0
so they opt out.
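
The lookup order above can be sketched as follows. This is a minimal Python
illustration, not the patch's implementation: the function name
resolve_max_retries is hypothetical, and representing "not set" as None is an
assumption. The one detail it tries to make visible is that 0 is a real value
( retries disabled ) and must not be treated as "unset".

```python
from typing import Optional


def resolve_max_retries(
    explicit: Optional[int],
    job_type_default: Optional[int],
    syspref_default: int = 3,
) -> int:
    """Pick the first value that is set, in the order given in the text.

    explicit         -- max_retries passed to enqueue(), or None if omitted
    job_type_default -- the job type's default_max_retries(), or None
    syspref_default  -- BackgroundJobsDefaultMaxRetries ( default 3 )
    """
    for candidate in (explicit, job_type_default):
        # Compare against None rather than truthiness so that 0 is honoured.
        if candidate is not None:
            return candidate
    return syspref_default


if __name__ == "__main__":
    print(resolve_max_retries(None, None))  # falls back to the system preference
    print(resolve_max_retries(None, 0))     # job type opts out of retries
    print(resolve_max_retries(5, 0))        # explicit value is consulted first
```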

Retries don't all fire at once. The first retry runs immediately and each
following retry waits an extra BackgroundJobsRetryDelay seconds ( default 30 ),
tracked through the not_before column. The worker won't process a job before
its not_before time, requeuing it in RabbitMQ mode or skipping it until the
next poll in database mode, so retries work the same whether or not RabbitMQ is
used.
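
The delay schedule described above works out to ( retry number - 1 ) times
BackgroundJobsRetryDelay. The Python sketch below shows that arithmetic and the
worker-side check. It is illustrative only: the function names are
hypothetical, and measuring the delay from the moment the previous attempt
failed is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def retry_not_before(failed_at: datetime, retry_number: int, delay_seconds: int = 30) -> datetime:
    """Earliest time the given retry may run.

    retry_number is 1 for the first retry, which gets no delay;
    each later retry waits one more delay_seconds than the one before.
    """
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    return failed_at + timedelta(seconds=(retry_number - 1) * delay_seconds)


def is_due(not_before: Optional[datetime], now: datetime) -> bool:
    """True if a worker may process the job now; no not_before means no hold."""
    return not_before is None or now >= not_before


if __name__ == "__main__":
    failed_at = datetime(2024, 1, 1, 12, 0, 0)
    for n in (1, 2, 3):
        print(n, retry_not_before(failed_at, n))
```

With the default of 30 seconds this gives holds of 0, 30 and 60 seconds for
the first three retries.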

Test Plan:
1) Apply all the patches
2) Run updatedatabase.pl
3) Restart all the things!
4) Note the new columns on the background_jobs table ( max_retries, retries,
   previous_job_id, not_before ) and the two new system preferences,
   BackgroundJobsDefaultMaxRetries and BackgroundJobsRetryDelay, under
   Administration > System preferences > Administration > Jobs
5) Leave BackgroundJobsDefaultMaxRetries at 3 and BackgroundJobsRetryDelay at
   30
6) Make sure JobsNotificationMethod is set to 'STOMP'
7) Restart background jobs workers
8) Enqueue a job that will fail. In the Koha shell run:
     perl -e 'use Koha::BackgroundJob::TestTransport;
       Koha::BackgroundJob::TestTransport->new->enqueue( { transport_id => 999999 } );'
9) Browse to Administration > Manage jobs and refresh as the job runs
10) Note the original job ends as 'failed' and a new job is enqueued that links
    back to it through 'Retry of'
11) Note the first retry runs immediately ( Retries 1 / 3 ), a second is held
    back about 30 seconds ( Retries 2 / 3 ) and a third about 60 seconds
    ( Retries 3 / 3 )
12) Note that once retries reaches BackgroundJobsDefaultMaxRetries ( 3 ) no
    further retry is made
13) Set JobsNotificationMethod to 'polling'
14) Restart background jobs workers again
15) Repeat step 8 to enqueue another failing job
16) Note that a similar retry chain is built
