Hi all, 

I work with Jelle and want to add to this issue. 

Help would be *greatly* appreciated as this is a *major* blocker on our 
production server right now.

In the database ‘workflow_invocation’ table, one can see a ‘state’ column with 
values like ‘scheduled’ or ‘failed’. 

Before December 18, I only see the values ‘scheduled’ or ‘failed’. 
After this date, a new state appeared: ‘new’. It is always associated 
with handler 1 (we have 2 job handlers, i.e. ‘0’ and ‘1’). 
As time goes on, we see a mix of ‘new’ and ‘scheduled’ states, with more and 
more ‘new’, and from Jan 4 on it is only ‘new’ (only for handler ‘1’).

It looks like workflows assigned to handler 1 never reach the 
‘scheduled’ state, so their jobs are never created.
I have 269 entries in the ‘workflow_invocation’ table in the ‘new’ state, and 
restarting the job handlers no longer has any effect (it used to work a few 
days ago).
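
For anyone who wants to check their own instance, the breakdown I describe can 
be obtained by grouping the invocations by handler and state. Below is a minimal 
sketch, not our exact query: it runs against a toy in-memory SQLite table with 
made-up rows, and the column names `handler` and `create_time` are assumptions 
on my side (only the `workflow_invocation` table and its `state` column are 
taken from what I describe above). The handler names are illustrative too.

```python
import sqlite3

# Toy stand-in for the real database. Only the table name and the
# `state` column come from the description above; `handler` and
# `create_time` are assumed column names, and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow_invocation ("
    "id INTEGER PRIMARY KEY, create_time TEXT, state TEXT, handler TEXT)"
)
sample_rows = [
    ("2015-12-10", "scheduled", "handler0"),
    ("2015-12-10", "failed", "handler1"),
    ("2015-12-20", "new", "handler1"),
    ("2015-12-20", "scheduled", "handler0"),
    ("2016-01-04", "new", "handler1"),
    ("2016-01-04", "new", "handler1"),
]
conn.executemany(
    "INSERT INTO workflow_invocation (create_time, state, handler) "
    "VALUES (?, ?, ?)",
    sample_rows,
)


def count_by_handler_and_state(connection):
    """Return (handler, state, count) tuples, sorted by handler and state."""
    query = (
        "SELECT handler, state, COUNT(*) "
        "FROM workflow_invocation "
        "GROUP BY handler, state "
        "ORDER BY handler, state"
    )
    return connection.execute(query).fetchall()


counts = count_by_handler_and_state(conn)
for handler, state, n in counts:
    print(handler, state, n)
```

The same SELECT should work largely unchanged on a production database once the 
column names are adjusted to the real schema; adding the creation date to the 
GROUP BY shows how the mix of states shifts over time.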

How can I fix this?

Thanks for your help,

Charles

> On 5 Jan 2016, at 11:29, Jelle Scholtalbers <j.scholtalb...@gmail.com> wrote:
> 
> Hi all,
> 
> On our installation (v15.07) we suddenly see that one of our two job handlers 
> gets stuck with a high CPU load (the last message is generally `cleaning up 
> external metadata files`) without new messages appearing. In addition, when 
> running workflows in batch (>6x), only a few of them (~3) get their workflow 
> steps/jobs scheduled (LSF-DRMAA). For the remaining 3, their new histories 
> are created but remain empty (according to the GUI). Only upon restart of the 
> two job handlers are the remaining workflow steps scheduled and shown in the 
> history.
> 
> First question, how do we resolve this issue?
> Second, how does this actually work? How are the workflow steps stored in the 
> database i.e. why are they not shown in the web interface until they are 
> processed by a handler?
> 
> Possible relevant config settings:
> [server:handler0]
> use_threadpool = true
> threadpool_workers = 5
> 
> [server:handler1]
> use_threadpool = true
> threadpool_workers = 5
> 
> [app:main]
> force_beta_workflow_scheduled_min_steps=1
> force_beta_workflow_scheduled_for_collections=True
> track_jobs_in_database = True
> enable_job_recovery = True
> retry_metadata_internally = False
> cache_user_job_count = True # only a limit set for the very few local tools like upload
> 
> Cheers,
> 
> Jelle
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>  https://lists.galaxyproject.org/
> 
> To search Galaxy mailing lists use the unified search at:
>  http://galaxyproject.org/search/mailinglists/
