Dear all,

Has anyone tried running Galaxy on Ubuntu 14.04?

I’m trying a test setup on two virtual machines (worker + master) with a SLURM 
queue. I’m running into strange problems when jobs finish: the master hangs and 
becomes completely unresponsive, with CPU at 100% (as reported by virt-manager, 
not by top). Only drmaa jobs seem to be affected. After a hang, a reboot shows 
that the job has finished (and is green in the history).

After some debugging, the hang seems to occur when os.remove is called in 
lib/galaxy/datatypes/ in the 
method cleanup_external_metadata. I can reproduce the problem by calling 
os.remove(metadatafile) by hand (in an interactive Python shell) when using pdb 
to set a breakpoint just before the call. If I comment out the os.remove, the job 
runs on until it hits another delete call in lib/galaxy/jobs/, base_dir='job_work', 
entire_dir=True, dir_only=True, extra_dir=str(self.job_id))
That call is in the cleanup() method of the JobWrapper class. I should mention 
here that my Galaxy version is a bit old, since I’m running my own fork with 
local modifications to the datatypes.
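
To check whether plain remove calls misbehave outside Galaxy as well, something like the following minimal sketch could be run on the master. The file and directory names are hypothetical stand-ins, not Galaxy's real paths, and base_dir would have to point at the same filesystem Galaxy uses for its files and job working directories.

```python
import os
import shutil
import tempfile
import time


def timed_remove(base_dir=None):
    """Create a scratch file and directory tree, remove both, and time each call.

    base_dir is an assumption: point it at the filesystem Galaxy writes to.
    With None, the system default temporary directory is used.
    """
    scratch = tempfile.mkdtemp(prefix="remove_test_", dir=base_dir)
    # Hypothetical names that only mimic a metadata file and a job directory.
    metadata_file = os.path.join(scratch, "metadata_test.dat")
    job_dir = os.path.join(scratch, "job_work", "000", "1")

    with open(metadata_file, "w") as handle:
        handle.write("placeholder\n")
    os.makedirs(job_dir)

    # Time the single-file removal (the call that hangs in the metadata cleanup).
    start = time.time()
    os.remove(metadata_file)
    remove_seconds = time.time() - start

    # Time the recursive removal (what the object store delete ends up calling).
    start = time.time()
    shutil.rmtree(os.path.join(scratch, "job_work"))
    rmtree_seconds = time.time() - start

    os.rmdir(scratch)
    return remove_seconds, rmtree_seconds


if __name__ == "__main__":
    print("os.remove: %.4fs, shutil.rmtree: %.4fs" % timed_remove())
```

If this returns quickly every time, even while a job is finishing, the remove calls themselves are probably not the whole story.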

This object_store.delete also leads to shutil.rmtree and os.remove calls. 
So remove calls to the filesystem seem to hang the whole machine, but only at 
this point in time. After a reboot, removing the files by hand is no problem, and 
stepping through with pdb also sometimes avoids the hang (but if I just press 
continue, it hangs). I don’t know where to go from here with debugging. Has 
anyone seen anything similar? 
Right now it feels like it may be caused by timing rather than actual code 
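
Since the machine has to be rebooted after a hang, one way to narrow this down might be to write durable markers around each remove call, so the log that survives the reboot shows which call was entered but never returned. This is only a sketch: the traced helper and the log path are made up for illustration and are not part of Galaxy, and the log should live on a disk that is not itself affected by the hang.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def traced(label, log_path):
    """Write 'enter' and 'exit' markers around a block, synced to disk."""

    def mark(event):
        with open(log_path, "a") as log:
            log.write("%.6f %s %s\n" % (time.time(), event, label))
            log.flush()
            os.fsync(log.fileno())  # force the line to disk before continuing

    mark("enter")
    try:
        yield
    finally:
        # Written on normal return and on exceptions, but not if the call hangs.
        mark("exit")


if __name__ == "__main__":
    # Small demonstration on a throwaway file; paths are placeholders.
    scratch = tempfile.mkdtemp(prefix="trace_test_")
    victim = os.path.join(scratch, "victim.dat")
    log_path = os.path.join(scratch, "remove_trace.log")
    open(victim, "w").close()
    with traced("os.remove " + victim, log_path):
        os.remove(victim)
    print(open(log_path).read())
```

An "enter" line without a matching "exit" line after the reboot would point at the call that never returned, and the timestamps would show how close together the remove calls happen.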

Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden
