Thanks for the feedback. We found that set_metadata_externally was commented
out in our config, so we have enabled it and set it to True. We are waiting to
see what happens. If that fails, we'll start working through the other options.
Many thanks for your patience on this matter!
Dr David A. Matthews
Senior Lecturer in Virology
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University of Bristol
Tel. +44 117 3312058
Fax. +44 117 3312091
On 13 Mar 2012, at 17:59, Nate Coraor wrote:
> On Mar 13, 2012, at 6:59 AM, David Matthews wrote:
>> We emailed previously about possible memory leaks in our installation of
>> Galaxy here on the HPC at Bristol. We can run Galaxy just fine on our login
>> node but when we integrate into the cluster using the pbs job runner the whole
>> thing falls over - almost certainly due to a memory leak. In essence, every
>> attempt to submit a TopHat job (with 2x5GB paired end reads to the full
>> human genome) always results in the whole thing falling over - but not when
>> Galaxy is restricted to the login node.
>> We saw that Nate responded to Todd Oakley about a week ago saying that there
>> is a memory leak in libtorque or pbs_python when using the pbs job runner.
>> Have there been any developments on this?
>> Best Wishes,
> Hi David,
> I am almost certain that the problem you have with tophat is not due to the
> same leak, since it's a slow leak, not an immediate spike. Before we go any
> further, in reading back over our past conversation about this problem, I
> noticed that I never asked whether you've set `set_metadata_externally =
> True` in your Galaxy config. If not, this is almost certainly the cause of
> the problem.
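
For anyone wanting to confirm whether the option is actually active rather than commented out, here is a minimal Python sketch. It assumes the setting lives in an `[app:main]` section of an INI-style Galaxy config; the section name, the helper name `metadata_set_externally`, and the two sample config strings are illustrative assumptions, not taken from this thread.

```python
import configparser


def metadata_set_externally(ini_text, section="app:main"):
    """Return True only if set_metadata_externally is present and truthy.

    The section name "app:main" is an assumption about the config layout.
    """
    # interpolation=None stops '%' characters in other options from
    # being treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # A commented-out line is invisible to the parser, so the fallback
    # (False) is what we get when the option is missing or commented out.
    return parser.getboolean(section, "set_metadata_externally", fallback=False)


# Hypothetical config fragments, for illustration only.
commented = "[app:main]\n#set_metadata_externally = False\n"
enabled = "[app:main]\nset_metadata_externally = True\n"

print(metadata_set_externally(commented))
print(metadata_set_externally(enabled))
```

To check a real file, read its contents and pass the text to the function. A commented-out line and a missing line are treated the same way, which is the case described at the top of this message.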
> If you're already setting metadata externally, answers to a few of the
> questions I asked last time (or perhaps any findings of your HPC guys) and a
> few new things to try would be helpful in figuring out why your tophat jobs
> still crash:
> 1. Create a separate job runner and web frontend so we can be sure that the
> job running portion is the memory culprit:
> You would not need any of the load balancing config, just start a single web
> process and a single runner process. From reading your prior email I believe
> you have a proxy server, and so as long as you start the web process on the
> same port as your previous Galaxy server, no change would be needed to your
> proxy server.
> 2. Set use_heartbeat = True in the config file of whichever process is
> consuming all of the memory.
> 3. Does the MemoryError appear in the log after Galaxy has noticed that the
> job has finished on the cluster (`(<id>/<pbs id>) PBS job has left queue`),
> but before the job post-processing is finished (`job <id> ended`)?
> 4. Does the MemoryError appear regardless of whether anyone accesses the web
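
Question 3 can be checked mechanically against the log. The sketch below is only a rough aid: the two patterns are built from the message fragments quoted above, while the rest of the log format, the shape of the job ids, and the sample lines are assumptions made for illustration.

```python
import re

# Patterns built from the two log fragments quoted in the email.
# The exact id formats are assumed, so the groups are kept permissive.
LEFT_QUEUE = re.compile(r"\(([^/()\s]+)/[^)]*\) PBS job has left queue")
ENDED = re.compile(r"job (\S+) ended")


def memory_errors_during_postprocessing(lines):
    """Return (line_number, open_job_ids) for each MemoryError that is
    logged while at least one job has left the queue but not yet ended."""
    open_jobs = set()
    hits = []
    for number, line in enumerate(lines, start=1):
        left = LEFT_QUEUE.search(line)
        if left:
            open_jobs.add(left.group(1))
        ended = ENDED.search(line)
        if ended:
            open_jobs.discard(ended.group(1))
        if "MemoryError" in line and open_jobs:
            hits.append((number, sorted(open_jobs)))
    return hits


# Hypothetical log lines, not copied from a real Galaxy log.
sample = [
    "(42/1001.master) PBS job has left queue",
    "MemoryError",
    "job 42 ended",
    "MemoryError",
]
print(memory_errors_during_postprocessing(sample))
```

Only a MemoryError that falls inside the post-processing window is reported. An error logged when no job is in that window is ignored, which helps separate the two cases the question distinguishes.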
> There is another memory consumption problem we'll look at soon, which occurs
> when the job runner reads the metadata files written by the external
> set_metadata tool. If the output dataset(s) have an extremely large number
> of columns, this can cause a very large, nearly immediate memory spike when
> job post-processing begins, even if the output file itself is relatively
>> Dr David A. Matthews
>> Senior Lecturer in Virology
>> Room E49
>> Department of Cellular and Molecular Medicine,
>> School of Medical Sciences
>> University Walk,
>> University of Bristol
>> BS8 1TD
>> Tel. +44 117 3312058
>> Fax. +44 117 3312091
>> Please keep all replies on the list by using "reply all"
>> in your mail client. To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at: