Hi Kozin,

Are you using a Python environment specifically for Galaxy?  If not, jobs 
running on the compute nodes will be using the wrong Python environment.  I 
set up Galaxy (via a universe_wsgi.ini option) to source its Python 
environment before every job.
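
One way to check which interpreter a job really sees is to submit a small 
script like the one below through the scheduler and compare its output with 
what you get on the Galaxy host.  This is only a sketch: describe_python_env 
is a name I made up and is not part of Galaxy.

```python
import os
import sys


def describe_python_env():
    """Collect the facts that identify which Python this process runs under."""
    return {
        # Interpreter binary actually in use
        "executable": sys.executable,
        # Root of the active installation (or of the virtualenv, if any)
        "prefix": sys.prefix,
        # Version number only, without build details
        "version": sys.version.split()[0],
        # Set by virtualenv's activate script; None when no virtualenv is active
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_python_env().items()):
        print("%s: %s" % (key, value))
```

If the executable or prefix differs between the Galaxy host and the compute 
node, the job is not picking up the environment you expect.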

Galaxy is coded to work only if it is shared across the cluster under the same 
path on all the nodes.  Is this the case for the install sitting on Lustre?  
That is, is /home/galaxy/ mounted on every compute node in the cluster from 
your Lustre file system?
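
If it helps, the same-path requirement can be checked by running something 
like the following on the Galaxy host and on a compute node, then comparing 
the results.  It is a rough sketch under my own assumptions: describe_path is 
a hypothetical helper, and you would pass it whatever directory your install 
actually lives in.

```python
import os
import socket


def describe_path(path):
    """Report how one path looks from the node this script runs on."""
    return {
        "host": socket.gethostname(),
        "path": path,
        # False means the directory is not visible on this node at all
        "is_dir": os.path.isdir(path),
        # Symlinks resolved, so differing targets between nodes show up
        "realpath": os.path.realpath(path),
        # True only if the path itself is a mount point
        "is_mount": os.path.ismount(path),
    }


if __name__ == "__main__":
    import sys
    for candidate in sys.argv[1:]:
        print(describe_path(candidate))
```

A node where is_dir is False, or where realpath differs from the other nodes, 
does not see the install under the same path.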
I would be interested in the omitted output (assuming it is relevant).

Regards,

Iyad Kandalaft
Microbial Biodiversity Bioinformatics
Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
960 Carling Ave.| 960 Ave. Carling
Ottawa, ON| Ottawa (ON) K1A 0C6
E-mail Address / Adresse courriel  iyad.kandal...@agr.gc.ca
Telephone | Téléphone 613-759-1228
Facsimile | Télécopieur 613-759-1701
Teletypewriter | Téléimprimeur 613-773-2600
Government of Canada | Gouvernement du Canada




From: I Kozin [mailto:igk...@gmail.com]
Sent: Wednesday, June 11, 2014 12:55 PM
To: Kandalaft, Iyad
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] troubleshooting Galaxy with LSF

Thank you, Iyad. Indeed, setting ulimit -s to unlimited helped to advance this 
further.
I can see now that a job gets generated and submitted. However, Galaxy crashes 
immediately after that.
Job <108038> is submitted to queue <short>.
*** glibc detected *** python: free(): invalid pointer: 0x00007fff79f10b64 ***
======= Backtrace: =========
< further output is omitted >

Tracking the job through the scheduler reveals that the job finished 
successfully.

The command in the job script is something like this:

python /galaxy-dist/tools/data_source/upload.py /galaxy-dist 
/galaxy-dist/database/tmp/tmpGY5_lI /galaxy-dist/database/tmp/tmpr7VKGy         
1:/galaxy-dist/database/job_working_directory/000/1/dataset_1_files:/galaxy-dist/database/files/000/dataset_1.dat

usage: upload.py <root> <datatypes_conf> <json paramfile> <output spec> ...

I cannot re-run it because only the first file in the tmp folder is there. The 
second (json paramfile, tmpr7VKGy) is gone. I presume dataset_1.dat is the 
output and it's there.
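
A quick check of which arguments are still present on a given node could look 
like the sketch below. It relies only on the argument order in the usage line 
quoted above; the function names are hypothetical and not part of upload.py.

```python
import os


def split_upload_args(argv):
    """Name the positional arguments in the order given by the usage line."""
    root, datatypes_conf, json_paramfile = argv[:3]
    return {
        "root": root,
        "datatypes_conf": datatypes_conf,
        "json_paramfile": json_paramfile,
        # Everything after the third argument is an output spec
        "output_specs": list(argv[3:]),
    }


def missing_inputs(argv):
    """Return the named single-path arguments that do not exist on this node."""
    named = split_upload_args(argv)
    return [name for name in ("root", "datatypes_conf", "json_paramfile")
            if not os.path.exists(named[name])]


if __name__ == "__main__":
    import sys
    # Pass the arguments that follow upload.py in the job script
    if len(sys.argv) > 3:
        print(missing_inputs(sys.argv[1:]))
```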

The second half of the job script is the execution of set_metadata.sh
I can execute it without issues (is this a db update?).

One significant difference between the setup which works and the one which 
doesn't is that the working setup sits on local disk whereas the non-working 
one sits on Lustre. Could that be relevant?

By the way, is there a method for removing the pending job?
When I re-run Galaxy, it promptly crashes again due to the stuck job.

When Galaxy starts, the only error that I see is this:
IOError: [Errno 2] No such file or directory: './tools/mutation/visualize.xml'
While it might be a good question why the mutation directory is not there, the 
error is very likely not relevant to the issue.

So I'm open to further suggestions as to how to understand what's going on.

Thank you

On 10 June 2014 19:24, Kandalaft, Iyad <iyad.kandal...@agr.gc.ca> wrote:
This is just a guess, which may help you troubleshoot.
It could be that Python is reaching a stack limit: run ulimit -s and set it 
to a higher value if required.
I’m completely guessing here, but is it possible that the DRMAA library is 
missing a linked library on the Red Hat system? You can check with ldd.
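
If you want to script the ldd check, something along these lines would do. 
It is a sketch only: the helper names are mine, and the path to the DRMAA 
shared object is site-specific, so it is left as an argument.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is the part before the arrow
            missing.append(line.split("=>")[0].strip())
    return missing


def check_library(path):
    """Run ldd on the given shared object and list unresolved dependencies."""
    output = subprocess.check_output(["ldd", path], universal_newlines=True)
    return missing_libraries(output)


if __name__ == "__main__":
    import sys
    for shared_object in sys.argv[1:]:
        print(shared_object, check_library(shared_object))
```

An empty list means ldd resolved every dependency of that file.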

Regards,
Iyad Kandalaft
