I have closely followed the instructions for setting up a local cluster
(http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster).  Frankly, the
"Galaxy Configuration" section was not clear to me: I am not sure whether the
steps outlined there should be applied to the server's universe_wsgi.ini or to
the nodes'.  So I may have overlooked something, but here is a summary of what
I am doing on the server:


All my nodes have been configured so the galaxy user can ssh/scp between nodes
without a password.  The galaxy user also has sudo rights on the Galaxy server.
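
(For reference, the sudoers entry I believe this requires looks roughly like
the line below; the absolute script path is my assumption based on my layout,
and the SETENV tag is there because Galaxy calls sudo with -E:)

  # path below is a guess -- point it at the real external_chown_script.py
  galaxy  ALL = (root) NOPASSWD: SETENV: /usr/local/galaxy/galaxy-dist/scripts/external_chown_script.py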

Torque has been configured and tested between the nodes, so the cluster itself
is working fine.
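
(By "tested" I mean that trivial submissions of the sort below, run as the
galaxy user, go through and complete normally; the exact commands are just an
illustration:)

  # submit a one-line job that prints the execution host, then watch it
  echo "hostname" | qsub
  qstat -a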



In the universe_wsgi.ini file on the server node:

--------------------------

start_job_runners = pbs,drmaa



drmaa_external_runjob_~

drmaa_external_killer~

external_chown_~



pbs_application_server = galaxyhost   (the server)

pbs_stage_path = /tmp/galaxy_stage/

pbs_dataset_server = galaxyhost   ## this is the same as pbs_application_server



Also: ln -s /nfsexport/galaxy_stage /usr/local/galaxy/galaxy-dist/database/tmp




outputs_to_working_directory = False   (if I change this to True, Galaxy will
not start)
---------------------------------------------
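
To spell out the truncated lines above, I believe the full option names are
the ones below; the three script paths are the sample-config defaults as far
as I can tell, so treat them as placeholders for whatever is actually
configured, and please correct me if I have the wrong options:

  start_job_runners = pbs,drmaa
  # script paths below are assumptions taken from the sample universe_wsgi.ini
  drmaa_external_runjob_script = scripts/drmaa_external_runner.py
  drmaa_external_killjob_script = scripts/drmaa_external_killer.py
  external_chown_script = scripts/external_chown_script.py
  pbs_application_server = galaxyhost
  pbs_stage_path = /tmp/galaxy_stage/
  pbs_dataset_server = galaxyhost
  outputs_to_working_directory = False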

After restarting Galaxy on the server node, the job seems to be submitted and
its status is "R".  When I "top" the processes on the node the job was sent
to, I see two processes, ssh and scp, run from the Galaxy server.  This tells
me something is being copied over to the node, but I am not sure what, or to
where.
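
(I assume I could see exactly what is being staged with something like the
following on the compute node while the copy is running; /tmp/galaxy_stage/ is
the pbs_stage_path from my config:)

  # full command lines of the ssh/scp processes owned by the galaxy user
  ps -fu galaxy | grep -E '[s]sh|[s]cp'
  # whatever has landed in the staging directory so far
  ls -l /tmp/galaxy_stage/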

After a while, the job status changes to "W".

qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
68.ngsgalaxy01             ...x...@idtdna.com galaxy                 0 W batch
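
(If it helps, I can pull more detail on why the job sits in the W state with
the standard Torque tools, e.g.:)

  # full job record, including the comment and stage-in attributes
  qstat -f 68
  # reconstruct the job's history from the logs (run on the Torque server host)
  tracejob 68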


Here is what I see in the log when the job is submitted:
>>>>>>>>>>>>>>>>>>
galaxy.jobs DEBUG 2013-03-15 15:34:54,183 (341) Working directory for job is: 
/usr/local/galaxy/galaxy-dist/database/job_working_directory/000/341
galaxy.jobs.handler DEBUG 2013-03-15 15:34:54,183 dispatching job 341 to pbs 
runner
galaxy.jobs.handler INFO 2013-03-15 15:34:54,231 (341) Job dispatched
galaxy.tools DEBUG 2013-03-15 15:34:54,309 Building dependency shell command 
for dependency 'samtools'
galaxy.jobs.runners.pbs DEBUG 2013-03-15 15:34:54,391 (341) submitting file 
/usr/local/galaxy/galaxy-dist/database/pbs/341.sh
galaxy.jobs.runners.pbs DEBUG 2013-03-15 15:34:54,391 (341) command is: 
PACKAGE_BASE=/usr/local/galaxy/software/samtools/0.1.16; export PACKAGE_BASE; . 
/usr/local/galaxy/software/samtools/0.1.16/env.sh; samtools flagstat 
"/usr/local/galaxy/galaxy-dist/database/files/000/dataset_319.dat" > 
"/usr/local/galaxy/galaxy-dist/database/files/000/dataset_384.dat"
galaxy.jobs.runners.pbs DEBUG 2013-03-15 15:34:54,394 (341) queued in default 
queue as 70.ngsgalaxy01.idtdna.com
galaxy.jobs.runners.pbs DEBUG 2013-03-15 15:34:54,966 
(341/70.ngsgalaxy01.idtdna.com) PBS job state changed from N to R
>>>>>>>>>>>>>>>>>>

Here is the log from when the ssh/scp on the node finishes:

>>>>>>>>>>>>>>>>>>>>
galaxy.jobs.runners.pbs DEBUG 2013-03-15 15:37:00,815 
(341/70.ngsgalaxy01.idtdna.com) PBS job state changed from R to W
>>>>>>>>>>>>>>>>>>>>

Here is the log from when I qdel that job:

>>>>>>>>>>>>>>>>>>>>
galaxy.jobs.runners.pbs WARNING 2013-03-15 15:39:20,016 Exit code  was invalid. 
Using 0.
galaxy.jobs DEBUG 2013-03-15 15:39:20,033 (341) Changing ownership of working 
directory with: /usr/bin/sudo -E scripts/external_chown_script.py 
/usr/local/galaxy/galaxy-dist/database/job_working_directory/000/341 galaxy 
10020
galaxy.jobs ERROR 2013-03-15 15:39:20,071 (341) Failed to change ownership of 
/usr/local/galaxy/galaxy-dist/database/job_working_directory/000/341, failing
Traceback (most recent call last):
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 336, 
in finish
    self.reclaim_ownership()
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 909, 
in reclaim_ownership
    self._change_ownership( self.galaxy_system_pwent[0], str( 
self.galaxy_system_pwent[3] ) )
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 895, 
in _change_ownership
    assert p.returncode == 0
AssertionError
galaxy.datatypes.metadata DEBUG 2013-03-15 15:39:20,160 Cleaning up external 
metadata files
galaxy.jobs.runners.pbs WARNING 2013-03-15 15:39:20,172 Unable to cleanup: 
[Errno 2] No such file or directory: 
'/usr/local/galaxy/galaxy-dist/database/pbs/341.o'
galaxy.jobs.runners.pbs WARNING 2013-03-15 15:39:20,173 Unable to cleanup: 
[Errno 2] No such file or directory: 
'/usr/local/galaxy/galaxy-dist/database/pbs/341.e'
galaxy.jobs.runners.pbs WARNING 2013-03-15 15:39:20,173 Unable to cleanup: 
[Errno 2] No such file or directory: 
'/usr/local/galaxy/galaxy-dist/database/pbs/341.ec'
10.7.10.201 - - [15/Mar/2013:15:39:22 -0500] "GET 
/api/histories/5a1cff6882ddb5b2 HTTP/1.0" 200 - "http://10.7.10.31/history" 
"Mozilla/5.0 (Windows NT 5.2; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0"
10.7.10.201 - - [15/Mar/2013:15:39:22 -0500] "GET 
/api/histories/5a1cff6882ddb5b2/contents?ids=bbbfa414ae315caf HTTP/1.0" 200 - 
"http://10.7.10.31/history" "Mozilla/5.0 (Windows NT 5.2; WOW64; rv:18.0) 
Gecko/20100101 Firefox/18.0"
 >>>>>>>>>>>>>>>>>>>>>>
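
(From the traceback, the piece that fails is the external_chown_script sudo
call, so I assume the equivalent manual test, run as the galaxy user from the
galaxy-dist directory, would be:)

  # same command Galaxy runs in the log above; a non-zero exit status here
  # would match the "assert p.returncode == 0" failure
  sudo -E scripts/external_chown_script.py /usr/local/galaxy/galaxy-dist/database/job_working_directory/000/341 galaxy 10020
  echo $?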

Is there anything I am missing or doing wrong?


Regards,




