Hello Sir,
I have installed GridWay 5.6.1 on a machine running Globus 4.0.7 with PBS.
When I submit a job with the gwsubmit command, the job status remains
pending indefinitely. My grid has only a single machine, named "saum.grid". The
contents of the various log files are given below:
Content of gwd.log:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sun Apr 3 18:45:04 2011 [IM][I]: Hosts discovered by MAD (mds4): saum.grid
Sun Apr 3 18:46:32 2011 [UM][I]: Executing command grid-proxy-info -identity
Sun Apr 3 18:46:32 2011 [UM][I]: User proxy info, /O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01
Sun Apr 3 18:46:32 2011 [UM][I]: Loading execution MADs for user guser01 (/O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01).
Sun Apr 3 18:46:33 2011 [TM][I]: -- MARK --
Sun Apr 3 18:46:34 2011 [UM][I]: Execution MAD ws loaded (exec:gw_em_mad_ws, args:, mode:rsl2).
Sun Apr 3 18:46:34 2011 [UM][I]: Loading transfer MADs for user guser01 (/O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01).
Sun Apr 3 18:46:35 2011 [UM][I]: Transfer MAD gridftp loaded (exec: gw_tm_mad_ftp, arg: ).
Sun Apr 3 18:46:35 2011 [UM][I]: User guser01 (/O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01) registered.
Sun Apr 3 18:46:35 2011 [DM][I]: New job 0 allocated and initialized.
Sun Apr 3 18:46:38 2011 [UM][I]: -- MARK --
Sun Apr 3 18:46:41 2011 [EM][I]: -- MARK --
Sun Apr 3 18:46:45 2011 [IM][I]: -- MARK --
Sun Apr 3 18:46:46 2011 [DM][I]: Dispatching job 0 to saum.grid (workq).
Sun Apr 3 18:47:05 2011 [IM][I]: Discovering hosts.
Sun Apr 3 18:47:05 2011 [DM][I]: -- MARK --
Sun Apr 3 18:47:39 2011 [IM][I]: Hosts discovered by MAD (mds4): saum.grid
Sun Apr 3 18:47:56 2011 [DM][I]: Rescheduling job 0.
Sun Apr 3 18:47:58 2011 [UM][I]: -- MARK --
Sun Apr 3 18:48:20 2011 [TM][I]: -- MARK --
Sun Apr 3 18:48:21 2011 [IM][I]: -- MARK --
Sun Apr 3 18:48:21 2011 [EM][I]: -- MARK --
Sun Apr 3 18:49:26 2011 [TM][I]: -- MARK --
Sun Apr 3 18:49:31 2011 [IM][I]: -- MARK --
Sun Apr 3 18:49:36 2011 [EM][I]: -- MARK --
Sun Apr 3 18:49:38 2011 [DM][I]: -- MARK --
Sun Apr 3 18:49:46 2011 [DM][I]: -- MARK --
Sun Apr 3 18:49:50 2011 [IM][I]: Discovering hosts.
Sun Apr 3 18:49:51 2011 [IM][I]: Discovering hosts.
Sun Apr 3 18:50:42 2011 [IM][I]: Hosts discovered by MAD (mds4): saum.grid
============================================================================================================
Result of the *gwps* command:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
USER       JID  DM    EM    START     END       EXEC     XFER     EXIT  NAME  HOST
guser01:0  0    pend  ----  18:46:35  --:--:--  0:00:52  0:00:28  --    jt    saum.grid/PBS
================================================================================================
Result of *gwhistory 0*:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HID  START     END       PROLOG   WRAPPER  EPILOG   MIGR     REASON  QUEUE  HOST
0    18:54:01  18:54:11  0:00:01  0:00:06  0:00:03  0:00:00  err     workq  saum.grid/PBS
0    18:46:46  18:47:56  0:00:04  0:00:46  0:00:20  0:00:00  err     workq  saum.grid/PBS
================================================================================================
Content of the job.log file:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sun Apr 3 18:46:35 2011 [DM][I]: ----------- Job configuration file (jt) values -----------
Sun Apr 3 18:46:35 2011 [DM][I]: EXECUTABLE : /bin/hostname
Sun Apr 3 18:46:35 2011 [DM][I]: ARGUMENTS :
Sun Apr 3 18:46:35 2011 [DM][I]: INPUT_FILES (Total 0):
Sun Apr 3 18:46:35 2011 [DM][I]: OUTPUT_FILES (Total 0):
Sun Apr 3 18:46:35 2011 [DM][I]: RESTART_FILES (Total 0):
Sun Apr 3 18:46:35 2011 [DM][I]: STDIN_FILE : /dev/null
Sun Apr 3 18:46:35 2011 [DM][I]: STDOUT_FILE : stdout.${JOB_ID}
Sun Apr 3 18:46:35 2011 [DM][I]: STDERR_FILE : stderr.${JOB_ID}
Sun Apr 3 18:46:35 2011 [DM][I]: REQUIREMENTS :
Sun Apr 3 18:46:35 2011 [DM][I]: RANK :
Sun Apr 3 18:46:35 2011 [DM][I]: RESCHEDULING_INTERVAL : 0
Sun Apr 3 18:46:35 2011 [DM][I]: RESCHEDULING_THRESHOLD : 300
Sun Apr 3 18:46:35 2011 [DM][I]: SUSPENSION_TIMEOUT : 600
Sun Apr 3 18:46:35 2011 [DM][I]: CPULOAD_THRESHOLD : 50
Sun Apr 3 18:46:35 2011 [DM][I]: RESCHEDULE_ON_FAILURE : yes
Sun Apr 3 18:46:35 2011 [DM][I]: NUMBER_OF_RETRIES : 1
Sun Apr 3 18:46:35 2011 [DM][I]: CHECKPOINT_INTERVAL : 0
Sun Apr 3 18:46:35 2011 [DM][I]: CHECKPOINT_URL :
Sun Apr 3 18:46:35 2011 [DM][I]: WRAPPER : /usr/local/gw-5.6.1/libexec/gw_wrapper.sh
Sun Apr 3 18:46:35 2011 [DM][I]: MONITOR :
Sun Apr 3 18:46:35 2011 [DM][I]: PRE_WRAPPER :
Sun Apr 3 18:46:35 2011 [DM][I]: PRE_WRAPPER_ARGUMENTS :
Sun Apr 3 18:46:35 2011 [DM][I]: TYPE : single
Sun Apr 3 18:46:35 2011 [DM][I]: NP : 1
Sun Apr 3 18:46:35 2011 [DM][I]: DEADLINE : 0:00:00 0
Sun Apr 3 18:46:35 2011 [DM][I]: ----------------------------------------------------------
Sun Apr 3 18:46:35 2011 [DM][I]: New state is PENDING.
Sun Apr 3 18:46:46 2011 [DM][I]: New state is PROLOG.
Sun Apr 3 18:46:46 2011 [TM][I]: Creating remote job working directory:
Sun Apr 3 18:46:46 2011 [TM][I]: Target url: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:46:48 2011 [TM][I]: Remote job directory created.
Sun Apr 3 18:46:48 2011 [TM][I]: Staging input files:
Sun Apr 3 18:46:48 2011 [TM][I]: Source: /home/guser01/GridWay.
Sun Apr 3 18:46:48 2011 [TM][I]: Copying file file:///usr/local/gw-5.6.1/var/0/job.env.
Sun Apr 3 18:46:48 2011 [TM][W]: Skipping file /bin/hostname, absolute path.
Sun Apr 3 18:46:48 2011 [TM][W]: Skipping file /dev/null, absolute path.
Sun Apr 3 18:46:48 2011 [TM][I]: Copying file file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh.
Sun Apr 3 18:46:49 2011 [TM][I]: File file:///usr/local/gw-5.6.1/var/0/job.env copied.
Sun Apr 3 18:46:50 2011 [TM][I]: File file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh copied.
Sun Apr 3 18:46:50 2011 [TM][I]: All input files copied.
Sun Apr 3 18:46:50 2011 [DM][I]: Prolog done:
Sun Apr 3 18:46:50 2011 [DM][I]: Total time : 4
Sun Apr 3 18:46:50 2011 [DM][I]: New state is WRAPPER.
Sun Apr 3 18:46:50 2011 [EM][I]: Submitting wrapper to saum.grid/PBS, RSL used is in /usr/local/gw-5.6.1/var/0/job.rsl.0.
Sun Apr 3 18:47:33 2011 [EM][I]: New execution state is PENDING.
Sun Apr 3 18:47:36 2011 [EM][I]: Execution state is PENDING.
Sun Apr 3 18:47:36 2011 [EM][I]: New execution state is ACTIVE.
Sun Apr 3 18:47:36 2011 [EM][I]: New execution state is DONE.
Sun Apr 3 18:47:36 2011 [DM][I]: Wrapper DONE:
Sun Apr 3 18:47:36 2011 [DM][I]: Active time : 0
Sun Apr 3 18:47:36 2011 [DM][I]: Suspension time : 46
Sun Apr 3 18:47:36 2011 [DM][I]: Total time : 46
Sun Apr 3 18:47:36 2011 [DM][I]: New state is EPILOG_STD.
Sun Apr 3 18:47:36 2011 [TM][I]: Staging output files:
Sun Apr 3 18:47:36 2011 [TM][I]: Source: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:47:36 2011 [TM][I]: Copying file stdout.wrapper.
Sun Apr 3 18:47:36 2011 [TM][I]: Copying file stderr.wrapper.
Sun Apr 3 18:47:49 2011 [TM][I]: File stdout.wrapper copied.
Sun Apr 3 18:47:51 2011 [TM][I]: File stderr.wrapper copied.
Sun Apr 3 18:47:51 2011 [TM][I]: All output files copied.
Sun Apr 3 18:47:51 2011 [DM][E]: Unable to find exit code, assuming that the job failed or was cancelled.
Sun Apr 3 18:47:51 2011 [DM][I]: New state is EPILOG_RESTART.
Sun Apr 3 18:47:51 2011 [TM][I]: Staging output files:
Sun Apr 3 18:47:51 2011 [TM][I]: Source: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:47:51 2011 [TM][I]: Copying file stdout.execution.
Sun Apr 3 18:47:51 2011 [TM][I]: Copying file stderr.execution.
Sun Apr 3 18:47:53 2011 [TM][E]: Copy of file stdout.execution failed.
Sun Apr 3 18:47:54 2011 [TM][E]: Copy of file stderr.execution failed.
Sun Apr 3 18:47:54 2011 [TM][W]: Some output files were not copied.
Sun Apr 3 18:47:54 2011 [TM][W]: Removing remote directory:
Sun Apr 3 18:47:54 2011 [TM][W]: Target url: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:47:56 2011 [TM][I]: Remote job directory removed.
Sun Apr 3 18:47:56 2011 [DM][E]: Epilog failed:
Sun Apr 3 18:47:56 2011 [DM][E]: Total time : 20
Sun Apr 3 18:47:56 2011 [DM][I]: Rescheduling job.
Sun Apr 3 18:47:56 2011 [DM][I]: New state is PENDING.
Sun Apr 3 18:54:01 2011 [DM][I]: New state is PROLOG.
Sun Apr 3 18:54:01 2011 [TM][I]: Creating remote job working directory:
Sun Apr 3 18:54:01 2011 [TM][I]: Target url: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:54:01 2011 [TM][I]: Remote job directory created.
Sun Apr 3 18:54:01 2011 [TM][I]: Staging input files:
Sun Apr 3 18:54:01 2011 [TM][I]: Source: /home/guser01/GridWay.
Sun Apr 3 18:54:01 2011 [TM][I]: Copying file file:///usr/local/gw-5.6.1/var/0/job.env.
Sun Apr 3 18:54:01 2011 [TM][W]: Skipping file /bin/hostname, absolute path.
Sun Apr 3 18:54:01 2011 [TM][W]: Skipping file /dev/null, absolute path.
Sun Apr 3 18:54:01 2011 [TM][I]: Copying file file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh.
Sun Apr 3 18:54:02 2011 [TM][I]: File file:///usr/local/gw-5.6.1/var/0/job.env copied.
Sun Apr 3 18:54:02 2011 [TM][I]: File file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh copied.
Sun Apr 3 18:54:02 2011 [TM][I]: All input files copied.
Sun Apr 3 18:54:02 2011 [DM][I]: Prolog done:
Sun Apr 3 18:54:02 2011 [DM][I]: Total time : 1
Sun Apr 3 18:54:02 2011 [DM][I]: New state is WRAPPER.
Sun Apr 3 18:54:02 2011 [EM][I]: Submitting wrapper to saum.grid/PBS, RSL used is in /usr/local/gw-5.6.1/var/0/job.rsl.1.
Sun Apr 3 18:54:07 2011 [EM][I]: New execution state is PENDING.
Sun Apr 3 18:54:08 2011 [EM][I]: Execution state is PENDING.
Sun Apr 3 18:54:08 2011 [EM][I]: New execution state is ACTIVE.
Sun Apr 3 18:54:08 2011 [EM][I]: New execution state is DONE.
Sun Apr 3 18:54:08 2011 [DM][I]: Wrapper DONE:
Sun Apr 3 18:54:08 2011 [DM][I]: Active time : 0
Sun Apr 3 18:54:08 2011 [DM][I]: Suspension time : 6
Sun Apr 3 18:54:08 2011 [DM][I]: Total time : 6
Sun Apr 3 18:54:08 2011 [DM][I]: New state is EPILOG_STD.
Sun Apr 3 18:54:08 2011 [TM][I]: Staging output files:
Sun Apr 3 18:54:08 2011 [TM][I]: Source: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:54:08 2011 [TM][I]: Copying file stdout.wrapper.
Sun Apr 3 18:54:08 2011 [TM][I]: Copying file stderr.wrapper.
Sun Apr 3 18:54:09 2011 [TM][I]: File stdout.wrapper copied.
Sun Apr 3 18:54:10 2011 [TM][I]: File stderr.wrapper copied.
Sun Apr 3 18:54:10 2011 [TM][I]: All output files copied.
Sun Apr 3 18:54:10 2011 [DM][E]: Unable to find exit code, assuming that the job failed or was cancelled.
Sun Apr 3 18:54:10 2011 [DM][I]: New state is EPILOG_RESTART.
Sun Apr 3 18:54:10 2011 [TM][I]: Staging output files:
Sun Apr 3 18:54:10 2011 [TM][I]: Source: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:54:10 2011 [TM][I]: Copying file stdout.execution.
Sun Apr 3 18:54:10 2011 [TM][I]: Copying file stderr.execution.
Sun Apr 3 18:54:10 2011 [TM][E]: Copy of file stdout.execution failed.
Sun Apr 3 18:54:11 2011 [TM][E]: Copy of file stderr.execution failed.
Sun Apr 3 18:54:11 2011 [TM][W]: Some output files were not copied.
Sun Apr 3 18:54:11 2011 [TM][W]: Removing remote directory:
Sun Apr 3 18:54:11 2011 [TM][W]: Target url: gsiftp://saum.grid/~/.gw_guser01_0/.
Sun Apr 3 18:54:11 2011 [TM][I]: Remote job directory removed.
Sun Apr 3 18:54:11 2011 [DM][E]: Epilog failed:
Sun Apr 3 18:54:11 2011 [DM][E]: Total time : 3
Sun Apr 3 18:54:11 2011 [DM][I]: Rescheduling job.
Sun Apr 3 18:54:11 2011 [DM][I]: New state is PENDING.
=========================================================================================================
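The error entries in the job.log above can be isolated with a short script. This is only a minimal sketch: it assumes every entry starts with a timestamp followed by [MODULE][LEVEL]: as in the excerpts above, and the names LINE_RE and filter_log are illustrative, not part of GridWay. Continuation lines that do not start with a timestamp are simply skipped.

```python
import re

# Matches entries such as:
#   Sun Apr 3 18:47:53 2011 [TM][E]: Copy of file stdout.execution failed.
# The pattern is an assumption based on the log excerpts in this post.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ [\d:]{8} \d{4}) "
    r"\[(?P<module>\w+)\]\[(?P<level>\w)\]: ?(?P<msg>.*)$"
)


def filter_log(lines, levels=("E",)):
    """Return (timestamp, module, level, message) tuples for the given levels."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            hits.append(
                (
                    match.group("ts"),
                    match.group("module"),
                    match.group("level"),
                    match.group("msg"),
                )
            )
    return hits


# A few lines copied from the job.log above.
sample = [
    "Sun Apr 3 18:47:51 2011 [TM][I]: All output files copied.",
    "Sun Apr 3 18:47:51 2011 [DM][E]: Unable to find exit code, assuming that",
    "the job failed or was cancelled.",
    "Sun Apr 3 18:47:53 2011 [TM][E]: Copy of file stdout.execution failed.",
    "Sun Apr 3 18:47:54 2011 [TM][W]: Some output files were not copied.",
]

for ts, module, level, msg in filter_log(sample):
    print(ts, module, level, msg)
```

Passing levels=("E", "W") would include the warnings as well. To run it on the real file, open job.log and pass the file object in place of sample.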
And the Globus container log corresponding to GridWay's *gwsubmit* is:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2011-04-03 18:35:17,183 INFO exec.StateMachine [RunQueueThread_1,logJobAccepted:3424] Job 05217630-5df3-11e0-aafa-e2327fe73aea accepted for local user 'guser01'
2011-04-03 18:35:18,497 INFO exec.StateMachine [RunQueueThread_2,logJobSubmitted:3436] Job 05217630-5df3-11e0-aafa-e2327fe73aea submitted with local job ID '7.saum.grid'
2011-04-03 18:35:23,168 INFO exec.StateMachine [RunQueueThread_11,logJobSucceeded:3446] Job 05217630-5df3-11e0-aafa-e2327fe73aea finished successfully
2011-04-03 18:47:29,586 INFO exec.StateMachine [RunQueueThread_13,logJobAccepted:3424] Job b9e4ae10-5df4-11e0-aafa-e2327fe73aea accepted for local user 'guser01'
2011-04-03 18:47:31,373 INFO exec.StateMachine [RunQueueThread_14,logJobSubmitted:3436] Job b9e4ae10-5df4-11e0-aafa-e2327fe73aea submitted with local job ID '8.saum.grid'
2011-04-03 18:47:32,411 INFO exec.StateMachine [RunQueueThread_5,logJobSucceeded:3446] Job b9e4ae10-5df4-11e0-aafa-e2327fe73aea finished successfully
=======================================================================================================================
Please tell me what is wrong with GridWay's *gwsubmit*. How should I solve
this issue?
_ _ _ _ _ _ _ _ _ _
Regards
Saumesh Kumar
IIT Roorkee