Hello Sir,
The output of the command "globusrun-ws -submit -F saum.grid -Ft PBS -s -c
/bin/uname -a" is:
------------------------------------------------------------------------------------------------------------------------------------------
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:aa4153a6-5f0b-11e0-98ac-08002793ca6e
Termination time: 04/05/2011 22:34 GMT
Current job state: Pending
Current job state: Active
Current job state: CleanUp-Hold
Host key verification failed.
[: 59: !=: unexpected operator
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
=====================================================================
And the corresponding globus-container.log entries are:
-----------------------------------------------------------------------------------------------------------------------------------------
2011-04-05 04:04:15,496 INFO exec.StateMachine
[RunQueueThread_1,logJobAccepted:3424] Job
ab8da2a0-5f0b-11e0-a856-fdfbacd40977 accepted for local user 'guser01'
2011-04-05 04:04:18,051 INFO exec.StateMachine
[RunQueueThread_2,logJobSubmitted:3436] Job
ab8da2a0-5f0b-11e0-a856-fdfbacd40977 submitted with local job ID
'17.saum.grid'
2011-04-05 04:04:28,171 INFO exec.StateMachine
[RunQueueThread_16,logJobSucceeded:3446] Job
ab8da2a0-5f0b-11e0-a856-fdfbacd40977 finished successfully
=====================================================================
I think the problem is with PBS: *gwsubmit* was submitting the job to PBS, and
PBS does not seem to be working properly on my node. So I have removed PBS and
am now using GridWay on a node that has only Fork (no PBS), but the job state
still remains Pending all the time.
I have installed GridWay 5.6.1 on NodeA (scheduler: Fork) in our Grid, which
has four nodes: *NodeA* (nodea.grid), *NodeB* (nodeb.grid), *NodeC*
(nodec.grid) and *NodeD* (noded.grid). I have *not installed PBS* on any node.
I have organized these nodes into a Virtual Organization (VO) by modifying the
$GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml file as follows:
NodeA's hierarchy file: no entry
NodeB's hierarchy file:
<upstream>https://nodea.grid:8443/wsrf/services/DefaultIndexService</upstream>
<downstream>https://nodec.grid:8443/wsrf/services/DefaultIndexService</downstream>
<downstream>https://noded.grid:8443/wsrf/services/DefaultIndexService</downstream>
After configuring the above settings, I tested GridWay's commands:
gwhost: (currently only two nodes are connected)
--------------------------------------------------------------------------------------------------------------------------------------------------------
0 1 NULLNULL NULL 0 0 0/0 0/0 0/0/2 Fork nodea.grid
1 1 NULLNULL NULL 0 0 0/0 0/0 0/0/1 Fork nodeb.grid
========================================================
gwps:
-------------------------------------------------------------------------------------------------------------------------------------------------------
guser01:0 0 pend ---- 15:16:49 --:--:-- 0:00:00 0:00:00 -- jt --
guser01:0 1 pend ---- 15:29:44 --:--:-- 0:00:00 0:00:00 -- jt --
=======================================================
And gwhistory shows no output except the header line:
HID START END PROLOG WRAPPER EPILOG MIGR REASON QUEUE HOST
I issued the commands *gwhistory 0* and *gwhistory 1*, but both gave no
output beyond that header.
How can I configure GridWay to submit jobs to Fork (not to PBS)?
_ _ _ _ _ _ _ _ _ _
Regards
Saumesh Kumar
M.Tech (IT)
IIT Roorkee
On Mon, Apr 4, 2011 at 4:34 PM, Eduardo Huedo <[email protected]> wrote:
> Hi,
>
> Does a Globus submission with data staging work?
>
> Please send me the output of the following command:
>
> globusrun-ws -submit -F saum.grid -Ft PBS -s -c /bin/uname -a
>
>
> Regards,
>
> Dr. Eduardo Huedo Cuesta
> Associate Professor (Profesor Titular), Universidad Complutense de Madrid
> http://dsa-research.org/ehuedo
>
>
>
> 2011/4/4 Saumesh Kumar <[email protected]>
>
>> Hello Sir,
>>
>> I have installed GridWay-5.6.1 on a machine running globus-4.0.7 with PBS.
>> When I submit a job using the gwsubmit command, the status of the job remains
>> pending all the time. My grid has only a single machine, named "saum.grid".
>> The contents of the various log files are given below:
>>
>> Content of gwd.log:
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Sun Apr 3 18:45:04 2011 [IM][I]: Hosts discovered by MAD (mds4):
>> saum.grid
>> Sun Apr 3 18:46:32 2011 [UM][I]: Executing command grid-proxy-info
>> -identity
>> Sun Apr 3 18:46:32 2011 [UM][I]: User proxy info,
>> /O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01
>> Sun Apr 3 18:46:32 2011 [UM][I]: Loading execution MADs for user guser01
>> (/O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01).
>> Sun Apr 3 18:46:33 2011 [TM][I]: -- MARK --
>> Sun Apr 3 18:46:34 2011 [UM][I]: Execution MAD ws loaded
>> (exec:gw_em_mad_ws, args:, mode:rsl2).
>> Sun Apr 3 18:46:34 2011 [UM][I]: Loading transfer MADs for user guser01
>> (/O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01).
>> Sun Apr 3 18:46:35 2011 [UM][I]: Transfer MAD gridftp loaded (exec:
>> gw_tm_mad_ftp, arg: ).
>> Sun Apr 3 18:46:35 2011 [UM][I]: User guser01
>> (/O=Grid/OU=GlobusTest/OU=simpleCA-saum.grid/OU=grid/CN=guser01) registered.
>> Sun Apr 3 18:46:35 2011 [DM][I]: New job 0 allocated and initialized.
>> Sun Apr 3 18:46:38 2011 [UM][I]: -- MARK --
>> Sun Apr 3 18:46:41 2011 [EM][I]: -- MARK --
>> Sun Apr 3 18:46:45 2011 [IM][I]: -- MARK --
>> Sun Apr 3 18:46:46 2011 [DM][I]: Dispatching job 0 to saum.grid (workq).
>> Sun Apr 3 18:47:05 2011 [IM][I]: Discovering hosts.
>> Sun Apr 3 18:47:05 2011 [DM][I]: -- MARK --
>> Sun Apr 3 18:47:39 2011 [IM][I]: Hosts discovered by MAD (mds4):
>> saum.grid
>> Sun Apr 3 18:47:56 2011 [DM][I]: Rescheduling job 0.
>> Sun Apr 3 18:47:58 2011 [UM][I]: -- MARK --
>> Sun Apr 3 18:48:20 2011 [TM][I]: -- MARK --
>> Sun Apr 3 18:48:21 2011 [IM][I]: -- MARK --
>> Sun Apr 3 18:48:21 2011 [EM][I]: -- MARK --
>> Sun Apr 3 18:49:26 2011 [TM][I]: -- MARK --
>> Sun Apr 3 18:49:31 2011 [IM][I]: -- MARK --
>> Sun Apr 3 18:49:36 2011 [EM][I]: -- MARK --
>> Sun Apr 3 18:49:38 2011 [DM][I]: -- MARK --
>> Sun Apr 3 18:49:46 2011 [DM][I]: -- MARK --
>> Sun Apr 3 18:49:50 2011 [IM][I]: Discovering hosts.
>> Sun Apr 3 18:49:51 2011 [IM][I]: Discovering hosts.
>> Sun Apr 3 18:50:42 2011 [IM][I]: Hosts discovered by MAD (mds4):
>> saum.grid
>>
>> ============================================================================================================
>>
>> Result of the *gwps* command:
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> USER JID DM EM START END EXEC XFER EXIT NAME HOST
>> guser01:0 0 pend ---- 18:46:35 --:--:-- 0:00:52 0:00:28 -- jt saum.grid/PBS
>>
>> ================================================================================================
>>
>>
>> Result of *gwhistory 0*:
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> HID START END PROLOG WRAPPER EPILOG MIGR REASON QUEUE HOST
>> 0 18:54:01 18:54:11 0:00:01 0:00:06 0:00:03 0:00:00 err workq saum.grid/PBS
>> 0 18:46:46 18:47:56 0:00:04 0:00:46 0:00:20 0:00:00 err workq saum.grid/PBS
>>
>> ================================================================================================
>>
>>
>> Content of the job.log file:
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Sun Apr 3 18:46:35 2011 [DM][I]: ----------- Job configuration file (jt)
>> values -----------
>> Sun Apr 3 18:46:35 2011 [DM][I]: EXECUTABLE :
>> /bin/hostname
>> Sun Apr 3 18:46:35 2011 [DM][I]: ARGUMENTS :
>> Sun Apr 3 18:46:35 2011 [DM][I]: INPUT_FILES (Total 0):
>> Sun Apr 3 18:46:35 2011 [DM][I]: OUTPUT_FILES (Total 0):
>> Sun Apr 3 18:46:35 2011 [DM][I]: RESTART_FILES (Total 0):
>> Sun Apr 3 18:46:35 2011 [DM][I]: STDIN_FILE : /dev/null
>> Sun Apr 3 18:46:35 2011 [DM][I]: STDOUT_FILE :
>> stdout.${JOB_ID}
>> Sun Apr 3 18:46:35 2011 [DM][I]: STDERR_FILE :
>> stderr.${JOB_ID}
>> Sun Apr 3 18:46:35 2011 [DM][I]: REQUIREMENTS :
>> Sun Apr 3 18:46:35 2011 [DM][I]: RANK :
>> Sun Apr 3 18:46:35 2011 [DM][I]: RESCHEDULING_INTERVAL : 0
>> Sun Apr 3 18:46:35 2011 [DM][I]: RESCHEDULING_THRESHOLD : 300
>> Sun Apr 3 18:46:35 2011 [DM][I]: SUSPENSION_TIMEOUT : 600
>> Sun Apr 3 18:46:35 2011 [DM][I]: CPULOAD_THRESHOLD : 50
>> Sun Apr 3 18:46:35 2011 [DM][I]: RESCHEDULE_ON_FAILURE : yes
>> Sun Apr 3 18:46:35 2011 [DM][I]: NUMBER_OF_RETRIES : 1
>> Sun Apr 3 18:46:35 2011 [DM][I]: CHECKPOINT_INTERVAL : 0
>> Sun Apr 3 18:46:35 2011 [DM][I]: CHECKPOINT_URL :
>> Sun Apr 3 18:46:35 2011 [DM][I]: WRAPPER :
>> /usr/local/gw-5.6.1/libexec/gw_wrapper.sh
>> Sun Apr 3 18:46:35 2011 [DM][I]: MONITOR :
>> Sun Apr 3 18:46:35 2011 [DM][I]: PRE_WRAPPER :
>> Sun Apr 3 18:46:35 2011 [DM][I]: PRE_WRAPPER_ARGUMENTS :
>> Sun Apr 3 18:46:35 2011 [DM][I]: TYPE : single
>> Sun Apr 3 18:46:35 2011 [DM][I]: NP : 1
>> Sun Apr 3 18:46:35 2011 [DM][I]: DEADLINE : 0:00:00 0
>> Sun Apr 3 18:46:35 2011 [DM][I]:
>> ----------------------------------------------------------
>> Sun Apr 3 18:46:35 2011 [DM][I]: New state is PENDING.
>> Sun Apr 3 18:46:46 2011 [DM][I]: New state is PROLOG.
>> Sun Apr 3 18:46:46 2011 [TM][I]: Creating remote job working directory:
>> Sun Apr 3 18:46:46 2011 [TM][I]: Target url:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:46:48 2011 [TM][I]: Remote job directory created.
>> Sun Apr 3 18:46:48 2011 [TM][I]: Staging input files:
>> Sun Apr 3 18:46:48 2011 [TM][I]: Source: /home/guser01/GridWay.
>> Sun Apr 3 18:46:48 2011 [TM][I]: Copying file
>> file:///usr/local/gw-5.6.1/var/0/job.env.
>> Sun Apr 3 18:46:48 2011 [TM][W]: Skipping file /bin/hostname,
>> absolute path.
>> Sun Apr 3 18:46:48 2011 [TM][W]: Skipping file /dev/null, absolute
>> path.
>> Sun Apr 3 18:46:48 2011 [TM][I]: Copying file
>> file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh.
>> Sun Apr 3 18:46:49 2011 [TM][I]: File
>> file:///usr/local/gw-5.6.1/var/0/job.env copied.
>> Sun Apr 3 18:46:50 2011 [TM][I]: File
>> file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh copied.
>> Sun Apr 3 18:46:50 2011 [TM][I]: All input files copied.
>> Sun Apr 3 18:46:50 2011 [DM][I]: Prolog done:
>> Sun Apr 3 18:46:50 2011 [DM][I]: Total time : 4
>> Sun Apr 3 18:46:50 2011 [DM][I]: New state is WRAPPER.
>> Sun Apr 3 18:46:50 2011 [EM][I]: Submitting wrapper to saum.grid/PBS, RSL
>> used is in /usr/local/gw-5.6.1/var/0/job.rsl.0.
>> Sun Apr 3 18:47:33 2011 [EM][I]: New execution state is PENDING.
>> Sun Apr 3 18:47:36 2011 [EM][I]: Execution state is PENDING.
>> Sun Apr 3 18:47:36 2011 [EM][I]: New execution state is ACTIVE.
>> Sun Apr 3 18:47:36 2011 [EM][I]: New execution state is DONE.
>> Sun Apr 3 18:47:36 2011 [DM][I]: Wrapper DONE:
>> Sun Apr 3 18:47:36 2011 [DM][I]: Active time : 0
>> Sun Apr 3 18:47:36 2011 [DM][I]: Suspension time : 46
>> Sun Apr 3 18:47:36 2011 [DM][I]: Total time : 46
>> Sun Apr 3 18:47:36 2011 [DM][I]: New state is EPILOG_STD.
>> Sun Apr 3 18:47:36 2011 [TM][I]: Staging output files:
>> Sun Apr 3 18:47:36 2011 [TM][I]: Source:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:47:36 2011 [TM][I]: Copying file stdout.wrapper.
>> Sun Apr 3 18:47:36 2011 [TM][I]: Copying file stderr.wrapper.
>> Sun Apr 3 18:47:49 2011 [TM][I]: File stdout.wrapper copied.
>> Sun Apr 3 18:47:51 2011 [TM][I]: File stderr.wrapper copied.
>> Sun Apr 3 18:47:51 2011 [TM][I]: All output files copied.
>> Sun Apr 3 18:47:51 2011 [DM][E]: Unable to find exit code, assuming that
>> the job failed or was cancelled.
>> Sun Apr 3 18:47:51 2011 [DM][I]: New state is EPILOG_RESTART.
>> Sun Apr 3 18:47:51 2011 [TM][I]: Staging output files:
>> Sun Apr 3 18:47:51 2011 [TM][I]: Source:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:47:51 2011 [TM][I]: Copying file stdout.execution.
>> Sun Apr 3 18:47:51 2011 [TM][I]: Copying file stderr.execution.
>> Sun Apr 3 18:47:53 2011 [TM][E]: Copy of file stdout.execution
>> failed.
>> Sun Apr 3 18:47:54 2011 [TM][E]: Copy of file stderr.execution
>> failed.
>> Sun Apr 3 18:47:54 2011 [TM][W]: Some output files were not copied.
>> Sun Apr 3 18:47:54 2011 [TM][W]: Removing remote directory:
>> Sun Apr 3 18:47:54 2011 [TM][W]: Target url:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:47:56 2011 [TM][I]: Remote job directory removed.
>> Sun Apr 3 18:47:56 2011 [DM][E]: Epilog failed:
>> Sun Apr 3 18:47:56 2011 [DM][E]: Total time : 20
>> Sun Apr 3 18:47:56 2011 [DM][I]: Rescheduling job.
>> Sun Apr 3 18:47:56 2011 [DM][I]: New state is PENDING.
>> Sun Apr 3 18:54:01 2011 [DM][I]: New state is PROLOG.
>> Sun Apr 3 18:54:01 2011 [TM][I]: Creating remote job working directory:
>> Sun Apr 3 18:54:01 2011 [TM][I]: Target url:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:54:01 2011 [TM][I]: Remote job directory created.
>> Sun Apr 3 18:54:01 2011 [TM][I]: Staging input files:
>> Sun Apr 3 18:54:01 2011 [TM][I]: Source: /home/guser01/GridWay.
>> Sun Apr 3 18:54:01 2011 [TM][I]: Copying file
>> file:///usr/local/gw-5.6.1/var/0/job.env.
>> Sun Apr 3 18:54:01 2011 [TM][W]: Skipping file /bin/hostname,
>> absolute path.
>> Sun Apr 3 18:54:01 2011 [TM][W]: Skipping file /dev/null, absolute
>> path.
>> Sun Apr 3 18:54:01 2011 [TM][I]: Copying file
>> file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh.
>> Sun Apr 3 18:54:02 2011 [TM][I]: File
>> file:///usr/local/gw-5.6.1/var/0/job.env copied.
>> Sun Apr 3 18:54:02 2011 [TM][I]: File
>> file:///usr/local/gw-5.6.1/libexec/gw_wrapper.sh copied.
>> Sun Apr 3 18:54:02 2011 [TM][I]: All input files copied.
>> Sun Apr 3 18:54:02 2011 [DM][I]: Prolog done:
>> Sun Apr 3 18:54:02 2011 [DM][I]: Total time : 1
>> Sun Apr 3 18:54:02 2011 [DM][I]: New state is WRAPPER.
>> Sun Apr 3 18:54:02 2011 [EM][I]: Submitting wrapper to saum.grid/PBS, RSL
>> used is in /usr/local/gw-5.6.1/var/0/job.rsl.1.
>> Sun Apr 3 18:54:07 2011 [EM][I]: New execution state is PENDING.
>> Sun Apr 3 18:54:08 2011 [EM][I]: Execution state is PENDING.
>> Sun Apr 3 18:54:08 2011 [EM][I]: New execution state is ACTIVE.
>> Sun Apr 3 18:54:08 2011 [EM][I]: New execution state is DONE.
>> Sun Apr 3 18:54:08 2011 [DM][I]: Wrapper DONE:
>> Sun Apr 3 18:54:08 2011 [DM][I]: Active time : 0
>> Sun Apr 3 18:54:08 2011 [DM][I]: Suspension time : 6
>> Sun Apr 3 18:54:08 2011 [DM][I]: Total time : 6
>> Sun Apr 3 18:54:08 2011 [DM][I]: New state is EPILOG_STD.
>> Sun Apr 3 18:54:08 2011 [TM][I]: Staging output files:
>> Sun Apr 3 18:54:08 2011 [TM][I]: Source:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:54:08 2011 [TM][I]: Copying file stdout.wrapper.
>> Sun Apr 3 18:54:08 2011 [TM][I]: Copying file stderr.wrapper.
>> Sun Apr 3 18:54:09 2011 [TM][I]: File stdout.wrapper copied.
>> Sun Apr 3 18:54:10 2011 [TM][I]: File stderr.wrapper copied.
>> Sun Apr 3 18:54:10 2011 [TM][I]: All output files copied.
>> Sun Apr 3 18:54:10 2011 [DM][E]: Unable to find exit code, assuming that
>> the job failed or was cancelled.
>> Sun Apr 3 18:54:10 2011 [DM][I]: New state is EPILOG_RESTART.
>> Sun Apr 3 18:54:10 2011 [TM][I]: Staging output files:
>> Sun Apr 3 18:54:10 2011 [TM][I]: Source:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:54:10 2011 [TM][I]: Copying file stdout.execution.
>> Sun Apr 3 18:54:10 2011 [TM][I]: Copying file stderr.execution.
>> Sun Apr 3 18:54:10 2011 [TM][E]: Copy of file stdout.execution
>> failed.
>> Sun Apr 3 18:54:11 2011 [TM][E]: Copy of file stderr.execution
>> failed.
>> Sun Apr 3 18:54:11 2011 [TM][W]: Some output files were not copied.
>> Sun Apr 3 18:54:11 2011 [TM][W]: Removing remote directory:
>> Sun Apr 3 18:54:11 2011 [TM][W]: Target url:
>> gsiftp://saum.grid/~/.gw_guser01_0/.
>> Sun Apr 3 18:54:11 2011 [TM][I]: Remote job directory removed.
>> Sun Apr 3 18:54:11 2011 [DM][E]: Epilog failed:
>> Sun Apr 3 18:54:11 2011 [DM][E]: Total time : 3
>> Sun Apr 3 18:54:11 2011 [DM][I]: Rescheduling job.
>> Sun Apr 3 18:54:11 2011 [DM][I]: New state is PENDING.
>>
>> =========================================================================================================
>>
>>
>>
>> And the globus container log corresponding to GridWay's *gwsubmit* is:
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 2011-04-03 18:35:17,183 INFO exec.StateMachine
>> [RunQueueThread_1,logJobAccepted:3424] Job
>> 05217630-5df3-11e0-aafa-e2327fe73aea accepted for local user 'guser01'
>> 2011-04-03 18:35:18,497 INFO exec.StateMachine
>> [RunQueueThread_2,logJobSubmitted:3436] Job
>> 05217630-5df3-11e0-aafa-e2327fe73aea submitted with local job ID
>> '7.saum.grid'
>> 2011-04-03 18:35:23,168 INFO exec.StateMachine
>> [RunQueueThread_11,logJobSucceeded:3446] Job
>> 05217630-5df3-11e0-aafa-e2327fe73aea finished successfully
>> 2011-04-03 18:47:29,586 INFO exec.StateMachine
>> [RunQueueThread_13,logJobAccepted:3424] Job
>> b9e4ae10-5df4-11e0-aafa-e2327fe73aea accepted for local user 'guser01'
>> 2011-04-03 18:47:31,373 INFO exec.StateMachine
>> [RunQueueThread_14,logJobSubmitted:3436] Job
>> b9e4ae10-5df4-11e0-aafa-e2327fe73aea submitted with local job ID
>> '8.saum.grid'
>> 2011-04-03 18:47:32,411 INFO exec.StateMachine
>> [RunQueueThread_5,logJobSucceeded:3446] Job
>> b9e4ae10-5df4-11e0-aafa-e2327fe73aea finished successfully
>>
>> =======================================================================================================================
>>
>>
>>
>> Please tell me what is wrong with GridWay's *gwsubmit*. How should I
>> solve the issue?
>>
>> _ _ _ _ _ _ _ _ _ _
>> Regards
>> Saumesh Kumar
>> IIT Roorkee
>>
>
>