You can use the GRAM2 error codes to look up the error.
http://www-unix.globus.org/toolkit/docs/4.0/execution/prewsgram/user-index.html#s-gram-user-errorcodes

24 GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_SCRIPT_REPLY the job manager detected an invalid script response

There is some documentation for this problem here:
http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-troubleshooting
Find the heading for "The job manager detected an invalid script response".

If nothing obvious turns up, there is a section on debugging script executions, where you save the contents of the Perl job description used by the scripts and run the Perl submission command by hand:
http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/developer-index.html#id2565465

-Stu
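
As a rough sketch of the by-hand step described above (not taken from the Globus documentation): the container log quoted further down records the exact script invocation, so one option is to rebuild that command line for a saved copy of the job description. The helper name print_submit_cmd and the saved file path are hypothetical; the script path and the -m, -f and -c options are copied from that log line.

```shell
#!/bin/sh
# Hypothetical helper: print the job manager script command for a saved
# job description, so it can be inspected before being run by hand.
# The script path and the -m/-f/-c options are copied from the container
# log in this thread; /opt/globus is this site's install prefix.
SCRIPT=/opt/globus/libexec/globus-job-manager-script.pl

print_submit_cmd() {
    manager=$1        # scheduler adapter name, e.g. "sge"
    description=$2    # saved copy of the Perl job description file
    printf '%s -m %s -f %s -c submit\n' "$SCRIPT" "$manager" "$description"
}

# Assumed location of the saved job description; adjust to your own copy.
print_submit_cmd sge /tmp/saved_job_description.pl
```

Running the printed line as the mapped local user (fhornoy in the log) should show what the script actually prints, which is the reply the job manager could not parse.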

On Jul 25, 2007, at 6:27 AM, Francois Hornoy wrote:


OK. I have some debugging information now, so I will repeat the context.

Here is the basic command:
globusrun-ws -submit -F https://MyIP:8443/wsrf/services/ManagedJobFactoryService -c /bin/hostname

It works fine. If I add the "-Ft SGE" option, I get an error.

On the client side, the output of that command is:

[EMAIL PROTECTED]:~$ globusrun-ws -submit -F https://MyIP:8443/wsrf/services/ManagedJobFactoryService -Ft SGE -c /bin/hostname
Submitting job...Done.
Job ID: uuid:9007c47c-3aa0-11dc-86fc-0017f23158ca
Termination time: 07/26/2007 11:16 GMT
Current job state: Failed
Destroying job...Done.
globusrun-ws: Job failed: Internal fault occurred while running the submit script.
[EMAIL PROTECTED]:~$

On the server side, I have attached the container output. It's a bit long; I think the relevant lines are these (around line 828 in the attached file):

2007-07-25 13:17:01,862 DEBUG exec.JobManagerScript [Thread-18,run:208] Executing command: /usr/bin/sudo -H -u fhornoy -S /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-job-manager-script.pl -m sge -f /opt/globus/tmp/gram_job_mgr53582.tmp -c submit
2007-07-25 13:17:02,056 DEBUG exec.JobManagerScript [Thread-18,run:225] first line: GRAM_SCRIPT_ERROR:24
2007-07-25 13:17:02,057 DEBUG exec.JobManagerScript [Thread-18,run:228] Read line: GRAM_SCRIPT_ERROR:24
2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript [Thread-18,run:335] failure message: null
2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript [Thread-18,setDone:345] script is done, setting done flag
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1105] Done waiting for submit script
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1129] script return code: 24
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1134] script return code means error!
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,createFaultFromErrorCode:3027] Creating fault from error code 24
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:460] Fault Class: class org.globus.exec.generated.InternalFaultType
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:461] Resource Key: {http://www.globus.org/namespaces/2004/10/gram/job} ResourceID=9007c47c-3aa0-11dc-86fc-0017f23158ca
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:462] Description: Internal fault occurred while running the submit script.
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:463] Cause: null
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:464] State when failure occurred Unsubmitted
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:466] Script Command: submit
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:467] GT2 Error Code: 24
2007-07-25 13:17:02,072 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:519] Script Command: submit
2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:346] fault element name: InternalFaultType
2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:350] fault element name: InternalFault
2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:353] fault element name: internalFault
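
To pull the script error codes out of a long container log like the one above, a small filter is enough. This is only a sketch, not part of the Globus tooling: the function name gram_script_errors and the file name container.log are hypothetical, and it assumes nothing beyond the GRAM_SCRIPT_ERROR:<code> format seen in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct GRAM script error codes found in
# a container log, one per line. It relies only on the
# "GRAM_SCRIPT_ERROR:<number>" text visible in the log excerpt.
gram_script_errors() {
    sed -n 's/.*GRAM_SCRIPT_ERROR:\([0-9][0-9]*\).*/\1/p' "$1" | sort -un
}

# Example call (container.log is an assumed file name):
#   gram_script_errors container.log
```

Each code it prints can then be looked up in the GRAM2 error code list; code 24 is GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_SCRIPT_REPLY.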


I did not find much on Google about error 24, and the message itself is not very explicit.


 Cheers,
 Francois.


On 7/23/07, alexander.beck-ratzka <alexander.beck- [EMAIL PROTECTED]> wrote:
On Monday 23 July 2007 14:19, Francois Hornoy wrote:
> On 7/23/07, alexander.beck-ratzka <alexander.beck- [EMAIL PROTECTED]> wrote:
> > On Monday 23 July 2007 11:18, Francois Hornoy wrote:
> > >  Hi Alexander,
> > >
> > > I tried a simple example based on yours. It just stages in a file,
> > > runs "/bin/cat" on it (that's the job), and stages out the .out and
> > > .err files.
> > >
> > >  I keep having this error (see end of mail).
> > >
> > >  If I watch globusrun-ws -status -j job.id and the "Execution Host",
> > > I can see that the StageIn step is good: the file 523.sh is
> > > transferred correctly. But then it crashes.
> > >
> > > Of course, if I do exactly the same thing without "-Ft SGE", it works
> > > perfectly.
> > >
> > >   Cheers,
> > >   Francois.
> > >
> > > 2007-07-23 11:10:53,427 INFO exec.StateMachine[RunQueueThread_5,logJobAccepted:3193] Job 9cd21054-38fc-11dc-b055-0017f23158ca accepted for local user 'globus'
> > > 2007-07-23 11:10:58,370 ERROR service.TransferWork[WorkThread-39,run:724] Terminal transfer error: Error deleting a file
> > >  "/opt/globus/523.out" [Caused by: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.]]
> > > Error deleting a file
> > >  "/opt/globus/523.out"
> > > . Caused by
> > > org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.].  Nested exception is
> > > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.
> > >         at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:328)
> > >         at org.globus.ftp.FTPClient.deleteFile(FTPClient.java:253)
> > >         at org.globus.transfer.reliable.service.DeleteClient.delete(DeleteClient.java:189)
> > >         at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:688)
> > >         at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:355)
> > >         at java.lang.Thread.run(Thread.java:595)
> > > 2007-07-23 11:10:58,628 ERROR service.TransferWork[WorkThread-40,run:724] Terminal transfer error: Error deleting a file
> > >  "/opt/globus/523.err" [Caused by: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.]]
> > > Error deleting a file
> > >  "/opt/globus/523.err"
> >
> > It seems that you do not have a $GLOBUS_USER_HOME; this would explain
> > some of the problems. In the RSL file you have:
> >
> > ${GLOBUS_USER_HOME}/523.err 73 as stderr, and I don't believe that
> > $GLOBUS_USER_HOME is /opt/globus for a simple grid user... Do you see
> > any of the poststage datasets, if you poststage them?
>
>  Actually, I'm mapped to the remote globus user, so that points to
> /opt/globus/. And as I said, 523.sh appears in /opt/globus, so some
> staging seems to work...
>
>  What do you mean by poststage datasets?
>

At the end of my RSL file I have a passage like this:

#############################cut here######################
        <fileStageOut>
                <!-- stage out stdout -->
                <transfer>
                        <sourceUrl>file:///${GLOBUS_USER_HOME}/523.out</sourceUrl>
                        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.out</destinationUrl>
                </transfer>
                <!-- stage out stderr -->
                <transfer>
                        <sourceUrl>file:///${GLOBUS_USER_HOME}/523.err</sourceUrl>
                        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.err</destinationUrl>
                </transfer>
                <!-- stage out log -->
                <transfer>
                        <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.log</sourceUrl>
                        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.log</destinationUrl>
                </transfer>
                <!-- stage out task results -->
                <transfer>
                        <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.tar</sourceUrl>
                        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.tar</destinationUrl>
                </transfer>
        </fileStageOut>
#############################cut here######################

This is where the post-staging (fileStageOut) is performed. The other section (cleanup) describes which files should be deleted.

OK, when you submit to SGE, you get a different environment. Your output files will probably be located on the worker node. I believe that SGE can't access /opt/globus, so the output files won't be written! Contact your system administrator and ask which filesystem directories can be accessed by SGE jobs...

> Is there any logfile, or something else I can do, to get some verbose
> debugging information? Where is the "submit script" that fails?
>
>

Use the "-debug" option in the globusrun-ws call.

Cheers

Alexander

<output.log>
