OK, I have some debugging information now, so I'll repeat the context.
Here is the basic command:
globusrun-ws -submit -F https://MyIP:8443/wsrf/services/ManagedJobFactoryService -c /bin/hostname
It works fine. If I add the "-Ft SGE" option, I get an error.
On the client side, the output of that command is:
[EMAIL PROTECTED]:~$ globusrun-ws -submit -F https://MyIP:8443/wsrf/services/ManagedJobFactoryService -Ft SGE -c /bin/hostname
Submitting job...Done.
Job ID: uuid:9007c47c-3aa0-11dc-86fc-0017f23158ca
Termination time: 07/26/2007 11:16 GMT
Current job state: Failed
Destroying job...Done.
globusrun-ws: Job failed: Internal fault occurred while running the submit script.
[EMAIL PROTECTED]:~$
For the server side, I attached the container output. It's a bit long; I think the relevant lines are these (around line 828 of the attached file):
2007-07-25 13:17:01,862 DEBUG exec.JobManagerScript [Thread-18,run:208] Executing command:
/usr/bin/sudo -H -u fhornoy -S /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-job-manager-script.pl -m sge -f /opt/globus/tmp/gram_job_mgr53582.tmp -c submit
2007-07-25 13:17:02,056 DEBUG exec.JobManagerScript [Thread-18,run:225] first line: GRAM_SCRIPT_ERROR:24
2007-07-25 13:17:02,057 DEBUG exec.JobManagerScript [Thread-18,run:228] Read line: GRAM_SCRIPT_ERROR:24
2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript [Thread-18,run:335] failure message: null
2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript [Thread-18,setDone:345] script is done, setting done flag
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1105] Done waiting for submit script
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1129] script return code: 24
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1134] script return code means error!
2007-07-25 13:17:02,060 DEBUG exec.StateMachine [RunQueueThread_0,createFaultFromErrorCode:3027] Creating fault from error code 24
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:460] Fault Class: class org.globus.exec.generated.InternalFaultType
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:461] Resource Key: {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=9007c47c-3aa0-11dc-86fc-0017f23158ca
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:462] Description: Internal fault occurred while running the submit script.
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:463] Cause: null
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:464] State when failure occurred Unsubmitted
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:466] Script Command: submit
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:467] GT2 Error Code: 24
2007-07-25 13:17:02,072 DEBUG utils.FaultUtils [RunQueueThread_0,makeFault:519] Script Command: submit
2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:346] fault element name: InternalFaultType
2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:350] fault element name: InternalFault
2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:353] fault element name: internalFault
So far I haven't found much on Google about error 24, and the message itself is not very explicit.
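The log above shows the job manager reading `GRAM_SCRIPT_ERROR:24` as the first line of the SGE submit script's output and then turning that number into an `InternalFaultType` fault. When digging through a long container log like the attached one, a small filter can pull out every script error code. This is a minimal Python sketch; the function name `script_error_codes` is hypothetical, and it assumes only that the code appears as `GRAM_SCRIPT_ERROR:<n>`, as in the lines quoted above.

```python
import re
from typing import Iterable, List

# Matches the error marker seen in the container log, e.g.
# "first line: GRAM_SCRIPT_ERROR:24".
_ERROR_RE = re.compile(r"GRAM_SCRIPT_ERROR:(\d+)")


def script_error_codes(lines: Iterable[str]) -> List[int]:
    """Return every GRAM_SCRIPT_ERROR code found in the given lines, in order."""
    codes = []
    for line in lines:
        for match in _ERROR_RE.finditer(line):
            codes.append(int(match.group(1)))
    return codes


sample = [
    "... DEBUG exec.JobManagerScript [Thread-18,run:225] first line: GRAM_SCRIPT_ERROR:24",
    "... DEBUG exec.JobManagerScript [Thread-18,run:228] Read line: GRAM_SCRIPT_ERROR:24",
    "... DEBUG exec.JobManagerScript [Thread-18,run:335] failure message: null",
]
print(script_error_codes(sample))  # [24, 24]
```

On the full log, `with open("output.log") as f: print(script_error_codes(f))` would list the codes in the order they occur. The sketch only locates the code; what 24 means is defined by GRAM's GT2 error-code table (the log labels it "GT2 Error Code"), which it does not decode.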
Cheers,
Francois.
On 7/23/07, alexander.beck-ratzka <alexander.beck-[EMAIL PROTECTED]> wrote:
On Monday 23 July 2007 14:19, Francois Hornoy wrote:
> On 7/23/07, alexander.beck-ratzka <alexander.beck-[EMAIL PROTECTED]> wrote:
> > On Monday 23 July 2007 11:18, Francois Hornoy wrote:
> > > Hi Alexander,
> > >
> > > I tried a simple example based on yours. It just stages in a file,
> > > runs "/bin/cat" on it (that's the job), and stages out the .out and
> > > .err files.
> > >
> > > I keep getting this error (see end of mail).
> > >
> > > If I watch globusrun-ws -status -j job.id and look at the "Execution
> > > Host", I can see that the StageIn step succeeds: the file 523.sh is
> > > transferred correctly. But then it crashes.
> > >
> > > Of course, if I do exactly the same thing without "-Ft SGE", it works
> > > perfectly.
> > >
> > > Cheers,
> > > Francois.
> > >
> > > 2007-07-23 11:10:53,427 INFO
> > > exec.StateMachine[RunQueueThread_5,logJobAccepted:3193] Job
> > > 9cd21054-38fc-11dc-b055-0017f23158ca accepted for local user 'globus'
> > > 2007-07-23 11:10:58,370 ERROR
> > > service.TransferWork[WorkThread-39,run:724] Terminal transfer error:
> > > Error deleting a file
> > > "/opt/globus/523.out" [Caused by: Server refused performing the request.
> > > Custom message: Server refused deleting file (error code 1) [Nested
> > > exception message: Custom message: Unexpected reply: 500-Command failed :
> > > System error in unlink: No such file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.]]
> > > Error deleting a file "/opt/globus/523.out". Caused by
> > > org.globus.ftp.exception.ServerException: Server refused performing the
> > > request. Custom message: Server refused deleting file (error code 1)
> > > [Nested exception message: Custom message: Unexpected reply: 500-Command
> > > failed : System error in unlink: No such file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.]. Nested exception is
> > > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message:
> > > Unexpected reply: 500-Command failed : System error in unlink: No such
> > > file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.
> > >         at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:328)
> > >         at org.globus.ftp.FTPClient.deleteFile(FTPClient.java:253)
> > >         at org.globus.transfer.reliable.service.DeleteClient.delete(DeleteClient.java:189)
> > >         at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:688)
> > >         at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:355)
> > >         at java.lang.Thread.run(Thread.java:595)
> > > 2007-07-23 11:10:58,628 ERROR
> > > service.TransferWork[WorkThread-40,run:724] Terminal transfer error:
> > > Error deleting a file
> > > "/opt/globus/523.err" [Caused by: Server refused performing the request.
> > > Custom message: Server refused deleting file (error code 1) [Nested
> > > exception message: Custom message: Unexpected reply: 500-Command failed :
> > > System error in unlink: No such file or directory
> > > 500-A system call failed: No such file or directory
> > > 500 End.]]
> > > Error deleting a file
> > > "/opt/globus/523.err"
> >
> > It seems that you don't have a $GLOBUS_USER_HOME; that would explain some
> > problems. In the RSL file you have:
> >
> > ${GLOBUS_USER_HOME}/523.err as stderr, and I don't believe that
> > $GLOBUS_USER_HOME is /opt/globus for a simple grid user... Do you see any
> > of the poststage datasets, if you poststage them?
>
> Actually, I'm mapped to the remote globus user, so that points to
> /opt/globus/. And as I said, 523.sh appears in /opt/globus, so staging
> seems to work...
>
> What do you mean by poststage datasets?
>
At the end of my RSL file I have a section like this:
#############################cut here######################
<fileStageOut>
    <!-- stage out stdout -->
    <transfer>
        <sourceUrl>file:///${GLOBUS_USER_HOME}/523.out</sourceUrl>
        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.out</destinationUrl>
    </transfer>
    <!-- stage out stderr -->
    <transfer>
        <sourceUrl>file:///${GLOBUS_USER_HOME}/523.err</sourceUrl>
        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.err</destinationUrl>
    </transfer>
    <!-- stage out log -->
    <transfer>
        <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.log</sourceUrl>
        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.log</destinationUrl>
    </transfer>
    <!-- stage out task results -->
    <transfer>
        <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.tar</sourceUrl>
        <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.tar</destinationUrl>
    </transfer>
</fileStageOut>
#############################cut here######################
This is where the poststaging (fileStageOut) is performed. The other section
(cleanup) describes which datasets should be deleted.
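Because the four transfers differ only in the file path, the fileStageOut section can also be generated instead of written by hand. A minimal Python sketch using the standard xml.etree.ElementTree module; the host and paths come from the example above, while the names `DEST_BASE`, `OUTPUTS` and `file_stage_out` are hypothetical. `${GLOBUS_USER_HOME}` is kept as literal text, exactly as in the hand-written RSL, so the substitution is still left to the job service.

```python
import xml.etree.ElementTree as ET

# Destination directory on the submit host, taken from the RSL above.
DEST_BASE = "gsiftp://globus.submit.host/store/GEO600/tasks"

# (path relative to ${GLOBUS_USER_HOME}, comment placed before the transfer)
OUTPUTS = [
    ("523.out", "stage out stdout"),
    ("523.err", "stage out stderr"),
    ("GEO600/tasks/523.log", "stage out log"),
    ("GEO600/tasks/523.tar", "stage out task results"),
]


def file_stage_out(outputs, dest_base=DEST_BASE):
    """Build a <fileStageOut> element with one <transfer> per output file."""
    root = ET.Element("fileStageOut")
    for rel_path, comment in outputs:
        root.append(ET.Comment(" " + comment + " "))
        transfer = ET.SubElement(root, "transfer")
        # Leave ${GLOBUS_USER_HOME} literal, as in the hand-written RSL.
        ET.SubElement(transfer, "sourceUrl").text = (
            "file:///${GLOBUS_USER_HOME}/" + rel_path)
        # Every file is copied into the same destination directory,
        # keeping only its base name.
        file_name = rel_path.rsplit("/", 1)[-1]
        ET.SubElement(transfer, "destinationUrl").text = (
            dest_base + "/" + file_name)
    return root


root = file_stage_out(OUTPUTS)
xml_text = ET.tostring(root, encoding="unicode")
```

Calling `ET.indent(root)` (Python 3.9 and later) before `ET.tostring` gives indented output close to the hand-written block. Adding another output file then means adding one tuple to `OUTPUTS`.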
Okay, if you submit to SGE, you get a different environment. Your output
files will probably be located on the worker node. I believe that the SGE
jobs can't reach /opt/globus, so the output files won't be written there!
Contact your system administrator and ask which filesystem directories can
be accessed by SGE jobs...
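One way to test this hypothesis is to look, after the SGE job has finished, for the files that the fileStageOut section expects, both on the worker node that ran the job and on the GRAM host. A minimal Python sketch, assuming ${GLOBUS_USER_HOME} resolves to /opt/globus as Francois reports for his mapping; `local_path` and `missing_outputs` are hypothetical helpers, not part of Globus.

```python
from pathlib import Path
from string import Template
from typing import List, Sequence

# Stage-out source URLs as written in the RSL quoted above.
STAGE_OUT_SOURCES = [
    "file:///${GLOBUS_USER_HOME}/523.out",
    "file:///${GLOBUS_USER_HOME}/523.err",
    "file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.log",
    "file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.tar",
]


def local_path(url: str, user_home: str) -> Path:
    """Expand ${GLOBUS_USER_HOME} in a file:// URL and return the local path."""
    expanded = Template(url).substitute(GLOBUS_USER_HOME=user_home)
    # "file:///" followed by an absolute home gives several leading slashes;
    # collapse them to exactly one so the result is a normal absolute path.
    return Path("/" + expanded[len("file:"):].lstrip("/"))


def missing_outputs(user_home: str,
                    sources: Sequence[str] = STAGE_OUT_SOURCES) -> List[Path]:
    """Return the expected stage-out files that do not exist on this host."""
    paths = [local_path(url, user_home) for url in sources]
    return [p for p in paths if not p.exists()]
```

Running `print(missing_outputs("/opt/globus"))` on each host shows where the outputs ended up. If they exist only on the worker node, or nowhere, that would be consistent with the stage-out and cleanup steps failing with "No such file or directory".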
> Is there a log file, or anything else I can do, to get more verbose
> debugging information? Where is the "submit script" that fails?
>
>
Use the "-debug" option on the globusrun-ws call.
Cheers
Alexander
<output.log>