I had a look at the file
/opt/globus/lib/perl/Globus/GRAM/JobManager/sge.pm to find where
GRAM_SCRIPT_ERROR:24 is raised.

It comes from this piece of code (sge.pm attached), line 422:

   #####
   # Determining job request type.
   #
   print("Determining job type");
   print("  Job is of type " . $description->jobtype());
   if($description->jobtype() eq "mpi" ||
      $description->jobtype() eq "multiple")
   {
       #####
       # Check if RSL attribute parallel_environment is provided
       #
       if($description->parallel_environment())
       {
           $mpi_pe = $description->parallel_environment();
       }

       if(!$mpi_pe || $mpi_pe eq "NONE"){
           print("ERROR: Parallel Environment (PE) failure!");
           print("  MPI/multiple job was submitted, but no PE set");
           print("  by neither user nor administrator");
           return Globus::GRAM::Error::INVALID_SCRIPT_REPLY;
       }
       else
       {
           print("  PE is $mpi_pe");
           $sge_job_script->print("#\$ -pe $mpi_pe "
                                  . $description->count() . "\n");
       }


So, here is the output related to that piece of code:

Determining job type
Job is of type multiple
ERROR: Parallel Environment (PE) failure!
MPI/multiple job was submitted, but no PE set  by neither user nor
administrator
GRAM_SCRIPT_ERROR:24
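
To make the branch above easier to follow, here is a minimal shell sketch of the same decision. It only approximates the Perl excerpt: check_pe is a hypothetical helper written for illustration, not part of Globus or SGE, and the PE name "mpich" in the usage note is an assumed example.

```shell
#!/bin/sh
# Sketch only: approximates the PE check from the sge.pm excerpt.
# check_pe is a hypothetical helper, not part of Globus or SGE.
# $1 = job type, $2 = parallel environment (may be empty)
check_pe() {
    jobtype=$1
    pe=$2
    case "$jobtype" in
        mpi|multiple)
            # Same condition as the excerpt: no PE, or PE set to "NONE".
            if [ -z "$pe" ] || [ "$pe" = "NONE" ]; then
                echo "ERROR: Parallel Environment (PE) failure!"
                return 24   # the code reported as GRAM_SCRIPT_ERROR:24
            fi
            echo "  PE is $pe"
            ;;
    esac
    # Any other job type skips the PE check entirely.
    return 0
}
```

Calling check_pe multiple "" returns 24, which matches the output above, while check_pe multiple mpich prints the PE line and returns 0. A job type other than mpi or multiple never reaches the check.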

So, two questions come to mind:
  * It says that I submitted a multiple/mpi job. But "-c /bin/hostname" is
not one, is it?
  * What must I do to set that "Parallel Environment"?

Cheers,
Francois.



On 7/25/07, Francois Hornoy <[EMAIL PROTECTED]> wrote:


  Ok. I have some debugging information now, so let me repeat the context.

 Here is the basic command: globusrun-ws -submit -F 
https://MyIP:8443/wsrf/services/ManagedJobFactoryService
-c /bin/hostname

 That works fine. If I add the "-Ft SGE" option, I get an error.

 On the client side, the output of that command is:

[EMAIL PROTECTED]:~$ globusrun-ws -submit -F
https://MyIP:8443/wsrf/services/ManagedJobFactoryService -Ft SGE -c
/bin/hostname
Submitting job...Done.
Job ID: uuid:9007c47c-3aa0-11dc-86fc-0017f23158ca
Termination time: 07/26/2007 11:16 GMT
Current job state: Failed
Destroying job...Done.
globusrun-ws: Job failed: Internal fault occurred while running the submit
script.
[EMAIL PROTECTED]:~$

 On the server side, I attached the container output. It's a bit long; I
think the relevant lines are these (around line 828 in the attached file):

2007-07-25 13:17:01,862 DEBUG exec.JobManagerScript [Thread-18,run:208]
Executing command:
/usr/bin/sudo -H -u fhornoy -S
/opt/globus/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus/libexec/globus-job-manager-script.pl -m sge -f
/opt/globus/tmp/gram_job_mgr53582.tmp -c submit
2007-07-25 13:17:02,056 DEBUG exec.JobManagerScript [Thread-18,run:225]
first line: GRAM_SCRIPT_ERROR:24
2007-07-25 13:17:02,057 DEBUG exec.JobManagerScript [Thread-18,run:228]
Read line: GRAM_SCRIPT_ERROR:24
2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript [Thread-18,run:335]
failure message: null
2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript[Thread-18,setDone:345] 
script is done, setting done flag
2007-07-25 13:17:02,060 DEBUG 
exec.StateMachine[RunQueueThread_0,processSubmitState:1105] Done waiting for 
submit script
2007-07-25 13:17:02,060 DEBUG 
exec.StateMachine[RunQueueThread_0,processSubmitState:1129] script return code: 
24
2007-07-25 13:17:02,060 DEBUG 
exec.StateMachine[RunQueueThread_0,processSubmitState:1134] script return code 
means error!
2007-07-25 13:17:02,060 DEBUG 
exec.StateMachine[RunQueueThread_0,createFaultFromErrorCode:3027] Creating 
fault from error
code 24
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:460] 
Fault Class: class
org.globus.exec.generated.InternalFaultType
2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:461] 
Resource Key: 
{http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=9007c47c-3aa0-11dc-86fc-0017f23158ca

2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:462] 
Description: Internal fault occurred while
running the submit script.
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:463] 
Cause: null
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:464] 
State when failure occurred Unsubmitted
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:466] 
Script Command: submit
2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:467] 
GT2 Error Code: 24
2007-07-25 13:17:02,072 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:519] 
Script Command: submit
2007-07-25 13:17:02,072 DEBUG
ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca[RunQueueThread_0,setFault:346]
 fault element name: InternalFaultType
2007-07-25 13:17:02,072 DEBUG
ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca[RunQueueThread_0,setFault:350]
 fault element name: InternalFault
2007-07-25 13:17:02,072 DEBUG
ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca[RunQueueThread_0,setFault:353]
 fault element name: internalFault
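
 Since the container output is long, a small filter can pull out just the script error codes. This is only a sketch: gram_script_errors is a hypothetical helper name, and it assumes the container output was saved to a plain-text file and that grep supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Sketch only: gram_script_errors is a hypothetical helper name.
# Prints each distinct GRAM_SCRIPT_ERROR code found in a container log.
# $1 = path to a saved container log (assumed plain text)
gram_script_errors() {
    # -o prints only the matching part of each line; cut keeps the
    # number after the colon; sort -un removes repeated codes.
    grep -o 'GRAM_SCRIPT_ERROR:[0-9][0-9]*' "$1" | cut -d: -f2 | sort -un
}
```

 Run against a file holding the lines above, this should print the single code 24, which is the same value the StateMachine lines report as the script return code.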


 So, I did not find much on Google about error 24, and the message itself
is not very explicit.


 Cheers,
 Francois.


On 7/23/07, alexander.beck-ratzka <[EMAIL PROTECTED]>
wrote:
>
> On Monday 23 July 2007 14:19, Francois Hornoy wrote:
> > On 7/23/07, alexander.beck-ratzka <[EMAIL PROTECTED]>
> wrote:
> > > On Monday 23 July 2007 11:18, Francois Hornoy wrote:
> > > >  Hi Alexander,
> > > >
> > > >  I tried a simple example based on yours. It just stages in a file,
> > > > runs "/bin/cat" on it (that's the job), and stages out the .out and
> > > > .err files.
> > > >
> > > >  I keep having this error (see end of mail).
> > > >
> > > >  If I watch globusrun-ws -status -j job.id and watch the
> > > > "Execution Host", I can see that the StageIn step is good: the file
> > > > 523.sh is transferred correctly. But then it crashes.
> > > >
> > > >  Of course, if I do exactly the same thing without "-Ft SGE", it
> > > > works perfectly.
> > > >
> > > >   Cheers,
> > > >   Francois.
> > > >
> > > > 2007-07-23 11:10:53,427 INFO
> > > > exec.StateMachine[RunQueueThread_5,logJobAccepted:3193] Job
> > > > 9cd21054-38fc-11dc-b055-0017f23158ca accepted for local user
> 'globus'
> > > > 2007-07-23 11:10:58,370 ERROR
> > > > service.TransferWork[WorkThread-39,run:724] Terminal transfer
> error:
> > > > Error deleting a file
> > > >  "/opt/globus/523.out" [Caused by: Server refused performing the
> > >
> > > request.
> > >
> > > > Custom message: Server refused deleting file (error code 1)
> [Nested
> > > > exception message:  Custom message: Unexpected reply: 500-Command
> fai
> > >
> > > led :
> > > > System error in unlink: No such file or directory
> > > > 500-A system call failed: No such file or directory
> > > > 500 End.]]
> > > > Error deleting a file
> > > >  "/opt/globus/523.out"
> > > > . Caused by
> > > > org.globus.ftp.exception.ServerException: Server refused
> performing the
> > > > request. Custom message: Server refused deleting file (error code
> 1)
> > > > [Nested exception message:  Custom message: Unexpected reply:
> 500-Comm
> > >
> > > and
> > >
> > > > failed : System error in unlink: No such file or directory
> > > > 500-A system call failed: No such file or directory
> > > > 500 End.].  Nested exception is
> > > > org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom
> message:
> > > > Unexpected reply: 500-Command failed : System error in unlink: No
> such
> > >
> > > file
> > >
> > > > or directory
> > > > 500-A system call failed: No such file or directory
> > > > 500 End.
> > > >         at org.globus.ftp.vanilla.FTPControlChannel.execute (
> > > > FTPControlChannel.java:328)
> > > >         at org.globus.ftp.FTPClient.deleteFile(FTPClient.java:253)
> > > >         at
> org.globus.transfer.reliable.service.DeleteClient.delete(
> > > > DeleteClient.java:189)
> > > >         at org.globus.transfer.reliable.service.TransferWork.run(
> > > > TransferWork.java:688)
> > > >         at
> org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run (
> > > > WorkManagerImpl.java:355)
> > > >         at java.lang.Thread.run(Thread.java:595)
> > > > 2007-07-23 11:10:58,628 ERROR
> > > > service.TransferWork[WorkThread-40,run:724] Terminal transfer
> error:
> > > > Error deleting a file
> > > >  "/opt/globus/523.err" [Caused by: Server refused performing the
> > >
> > > request.
> > >
> > > > Custom message: Server refused deleting file (error code 1)
> [Nested
> > > > exception message:  Custom message: Unexpected reply: 500-Command
> fai
> > >
> > > led :
> > > > System error in unlink: No such file or directory
> > > > 500-A system call failed: No such file or directory
> > > > 500 End.]]
> > > > Error deleting a file
> > > >  "/opt/globus/523.err"
> > >
> > > It seems that you don't have a $GLOBUS_USER_HOME; this would explain
> > > some problems. In the rsl file you have:
> > >
> > > ${GLOBUS_USER_HOME}/523.err 73 as stderr, and I don't believe that
> > > $GLOBUS_USER_HOME is /opt/globus for a simple grid user... Do you
> > > see any of the poststage datasets, if you poststage them?
> >
> >  Actually, I'm mapped to the remote globus user, so that points to
> > /opt/globus/. And as I said, 523.sh appears in /opt/globus, so some
> > staging seems to work...
> >
> >  What do you mean by poststage datasets?
> >
>
> At the end of my rsl dataset I have a passage like this:
>
> #############################cut here######################
>         <fileStageOut>
>                 <!-- stage out stdout -->
>                 <transfer>
>                         <sourceUrl>file:///${GLOBUS_USER_HOME}/523.out</sourceUrl>
>                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.out</destinationUrl>
>                 </transfer>
>                 <!-- stage out stderr -->
>                 <transfer>
>                         <sourceUrl>file:///${GLOBUS_USER_HOME}/523.err</sourceUrl>
>                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.err</destinationUrl>
>                 </transfer>
>                 <!-- stage out log -->
>                 <transfer>
>                         <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.log</sourceUrl>
>                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.log</destinationUrl>
>                 </transfer>
>                 <!-- stage out task results -->
>                 <transfer>
>                         <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.tar</sourceUrl>
>                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.tar</destinationUrl>
>                 </transfer>
>         </fileStageOut>
> #############################cut here######################
>
> Here the poststaging, or fileStageOut, is performed. The other section
> (cleanup) describes which datasets should be deleted.
>
> Okay, if you are entering SGE, you are getting another environment. Your
> output files will probably be located on the worker node. I believe that
> SGE can't connect to /opt/globus, so the output files won't be written!
> Contact your system administrator and ask which filesystem directories
> can be accessed by SGE jobs...
>
> >  Is there any logfile, or something else I can do, to get some verbose
> > debugging information? Where is the "submit script" that fails?
> >
> >
>
> Use the "-debug" option on the globusrun-ws call.
>
> Cheers
>
> Alexander
>


