sge.pm attached.

 Cheers,
 Francois.


On 7/25/07, Francois Hornoy <[EMAIL PROTECTED]> wrote:


 Well, I've been looking through the file
/opt/globus/lib/perl/Globus/GRAM/JobManager/sge.pm to find where
GRAM_SCRIPT_ERROR:24 is raised.

 I found that it comes from this piece of code (sge.pm attached), line 422:

    #####
    # Determining job request type.
    #
    print("Determining job type");
    print("  Job is of type " . $description->jobtype());
    if($description->jobtype() eq "mpi" ||
       $description->jobtype() eq "multiple")
    {
        #####
        # Check if RSL attribute parallel_environment is provided
        #
        if($description->parallel_environment())
        {
            $mpi_pe = $description->parallel_environment();
        }

        if(!$mpi_pe || $mpi_pe eq "NONE"){
          print("ERROR: Parallel Environment (PE) failure!");
            print("  MPI/multiple job was submitted, but no PE set");
            print("  by neither user nor administrator");
            return Globus::GRAM::Error::INVALID_SCRIPT_REPLY;
        }
        else
        {
            print("  PE is $mpi_pe");
            $sge_job_script->print("#\$ -pe $mpi_pe "
                                   . $description->count() . "\n");
        }
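
 To make the failing branch easier to follow, here is a minimal standalone
sketch of the same decision. It is not the code from sge.pm: pe_directive()
and its arguments are hypothetical names, and it assumes that $mpi_pe starts
out as a site-wide default (possibly "NONE") set somewhere outside the
excerpt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the PE check in the excerpt; not the Globus API.
#   $jobtype    - job type string, as $description->jobtype() would report it
#   $count      - number of slots requested
#   $rsl_pe     - PE named in the job description, or undef if none was given
#   $default_pe - assumed site-wide default PE ("NONE" when not configured)
# Returns '' when no PE line is needed, the "#$ -pe" line when a PE is known,
# and undef when an mpi/multiple job has no usable PE (the error branch).
sub pe_directive {
    my ($jobtype, $count, $rsl_pe, $default_pe) = @_;

    # Only mpi and multiple jobs ask SGE for a parallel environment.
    return '' unless $jobtype eq 'mpi' || $jobtype eq 'multiple';

    # A PE from the job description takes precedence over the default.
    my $pe = $rsl_pe ? $rsl_pe : $default_pe;

    # No usable PE: the excerpt returns INVALID_SCRIPT_REPLY at this point.
    return undef if !$pe || $pe eq 'NONE';

    return "#\$ -pe $pe $count\n";
}

# Example: a "multiple" job with no PE anywhere ends up in the error branch.
my $line = pe_directive('multiple', 1, undef, 'NONE');
print(defined($line) ? $line : "no PE set, submit would fail\n");
```

 In this sketch, a job of type "multiple" with no PE in the job description
and a default of "NONE" takes the undef branch, which corresponds to the
error branch of the excerpt.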


 So, here is the output related to that piece of code:

Determining job type
 Job is of type multiple
ERROR: Parallel Environment (PE) failure!
MPI/multiple job was submitted, but no PE set  by neither user nor
administrator
GRAM_SCRIPT_ERROR:24

 So, two questions come to mind:
   * It says that I submitted a multiple/mpi job, but "-c /bin/hostname"
is not one, is it?
   * What must I do to set that "Parallel Environment"?

 Cheers,
 Francois.



On 7/25/07, Francois Hornoy <[EMAIL PROTECTED]> wrote:
>
>
>   OK. I have some debugging information now, so I'll restate the context.
>
>  Here is the basic command:
>
> globusrun-ws -submit -F
> https://MyIP:8443/wsrf/services/ManagedJobFactoryService -c /bin/hostname
>
>  That works fine. If I add the "-Ft SGE" option, I get an error.
>
>  On the client side, the output of the command with that option is:
>
> [EMAIL PROTECTED]:~$ globusrun-ws -submit -F
> https://MyIP:8443/wsrf/services/ManagedJobFactoryService -Ft SGE -c
> /bin/hostname
> Submitting job...Done.
> Job ID: uuid:9007c47c-3aa0-11dc-86fc-0017f23158ca
> Termination time: 07/26/2007 11:16 GMT
> Current job state: Failed
> Destroying job...Done.
> globusrun-ws: Job failed: Internal fault occurred while running the
> submit script.
> [EMAIL PROTECTED]:~$
>
>  On the server side, I attached the container output. It's a bit long; I
> think the relevant lines are these (around line 828 in the attached file):
>
> 2007-07-25 13:17:01,862 DEBUG exec.JobManagerScript [Thread-18,run:208] Executing command:
> /usr/bin/sudo -H -u fhornoy -S /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-job-manager-script.pl -m sge -f /opt/globus/tmp/gram_job_mgr53582.tmp -c submit
> 2007-07-25 13:17:02,056 DEBUG exec.JobManagerScript [Thread-18,run:225] first line: GRAM_SCRIPT_ERROR:24
> 2007-07-25 13:17:02,057 DEBUG exec.JobManagerScript [Thread-18,run:228] Read line: GRAM_SCRIPT_ERROR:24
> 2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript [Thread-18,run:335] failure message: null
> 2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript[Thread-18,setDone:345] script is done, setting done flag
> 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,processSubmitState:1105] Done waiting for submit script
> 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,processSubmitState:1129] script return code: 24
> 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,processSubmitState:1134] script return code means error!
> 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,createFaultFromErrorCode:3027] Creating fault from error code 24
> 2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:460] Fault Class: class org.globus.exec.generated.InternalFaultType
> 2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:461] Resource Key: {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=9007c47c-3aa0-11dc-86fc-0017f23158ca
> 2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:462] Description: Internal fault occurred while running the submit script.
> 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:463] Cause: null
> 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:464] State when failure occurred Unsubmitted
> 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:466] Script Command: submit
> 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:467] GT2 Error Code: 24
> 2007-07-25 13:17:02,072 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:519] Script Command: submit
> 2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca[RunQueueThread_0,setFault:346] fault element name: InternalFaultType
> 2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca[RunQueueThread_0,setFault:350] fault element name: InternalFault
> 2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca[RunQueueThread_0,setFault:353] fault element name: internalFault
>
>
>  So, I did not find much on Google about error 24, and the message is not
> very explicit.
>
>
>  Cheers,
>  Francois.
>
>
> On 7/23/07, alexander.beck-ratzka <[EMAIL PROTECTED]>
> wrote:
> >
> > On Monday 23 July 2007 14:19, Francois Hornoy wrote:
> > > On 7/23/07, alexander.beck-ratzka <[EMAIL PROTECTED]> wrote:
> > > > On Monday 23 July 2007 11:18, Francois Hornoy wrote:
> > > > >  Hi Alexander,
> > > > >
> > > > > >  I tried a simple example based on yours. It just stages in a file,
> > > > > > runs "/bin/cat" on it (that's the job), and stages out the .out and
> > > > > > .err files.
> > > > > >
> > > > > >  I keep getting this error (see end of mail).
> > > > > >
> > > > > >  If I watch globusrun-ws -status -j job.id and keep an eye on the
> > > > > > "Execution Host", I can see that the StageIn step succeeds: the file
> > > > > > 523.sh is transferred correctly. But then it crashes.
> > > > > >
> > > > > >  Of course, if I do exactly the same thing without "-Ft SGE", it
> > > > > > works perfectly.
> > > > >
> > > > >   Cheers,
> > > > >   Francois.
> > > > >
> > > > > 2007-07-23 11:10:53,427 INFO
> > > > > exec.StateMachine[RunQueueThread_5,logJobAccepted:3193] Job
> > > > > 9cd21054-38fc-11dc-b055-0017f23158ca accepted for local user
> > 'globus'
> > > > > 2007-07-23 11:10:58,370 ERROR
> > > > > service.TransferWork[WorkThread-39,run:724] Terminal transfer
> > error:
> > > > > Error deleting a file
> > > > >  "/opt/globus/523.out" [Caused by: Server refused performing the
> >
> > > >
> > > > request.
> > > >
> > > > > Custom message: Server refused deleting file (error code 1)
> > [Nested
> > > > > exception message:  Custom message: Unexpected reply:
> > 500-Command fai
> > > >
> > > > led :
> > > > > System error in unlink: No such file or directory
> > > > > 500-A system call failed: No such file or directory
> > > > > 500 End.]]
> > > > > Error deleting a file
> > > > >  "/opt/globus/523.out"
> > > > > . Caused by
> > > > > org.globus.ftp.exception.ServerException: Server refused
> > performing the
> > > > > request. Custom message: Server refused deleting file (error
> > code 1)
> > > > > [Nested exception message:  Custom message: Unexpected reply:
> > 500-Comm
> > > >
> > > > and
> > > >
> > > > > failed : System error in unlink: No such file or directory
> > > > > 500-A system call failed: No such file or directory
> > > > > 500 End.].  Nested exception is
> > > > > org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom
> > message:
> > > > > Unexpected reply: 500-Command failed : System error in unlink:
> > No such
> > > >
> > > > file
> > > >
> > > > > or directory
> > > > > 500-A system call failed: No such file or directory
> > > > > 500 End.
> > > > >         at org.globus.ftp.vanilla.FTPControlChannel.execute (
> > > > > FTPControlChannel.java:328)
> > > > >         at org.globus.ftp.FTPClient.deleteFile(FTPClient.java
> > :253)
> > > > >         at
> > org.globus.transfer.reliable.service.DeleteClient.delete(
> > > > > DeleteClient.java:189)
> > > > >         at org.globus.transfer.reliable.service.TransferWork.run
> > (
> > > > > TransferWork.java:688)
> > > > >         at
> > org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run (
> > > > > WorkManagerImpl.java:355)
> > > > >         at java.lang.Thread.run(Thread.java:595)
> > > > > 2007-07-23 11:10:58,628 ERROR
> > > > > service.TransferWork[WorkThread-40,run:724] Terminal transfer
> > error:
> > > > > Error deleting a file
> > > > >  "/opt/globus/523.err" [Caused by: Server refused performing the
> > > >
> > > > request.
> > > >
> > > > > Custom message: Server refused deleting file (error code 1)
> > [Nested
> > > > > exception message:  Custom message: Unexpected reply:
> > 500-Command fai
> > > >
> > > > led :
> > > > > System error in unlink: No such file or directory
> > > > > 500-A system call failed: No such file or directory
> > > > > 500 End.]]
> > > > > Error deleting a file
> > > > >  "/opt/globus/523.err"
> > > >
> > > > It seems that you don't have a $GLOBUS_USER_HOME; this would explain
> > > > some problems. In the RSL file you have:
> > > >
> > > > ${GLOBUS_USER_HOME}/523.err 73 as stderr, and I don't believe that
> > > > $GLOBUS_USER_HOME is /opt/globus for a simple grid user... Do you see
> > > > any of the poststage datasets, if you poststage them?
> > >
> > >  Actually, I'm mapped to the remote globus user, so that points to
> > > /opt/globus/. And as I said, 523.sh appears in /opt/globus, so some
> > > staging seems to work...
> > >
> > >  What do you mean by poststage datasets?
> > >
> >
> > At the end of my RSL file I have a passage like this:
> >
> > #############################cut here######################
> >         <fileStageOut>
> >                 <!-- stage out stdout -->
> >                 <transfer>
> >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/523.out</sourceUrl>
> >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.out</destinationUrl>
> >                 </transfer>
> >                 <!-- stage out stderr -->
> >                 <transfer>
> >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/523.err</sourceUrl>
> >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.err</destinationUrl>
> >                 </transfer>
> >                 <!-- stage out log -->
> >                 <transfer>
> >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.log</sourceUrl>
> >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.log</destinationUrl>
> >                 </transfer>
> >                 <!-- stage out task results -->
> >                 <transfer>
> >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.tar</sourceUrl>
> >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.tar</destinationUrl>
> >                 </transfer>
> >         </fileStageOut>
> > #############################cut here######################
> >
> > This is where the poststaging (fileStageOut) is performed. The other
> > section (cleanup) describes which datasets should be deleted.
> >
> > Okay, if you submit to SGE, you get a different environment. Your output
> > files will probably be located on the worker node. I believe that SGE
> > can't access /opt/globus, so the output files won't be written! Contact
> > your system administrator and ask which filesystem directories can be
> > accessed by SGE jobs...
> >
> > >  Is there any logfile, or something else I can do, to get verbose
> > > debugging information? Where is the "submit script" that fails?
> > >
> > >
> >
> > Use the "-debug" option on the globusrun-ws call.
> >
> > Cheers
> >
> > Alexander
> >
>
>
>

Attachment: sge.pm
Description: Perl program
