On 7/27/07, Charles Bacon <[EMAIL PROTECTED]> wrote:
>
> As the SGE module isn't ours, I don't have any reason why it would be
> setting the jobtype to multiple here.  If I were you, I would just go into
> the sge.pm file and make it so it didn't set my jobtype to multiple unless
> I asked it to.  :-)
>

 Hehe, ok. So you mean that, in my SGE case, the whole Perl description
("jobtype" in particular) is set in the LESC packages and not in yours?
 Or could the problem be in "your" code?

 Cheers,
 Francois.


Charles
>
> On Jul 27, 2007, at 11:09 AM, Francois Hornoy wrote:
>
>
>  One more mail to give some news. :)
>
>  I did what Stuart said: I got the Perl description, put it in a file,
> and then ran the command manually.
>
>  Here is the Perl description:
>
> $description = {
>     directory => [ '/home/fhornoy' ],
>     condoros => [ 'LINUX' ],
>     condorarch => [ 'INTEL' ],
>     stderr => [ '/dev/null' ],
>     environment => [ [ 'GLOBUS_LOCATION', '/opt/globus' ], [
> 'X509_CERT_DIR', '/etc/grid-security/certificates' ], [ 'X509_USER_PROXY',
> '' ], [ 'X509_USER_CERT', '' ], [ 'X509_USER_KEY', '' ], [ 'HOME',
> '/home/fhornoy' ], [ 'LOGNAME', 'fhornoy' ], [ 'SCRATCH_DIRECTORY',
> '/home/fhornoy/.globus/scratch' ], [ 'JAVA_HOME',
> '/usr/java/jdk1.5.0_07/jre' ], [ 'GLOBUS_GRAM_JOB_HANDLE',
> 'https://193.48.145.106:8443/wsrf/services/ManagedExecutableJobService?1d9988b8-3c20-11dc-908d-0017f23158ca' ],  ],
>     xmlextensions => [ '1' ],
>     executable => [ '/bin/hostname' ],
>     factoryendpoint => [ 'Address: https://193.48.145.106:8443/wsrf/services/ManagedJobFactoryService
>
> Reference property[0]:
> <ns5:ResourceID ns04:type="ns05:string"
> xmlns:ns04="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:ns05="http://www.w3.org/2001/XMLSchema"
> xmlns:ns5="http://www.globus.org/namespaces/2004/10/gram/job">SGE</ns5:ResourceID>
> ' ],
>     stdin => [ '/dev/null' ],
>     jobdir => [
> '/home/fhornoy/.globus/1d9988b8-3c20-11dc-908d-0017f23158ca' ],
>     jobtype => [ 'multiple' ],
>     stdout => [ '/dev/null' ],
>     count => [ '1' ],
> };
>
>
>  Here is the command I ran:
>  /usr/bin/sudo -H -u fhornoy -S
> /opt/globus/libexec/globus-gridmap-and-execute -g
> /etc/grid-security/grid-mapfile
> /opt/globus/libexec/globus-job-manager-script.pl -m sge -f description.txt
> -c submit
>
>  It fails with GRAM_SCRIPT_ERROR:24. But (remember my previous mails) that
> is expected: in the Perl description, "jobtype" is set to "multiple", and
> the script fails because no parallel environment (PE) is set.
>
>  If I change "multiple" to "single" in the Perl description, it works
> fine.
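
A minimal sketch of that manual edit, for anyone repeating the test. It assumes the saved description contains the exact line "jobtype => [ 'multiple' ]," as shown above; the function name and the output file name are made up for illustration, and the demo runs on a two-line stand-in rather than the real description.txt.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a saved Perl job description with
# jobtype switched from 'multiple' to 'single'. The source file is untouched.
set_single_jobtype() {
    src=$1
    dst=$2
    sed "s/jobtype => \[ 'multiple' ]/jobtype => [ 'single' ]/" "$src" > "$dst"
}

# Demo on a two-line stand-in for description.txt (not the real file).
tmpdir=$(mktemp -d)
printf "%s\n" "    jobtype => [ 'multiple' ]," "    count => [ '1' ]," \
    > "$tmpdir/description.txt"
set_single_jobtype "$tmpdir/description.txt" "$tmpdir/description-single.txt"
cat "$tmpdir/description-single.txt"
```

The same sudo command can then be rerun with "-f" pointing at the edited copy instead of description.txt.
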
>
>  So: is it normal that "-c /bin/hostname" is interpreted as a "multiple"
> job? If yes, how do I set up my PE, please? If no, I'll try to find
> what's going wrong. :)
>
>  Cheers,
>  Francois.
>
>
>
> On 7/26/07, Francois Hornoy <[EMAIL PROTECTED]> wrote:
> >
> >
> >   Hi,
> >
> >  As I said in a previous mail, I've identified the piece of Perl code that
> > fails. In the debugging information from the container, I found the
> > command line that is executed:
> >
> > /usr/bin/sudo -H -u fhornoy -S
> > /opt/globus/libexec/globus-gridmap-and-execute -g
> > /etc/grid-security/grid-mapfile
> > /opt/globus/libexec/globus-job-manager-script.pl -m sge -f
> > /opt/globus/tmp/gram_job_mgr41857.tmp -c submit
> >
> >  And perl logs contain (for the relevant part, pasted at the end of
> > mail):
> > Determining job type
> >  Job is of type multiple
> > ERROR: Parallel Environment (PE) failure!
> > MPI/multiple job was submitted, but no PE set  by neither user nor
> > administrator
> > GRAM_SCRIPT_ERROR:24
> >
> > So why is "-c /bin/hostname" interpreted as a multiple job? I mean: is
> > this normal?
> > And if it is, how do I "set that PE"?
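
On the PE question, a first step is to check whether SGE has any parallel environment defined at all. A minimal sketch, assuming the SGE "qconf" tool is on the PATH of the account running it ("qconf -spl" lists the configured parallel environments); the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the parallel environments known to SGE.
# Returns 127 if qconf is not available on this machine.
list_parallel_environments() {
    if command -v qconf >/dev/null 2>&1; then
        qconf -spl
    else
        echo "qconf not found on PATH" >&2
        return 127
    fi
}

list_parallel_environments || echo "could not list parallel environments"
```

Going by the sge.pm excerpt in this mail, a PE name from that list would still have to reach the script, either through the parallel_environment attribute of the job description or through a default set by the administrator.
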
> >
> >
> > The piece of code: about line 422 of
> > /opt/globus/lib/perl/Globus/GRAM/JobManager/sge.pm :
> >
> >     #####
> >     # Determining job request type.
> >     #
> >     print("Determining job type");
> >     print("  Job is of type " . $description->jobtype());
> >     if($description->jobtype() eq "mpi" ||
> >        $description->jobtype() eq "multiple")
> >     {
> >         #####
> >         # Check if RSL attribute parallel_environment is provided
> >         #
> >         if($description->parallel_environment())
> >         {
> >             $mpi_pe = $description->parallel_environment();
> >         }
> >
> >         if(!$mpi_pe || $mpi_pe eq "NONE"){
> >           print("ERROR: Parallel Environment (PE) failure!");
> >             print("  MPI/multiple job was submitted, but no PE set");
> >             print("  by neither user nor administrator");
> >             return Globus::GRAM::Error::INVALID_SCRIPT_REPLY;
> >         }
> >         else
> >         {
> >             print("  PE is $mpi_pe");
> >             $sge_job_script->print("#\$ -pe $mpi_pe "
> >                                    . $description->count() . "\n");
> >         }
> >
> >
> >  Cheers,
> >  Francois.
> >
> >
> > On 7/26/07, Francois Hornoy <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >  Hi,
> > >
> > > On 7/25/07, Stuart Martin <[EMAIL PROTECTED]> wrote:
> > > >
> > > > You can use the GRAM2 error codes to look up the error.
> > > > http://www-unix.globus.org/toolkit/docs/4.0/execution/prewsgram/user-index.html#s-gram-user-errorcodes
> > > >
> > > > 24      GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_SCRIPT_REPLY
> > > >         the job manager detected an invalid script response
> > > >
> > > > There is some doc for this problem here:
> > > > http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-troubleshooting
> > > > Find the heading for "The job manager detected an invalid script
> > > > response"
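
A small sketch of that lookup, for reading the code straight out of the script output. The two function names are made up for illustration, and only the single code quoted in this thread (24) is mapped; anything else is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: pull the numeric code out of the first
# "GRAM_SCRIPT_ERROR:<n>" line found on standard input.
gram_error_code() {
    sed -n 's/^.*GRAM_SCRIPT_ERROR:\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: name a code. Only the code quoted in this thread is known.
gram_error_name() {
    case "$1" in
        24) echo "GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_SCRIPT_REPLY" ;;
        *)  echo "unknown (see the GRAM2 error code list)" ;;
    esac
}

code=$(printf '%s\n' "Determining job type" "GRAM_SCRIPT_ERROR:24" | gram_error_code)
echo "code $code: $(gram_error_name "$code")"
```
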
> > >
> > >
> > > Yes, I had already read that before posting my mail. I tried setting a
> > > umask of 0000 for the local user we are mapped to. That did not solve my
> > > problem (or maybe that 0000 umask thing is wrong?).
> > >
> > >
> > > > If there is nothing obvious, then there is a section here on
> > > > debugging script executions, where you save out the contents of the
> > > > perl job description used by the scripts and run the perl submission
> > > > command by hand:
> > > >
> > > > http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/developer-index.html#id2565465
> > >
> > >
> > >
> > >  Ok, I will try that as soon as the Globus website is working again.
> > > Thanks.
> > >
> > >  Francois.
> > >
> > >
> > > -Stu
> > > >
> > > > On Jul 25, 2007, at 6:27 AM, Francois Hornoy wrote:
> > > >
> > > > >
> > > > >   Ok. I have some debugging information now, so let me repeat the
> > > > > context.
> > > > >
> > > > >  Here is the basic command: globusrun-ws -submit -F
> > > > > https://MyIP:8443/wsrf/services/ManagedJobFactoryService -c /bin/hostname
> > > > >
> > > > >  It works fine. If I add the "-Ft SGE" option, I get an error.
> > > > >
> > > > >  On the client side, the output of that command is:
> > > > >
> > > > > [EMAIL PROTECTED]:~$ globusrun-ws -submit -F
> > > > > https://MyIP:8443/wsrf/services/ManagedJobFactoryService -Ft SGE -c /bin/hostname
> > > > > Submitting job...Done.
> > > > > Job ID: uuid:9007c47c-3aa0-11dc-86fc-0017f23158ca
> > > > > Termination time: 07/26/2007 11:16 GMT
> > > > > Current job state: Failed
> > > > > Destroying job...Done.
> > > > > globusrun-ws: Job failed: Internal fault occurred while running the
> > > > > submit script.
> > > > > [EMAIL PROTECTED]:~$
> > > > >
> > > > >  On the server side, I attached the container output. It's a bit
> > > > > long; I think the relevant lines are these (around line 828 in the
> > > > > attached file):
> > > > >
> > > > > 2007-07-25 13:17:01,862 DEBUG exec.JobManagerScript[Thread-18,run:208] Executing command:
> > > > > /usr/bin/sudo -H -u fhornoy -S
> > > > > /opt/globus/libexec/globus-gridmap-and-execute -g
> > > > > /etc/grid-security/grid-mapfile
> > > > > /opt/globus/libexec/globus-job-manager-script.pl -m sge -f
> > > > > /opt/globus/tmp/gram_job_mgr53582.tmp -c submit
> > > > > 2007-07-25 13:17:02,056 DEBUG exec.JobManagerScript[Thread-18,run:225] first line: GRAM_SCRIPT_ERROR:24
> > > > > 2007-07-25 13:17:02,057 DEBUG exec.JobManagerScript[Thread-18,run:228] Read line: GRAM_SCRIPT_ERROR:24
> > > > > 2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript[Thread-18,run:335] failure message: null
> > > > > 2007-07-25 13:17:02,059 DEBUG exec.JobManagerScript[Thread-18,setDone:345] script is done, setting done flag
> > > > > 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,processSubmitState:1105] Done waiting for submit script
> > > > > 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,processSubmitState:1129] script return code: 24
> > > > > 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,processSubmitState:1134] script return code means error!
> > > > > 2007-07-25 13:17:02,060 DEBUG exec.StateMachine[RunQueueThread_0,createFaultFromErrorCode:3027] Creating fault from error code 24
> > > > > 2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:460] Fault Class: class org.globus.exec.generated.InternalFaultType
> > > > > 2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:461] Resource Key: {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=9007c47c-3aa0-11dc-86fc-0017f23158ca
> > > > > 2007-07-25 13:17:02,066 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:462] Description: Internal fault occurred while running the submit script.
> > > > > 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:463] Cause: null
> > > > > 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:464] State when failure occurred Unsubmitted
> > > > > 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:466] Script Command: submit
> > > > > 2007-07-25 13:17:02,067 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:467] GT2 Error Code: 24
> > > > > 2007-07-25 13:17:02,072 DEBUG utils.FaultUtils[RunQueueThread_0,makeFault:519] Script Command: submit
> > > > > 2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:346] fault element name: InternalFaultType
> > > > > 2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:350] fault element name: InternalFault
> > > > > 2007-07-25 13:17:02,072 DEBUG ManagedJobResourceImpl.9007c47c-3aa0-11dc-86fc-0017f23158ca [RunQueueThread_0,setFault:353] fault element name: internalFault
> > > > >
> > > > >
> > > > >  So, i did not find much on google about error 24, and it's not
> > > > > very explicit.
> > > > >
> > > > >
> > > > >  Cheers,
> > > > >  Francois.
> > > > >
> > > > >
> > > > > On 7/23/07, alexander.beck-ratzka <alexander.beck-[EMAIL PROTECTED]> wrote:
> > > > > > On Monday 23 July 2007 14:19, Francois Hornoy wrote:
> > > > > > > On 7/23/07, alexander.beck-ratzka <alexander.beck-[EMAIL PROTECTED]> wrote:
> > > > > > > On Monday 23 July 2007 11:18, Francois Hornoy wrote:
> > > > > > > >  Hi Alexander,
> > > > > > > >
> > > > > > > > >  I tried a simple example based on yours. It just stages in a
> > > > > > > > > file, runs "/bin/cat" on it (that's the job), and stages out
> > > > > > > > > the .out and .err files.
> > > > > > > > >
> > > > > > > > >  I keep getting this error (see end of mail).
> > > > > > > > >
> > > > > > > > >  If I watch "globusrun-ws -status -j job.id" and the
> > > > > > > > > "Execution Host", I can see that the stage-in step is fine:
> > > > > > > > > the file 523.sh is transferred correctly. But then it crashes.
> > > > > > > > >
> > > > > > > > >  Of course, if I do exactly the same thing without "-Ft SGE",
> > > > > > > > > it works perfectly.
> > > > > > > >
> > > > > > > >   Cheers,
> > > > > > > >   Francois.
> > > > > > > >
> > > > > > > > > 2007-07-23 11:10:53,427 INFO exec.StateMachine[RunQueueThread_5,logJobAccepted:3193] Job 9cd21054-38fc-11dc-b055-0017f23158ca accepted for local user 'globus'
> > > > > > > > > 2007-07-23 11:10:58,370 ERROR service.TransferWork[WorkThread-39,run:724] Terminal transfer error: Error deleting a file
> > > > > > > > >  "/opt/globus/523.out" [Caused by: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message:  Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > > > > > > > 500-A system call failed: No such file or directory
> > > > > > > > > 500 End.]]
> > > > > > > > > Error deleting a file
> > > > > > > > >  "/opt/globus/523.out"
> > > > > > > > > . Caused by
> > > > > > > > > org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message:  Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > > > > > > > 500-A system call failed: No such file or directory
> > > > > > > > > 500 End.].  Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > > > > > > > 500-A system call failed: No such file or directory
> > > > > > > > > 500 End.
> > > > > > > > >         at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:328)
> > > > > > > > >         at org.globus.ftp.FTPClient.deleteFile(FTPClient.java:253)
> > > > > > > > >         at org.globus.transfer.reliable.service.DeleteClient.delete(DeleteClient.java:189)
> > > > > > > > >         at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:688)
> > > > > > > > >         at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:355)
> > > > > > > > >         at java.lang.Thread.run(Thread.java:595)
> > > > > > > > > 2007-07-23 11:10:58,628 ERROR service.TransferWork[WorkThread-40,run:724] Terminal transfer error: Error deleting a file
> > > > > > > > >  "/opt/globus/523.err" [Caused by: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message:  Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
> > > > > > > > > 500-A system call failed: No such file or directory
> > > > > > > > > 500 End.]]
> > > > > > > > > Error deleting a file
> > > > > > > > >  "/opt/globus/523.err"
> > > > > > >
> > > > > > > > It seems that you don't have a $GLOBUS_USER_HOME; that would
> > > > > > > > explain some problems. In the rsl file you have:
> > > > > > > >
> > > > > > > > ${GLOBUS_USER_HOME}/523.err 73 as stderr, and I don't believe
> > > > > > > > that $GLOBUS_USER_HOME is /opt/globus for a simple grid user...
> > > > > > > > Do you see any of the poststage datasets, if you poststage them?
> > > > > >
> > > > > > >  Actually, I'm mapped to the remote globus user, so that points to
> > > > > > > /opt/globus/. And as I said, 523.sh appears in /opt/globus, so
> > > > > > > some staging seems to work...
> > > > > >
> > > > > >  What do you mean by poststage datasets?
> > > > > >
> > > > >
> > > > > > At the end of my rsl dataset I have a passage like this:
> > > > >
> > > > > #############################cut here######################
> > > > > >         <fileStageOut>
> > > > > >                 <!-- stage out stdout -->
> > > > > >                 <transfer>
> > > > > >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/523.out</sourceUrl>
> > > > > >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.out</destinationUrl>
> > > > > >                 </transfer>
> > > > > >                 <!-- stage out stderr -->
> > > > > >                 <transfer>
> > > > > >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/523.err</sourceUrl>
> > > > > >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.err</destinationUrl>
> > > > > >                 </transfer>
> > > > > >                 <!-- stage out log -->
> > > > > >                 <transfer>
> > > > > >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.log</sourceUrl>
> > > > > >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.log</destinationUrl>
> > > > > >                 </transfer>
> > > > > >                 <!-- stage out task results -->
> > > > > >                 <transfer>
> > > > > >                         <sourceUrl>file:///${GLOBUS_USER_HOME}/GEO600/tasks/523.tar</sourceUrl>
> > > > > >                         <destinationUrl>gsiftp://globus.submit.host/store/GEO600/tasks/523.tar</destinationUrl>
> > > > > >                 </transfer>
> > > > > >         </fileStageOut>
> > > > > #############################cut here######################
> > > > >
> > > > > > Here the poststaging (fileStageOut) is performed. The other section
> > > > > > (cleanup) describes which datasets should be deleted.
> > > > > >
> > > > > > Okay, if you go through SGE, you get another environment. Your
> > > > > > output files will probably be located on a worker node. I believe
> > > > > > that SGE can't access /opt/globus, so the output files won't be
> > > > > > written! Contact your system administrator and ask which filesystem
> > > > > > directories can be accessed by SGE jobs...
> > > > >
> > > > > > >  Is there any logfile, or something else I can do, to get some
> > > > > > > verbose debugging information? Where is the "submit script" that
> > > > > > > fails?
> > > > > > >
> > > > > >
> > > > > > Use the "-debug" option in the globusrun-ws call.
> > > > >
> > > > > Cheers
> > > > >
> > > > > Alexander
> > > > >
> > > > > <output.log>
> > > >
> > > >
> > >
> >
>
>
