Excellent! Thanks, Wilfred and Francois, for clearing that up. This seems a good time to mention the recent LRMA adapter (incubator) project http://dev.globus.org/wiki/Incubator/LRMA. JP and I have only recently set up the project, but we plan to add support for a larger set of GRAM adapters such as SGE, LoadLeveler, ... This is where we'd add documentation for SGE and other adapters. It will only work (scale to many LRMs) if we get community support from those who know and use these LRMs.

Thanks,
Stu

On Jul 27, 2007, at 11:05 AM, Francois Hornoy wrote:


 OK, thanks for both explanations.

Actually, after installing the LESC packages, we have to run gpt-postinstall.

 And then we can run:
   cd $GLOBUS_LOCATION/setup/globus/
  setup-globus-job-manager-sge --mpi-pe=XXXXXX

Where XXXXXX is one of the available PEs returned by "qconf -spl". When we then open sge.pm, we see the variable set as $mpi_pe='XXXXXX'; and then everything works fine.
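
For anyone repeating these steps, they could be wrapped in a small shell function along the following lines. This is only a sketch: it assumes GLOBUS_LOCATION is set and that gpt-postinstall is on the PATH, and the names run, configure_sge_pe and DRY_RUN are hypothetical helpers made up for illustration, not part of Globus or the LESC packages.

```shell
#!/bin/sh
# Sketch only. Assumes GLOBUS_LOCATION is set and gpt-postinstall is on PATH.
# run, configure_sge_pe and DRY_RUN are hypothetical names, not Globus tools.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: configure_sge_pe PE_NAME
# PE_NAME should be one of the names printed by "qconf -spl".
configure_sge_pe() {
    pe_name=${1:-}
    if [ -z "$pe_name" ]; then
        echo "usage: configure_sge_pe PE_NAME" >&2
        return 2
    fi
    if [ -z "${GLOBUS_LOCATION:-}" ]; then
        echo "GLOBUS_LOCATION is not set" >&2
        return 1
    fi
    run gpt-postinstall || return 1
    # Subshell so the caller's working directory is left unchanged.
    (
        run cd "$GLOBUS_LOCATION/setup/globus" &&
        run ./setup-globus-job-manager-sge "--mpi-pe=$pe_name"
    )
}
```

Calling "DRY_RUN=1 configure_sge_pe mpi" would only print the commands, which may be a convenient way to check them before running anything for real.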

 Thanks,
 Francois.




On 7/27/07, Wilfred Li <[EMAIL PROTECTED]> wrote:

Hi,

Stuart is right, the original error message returned by SGE indicates
that the appropriate parallel environment wasn't set up for MPI.

#to see what PE are available:
#qconf -spl

Check your script and see which PE (-pe parameter) you are requesting; you
can easily adapt one of the existing ones.

#to see the details of the "mpi" PE:
#qconf -sp mpi

#to modify a PE
#qconf -mp mpi

Please see the man pages of qconf for other details.
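
As a rough sketch, the check could be scripted as below. The helper name pe_in_list is hypothetical (it is not an SGE command), and the sketch assumes that "qconf -spl" prints one PE name per line.

```shell
#!/bin/sh
# Sketch only: pe_in_list is a hypothetical helper, not an SGE command.
# It reads a PE list on stdin (assumed: one name per line, as printed by
# "qconf -spl") and succeeds only if the requested name matches a whole line.
pe_in_list() {
    if [ -z "${1:-}" ]; then
        echo "usage: pe_in_list PE_NAME" >&2
        return 2
    fi
    grep -Fxq -- "$1"
}
```

For example, "qconf -spl | pe_in_list mpi && qconf -sp mpi" would show the details of the "mpi" PE only if it is actually defined.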

Regards,

Wilfred


        -----Original Message-----
        From: [EMAIL PROTECTED]
        [mailto: [EMAIL PROTECTED] On Behalf Of Stuart Martin
        Sent: Friday, July 27, 2007 8:15 AM
        To: charles bacon; Francois Hornoy
        Cc: globus user; R. Jeff Porter
        Subject: Re: [gt-user] GT & SGE

        Charles: I think you're confusing "multi-jobs (MMJS) Vs
        an individual job (MEJS)" and then for an MEJS job
        "jobtype=multiple Vs jobtype=single".

        Calling all SGE folks:

        The default jobtype for an MEJS job is "multiple",
        meaning that when you submit a job with count 4, 4
        nodes/cpus will be allocated and 4 copies of the
        executable will be started (one for each node/cpu).  In
        PBS we have some code to ssh/rsh to each node to start
        the job.  In SGE, I am not sure how it is done.  There seems
        to be some confusion in the SGE script about what to
        do with jobtype multiple.  If the SGE script
        depends on the SGE PE environment being set up in
        order to process jobtype multiple, then that is a
        current dependency and it needs to be set up.  Simple as
        that.  I don't know if the PE environment comes by default
        with certain versions of SGE or how it is set up.  Are
        there others who can shed some light on how this is
        done in SGE?  Is the PE environment typically set up?
        Is it easy to do?

        An alternative to being dependent on the PE environment
        would be to process jobtype=multiple jobs without it.
        For example, maybe something similar to the PBS ssh/rsh
        code can be written to start the application processes
        on each allocated node?

        Q: Is the PE environment required for processing GRAM
        jobtype=MPI jobs?  Sounds like it would be, so this
        would indicate that the PE environment should always be
        set up for a GRAM installation.  If so, then this
        dependency just needs to be made more explicit and documented.

        -Stu

        On Jul 27, 2007, at 7:41 AM, Charles Bacon wrote:

        > The globus-job-manager creates it, I believe.  What I was suggesting,
        > though, was to just add a line to the sge.pm that did something like:
        >
        > if ( $jobtype eq "multiple" && $count == 1 ) {
        >     $jobtype = "single";
        > }
        >
        > I am confused about why the jobtype is coming in as multiple in the
        > description, though.  As far as I know, this should be coming in as
        > single when you submit something like -c /bin/hostname.  Maybe Martin
        > or Stu can comment on that.
        >
        >
        > Charles
        >
        > On Jul 27, 2007, at 12:29 PM, Francois Hornoy wrote:
        >
        >>
        >>  Hum ok, thank you.
        >>
        >>  It seems that the default jobtype is "multiple", as we can see in
        >> the file:
        >> include/gcc64dbg/globus_gram_protocol.h, line 328:
        >> #define GLOBUS_GRAM_PROTOCOL_DEFAULT_JOBTYPE "multiple"
        >>
        >>  I've tried to "grep" in the sources of Globus and LESC packages, and
        >> did not find that GLOBUS_GRAM_PROTOCOL_DEFAULT_JOBTYPE. So maybe they
        >> did not put anything, and by default, it's set to "multiple". I don't
        >> know.
        >>
        >>  So, who generates that perl $description? "grep" did not help me
        >> much. I understand that the sge.pm reads this file, but who
        >> generates it?
        >>
        >>  Thanks for helping,
        >>  Francois.
        >>
        >>
        >> On 7/27/07, Charles Bacon < [EMAIL PROTECTED]> wrote:
        >> On Jul 27, 2007, at 11:49 AM, Francois Hornoy wrote:
        >>
        >>> On 7/27/07, Charles Bacon < [EMAIL PROTECTED]> wrote:
        >>> As the SGE module isn't ours, I don't have any reason why it would
        >>> be setting the jobtype to multiple here.  If I were you, I would
        >>> just go into the sge.pm file and make it so it didn't set my jobtype
        >>> to multiple unless I asked it to.  :-)
        >>>
        >>>  Hehe ok. So, you mean that, in my SGE case, all the perl
        >>> description (thus, "jobtype" in particular) is set in the LESC
        >>> packages and not in yours?
        >>
        >> That's what I'm thinking.  I send /bin/hostname jobs to fork and pbs
        >> adapters, and don't hit a jobtype of multiple.  I know that SGE in
        >> particular has a jobarray type that some SGE adapters call multiple,
        >> and others don't.  This is one of the reasons there is more than one
        >> SGE adapter, because people have made different decisions from each
        >> other.
        >>
        >>>  Or the problem could be in "your" code?
        >>
        >> It's definitely possible, but I find it unlikely as it stands.
        >>
        >>
        >> Charles
        >>
        >>
        >



