On Mar 18, 2009, at 9:56 PM, tracy_luofengji wrote:
Dear all,
After reading Nicholas Karonis's paper "MPICH-G2: A Grid-Enabled
Implementation of the Message Passing Interface" and the MPICH-G2
web page, I want to ask two questions in order to understand MPICH-G2
better.
1. As the paper and the web page say, MPICH-G2 uses the vendor-
supplied MPI implementation to perform intra-machine communication,
where "vendor-supplied MPI" means the MPI implementation that
already exists on the cluster and is not MPICH-based. But what
should I do if my cluster already has MPICH installed? In that case,
how does MPICH-G2 perform intra-machine communication? And if I have
already configured my cluster with plain MPICH, should I remove that
MPICH installation and install MPICH-G2 on the head node instead?
The early versions of MPICH-G2 could not be configured with
an MPI flavor of the GT library that was, in turn, built with
an MPICH-based MPI. However, as of MPICH-G2 v1.2.5.1 (which was
probably released after the article was published) that restriction
was removed.
So, you should be able to take the MPICH-based vendor-MPI on your
cluster, use it to build an MPI flavor of the Globus libraries,
and then use that MPI flavor of the Globus libraries to configure
and build MPICH-G2. In this setting:
(1) the Globus Job Manager script that runs on that cluster
will have to be modified to use the 'mpirun' that comes
with the vendor-supplied MPI (note, NOT MPICH-G2's mpirun)
when the subjob in the RSL to run on that cluster specifies
(jobtype=mpi),
(2) all RSL subjobs that run on that cluster should specify
(jobtype=mpi), and
(3) when doing (1)+(2) above all intra-cluster messages will
be done over the vendor-supplied MPI.
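As a sketch of what (1)+(2) look like in practice, an RSL subjob
for that cluster might read as follows (the hostname, paths, and
executable name are placeholders, not taken from any real site):

```
+
( &(resourceManagerContact="cluster.example.edu/jobmanager-pbs")
   (count=10)
   (jobtype=mpi)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
   (executable="/home/user/app")
)
```

The key line is (jobtype=mpi): it is what tells the (modified)
Job Manager script on that cluster to launch the subjob with the
vendor-supplied mpirun.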
Note also, if you don't have a vendor-supplied MPI on your cluster
or if you don't want to use the one that's there, you can always
build a non-MPI flavor of the Globus libraries, configure and
build MPICH-G2 atop those Globus libraries, and run on the
cluster that way. In that case you do not need to modify
the Globus Job Manager, you do not specify (jobtype=mpi)
in your RSL subjob for that cluster, and all intra-cluster
messaging will be done via TCP/IP.
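By contrast, in this non-MPI-flavor setup the subjob for the same
(hypothetical) cluster simply omits (jobtype=mpi):

```
+
( &(resourceManagerContact="cluster.example.edu/jobmanager-pbs")
   (count=10)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
   (executable="/home/user/app")
)
```

The unmodified Job Manager then starts the processes directly, and
MPICH-G2 carries all messaging, including intra-cluster messages,
over TCP/IP.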
2. Nick said MPICH-G2 works based on the infrastructure we
already have that controls access to our cluster. I understand that,
but I am still a little confused about how Globus submits MPI jobs
to the local scheduler (such as PBS). When my cluster is managed by
PBS and has plain MPICH installed, I submit MPI jobs to PBS using
something like "mpirun -machinefile $PBS_NODEFILE -np 10 app". Now
suppose I remove the plain MPICH and install MPICH-G2; the machinefile
used by MPICH-G2 then no longer lists the nodes of my cluster, but
the addresses of compute resources in the grid. In this case, how
does the PBS jobmanager submit MPI jobs to PBS?
Yes, this can be confusing. It all works by modifying the Globus
PBS Job Manager script to detect when (jobtype=mpi) is in the
Globus RSL subjob, and when it is, to call the vendor-supplied
MPICH mpirun (as described above). It is also important for
MPICH-G2 that the env vars (those specified in the RSL and those
in the user's environment, e.g., .cshrc) all get propagated to
the running app. This, too, often requires some hacking of the
Globus Job Manager script.
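As a rough illustration only (the real Globus PBS Job Manager is a
Perl script; the variable names and structure below are invented for
clarity), the decision being described amounts to:

```shell
# Hypothetical sketch: choose the launch command the way a modified
# PBS Job Manager script would, based on the RSL subjob's jobtype.
JOBTYPE="mpi"   # would be parsed from the RSL subjob
NP=10           # would come from the RSL (count=...) attribute
APP="./app"     # would come from the RSL (executable=...) attribute

if [ "$JOBTYPE" = "mpi" ]; then
    # (jobtype=mpi): launch via the vendor-supplied MPICH mpirun,
    # over the node list PBS hands the job in $PBS_NODEFILE.
    LAUNCH_CMD="mpirun -machinefile \$PBS_NODEFILE -np $NP $APP"
else
    # No (jobtype=mpi): start plain processes; intra-cluster
    # messaging then goes over TCP/IP via MPICH-G2 itself.
    LAUNCH_CMD="$APP"
fi
echo "$LAUNCH_CMD"
```

The real script would also arrange for the RSL and user environment
variables to reach the launched processes, as noted above.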
There are folks in the Globus community (developers and users) who
know how to hack the Job Manager scripts as described above.
They might be willing to share their hacks ;-). I think, for
example, certain TeraGrid sites might have these hacks in place
for PBS (you might try [email protected]).
Nick
Any help will be appreciated!
Thanks,
Tracy