On Jan 10, 2008, at 8:40 PM, Muhammad Atif wrote:
Hi,
Thanks for such a detailed reply. You are right, we have partitioned
(normalized) our system with Xen and have seen that the virtualization
overhead is not that great (for some applications) compared to the
potential benefits we can get. We have run various benchmarks on
different network/cluster configurations of Xen and native Linux, and
the results are really encouraging. The only known problem is Xen's
inter-domain communication, which is quite poor (about 1/6 of native
memory-transfer performance, not to mention 50% CPU utilization on the
host). We have tested Xensocket, and these sockets give us a really
good performance boost in all respects.
Now that I am having a look at the complex yet wonderful
architecture of Open MPI, can you guys give me some guidance on a
couple of naive questions?
1- How do I view the console output of an MPI process which is not
on the head node? Do I need a parallel debugger, or is there some
magical technique?
OMPI's run-time environment takes care of redirecting stdout/stderr
from each MPI process to the stdout/stderr of mpirun for you (this is
another use of the "out of band" TCP channel that is set up between
mpirun and all the MPI processes).
2- How do I set up the GPR?
You don't. The GPR is automatically instantiated in mpirun upon
startup.
Say I have a struct foo, and all processes have at least one
instance of foo. From what I gather, Open MPI will create a linked
list of the foo's that were passed on (though I am unable to pass one
on). Where do I have to define struct foo so that it can be
exchanged between the processes? I know it's a lame question, but I
think I am getting lost in the sea. :(
I assume you're asking about the modex.
Every BTL defines its own data that is passed around in the modex. It
is assumed that only modules of the same BTL type will be able to read/
understand that data. The modex just treats the data as a blob; all
the modex blobs are gathered into mpirun and then broadcast out to
every MPI process (I said scatter in my previous mail; broadcast is
more accurate).
So when you modex_send, you just pass a pointer to a chunk of memory
and a length (e.g., a pointer to a struct instance and a length).
When you modex_recv, you can just dereference the blob that you get
back as the same struct type that you modex_send'ed previously
(because you can only receive blobs from BTL modules of the same type
as you, and therefore the data they sent is the same type of data
that you sent).
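
For concreteness, here is a rough sketch of that pattern. Everything
named "xen" (the component, the struct, and its fields) is made up for
illustration, and the header paths are from memory of the current
trunk; the calls themselves are the ompi_modex_send/ompi_modex_recv
mentioned further down in this thread (spelled
mca_pml_base_modex_send/recv on the 1.2 branch):

    /* Everything named "xen" here is hypothetical; adjust to your BTL.
     * mca_btl_xen_component stands in for your BTL's component struct;
     * its super.btl_version identifies the component to the modex. */
    #include <stdint.h>
    #include "ompi/constants.h"
    #include "ompi/proc/proc.h"                      /* ompi_proc_t */
    #include "ompi/runtime/ompi_module_exchange.h"   /* ompi_modex_send/recv */

    /* Fixed-size payload this BTL wants every peer to know about. */
    struct mca_btl_xen_modex_t {
        uint32_t domain_id;   /* Xen guest domain id */
        uint32_t grant_ref;   /* shared-memory grant reference */
    };

    /* Send side: call once, e.g. from the component's init/exchange.
     * The modex copies sizeof(me) bytes and treats them as an opaque blob. */
    static int xen_modex_send(uint32_t domain_id, uint32_t grant_ref)
    {
        struct mca_btl_xen_modex_t me;
        me.domain_id = domain_id;
        me.grant_ref = grant_ref;
        return ompi_modex_send(&mca_btl_xen_component.super.btl_version,
                               &me, sizeof(me));
    }

    /* Receive side: call once per peer proc, e.g. from your
     * proc_create/add_procs path.  The blob you get back is whatever that
     * peer's xen BTL sent, so casting it to the same struct type is safe. */
    static int xen_modex_recv(ompi_proc_t *proc,
                              struct mca_btl_xen_modex_t **peer)
    {
        size_t size;
        int rc = ompi_modex_recv(&mca_btl_xen_component.super.btl_version,
                                 proc, (void **) peer, &size);
        if (OMPI_SUCCESS == rc && sizeof(**peer) != size) {
            rc = OMPI_ERR_BAD_PARAM;   /* peer sent something unexpected */
        }
        return rc;
    }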
You can do more complex things in the modex if you need to, of
course. For example, we're changing the openib BTL to send variable
length data in the modex, but that requires a bit more setup and I
suspect you don't need to do this.
Best Regards,
Muhammad Atif
PS: I am totally new to MPI internals. So if we do decide to go
ahead with the project, I will be a regular bugger on the list.
That's what we're here for. We don't always reply immediately, but we
try. :-)
----- Original Message ----
From: Adrian Knoth <a...@drcomp.erfurt.thur.de>
To: Open MPI Developers <de...@open-mpi.org>
Sent: Thursday, January 10, 2008 1:24:01 AM
Subject: Re: [OMPI devel] btl tcp port to xensocket
On Tue, Jan 08, 2008 at 10:51:45PM -0800, Muhammad Atif wrote:
> I am planning to port the tcp component to xensocket, which is a fast
> interdomain communication mechanism for guest domains in Xen. I may
Just to get things right: you first partition your SMP/multicore
system with Xen, and then want to re-combine it later for MPI
communication? Wouldn't it be easier to leave the host unpartitioned
and use shared-memory communication instead?
> As per design, and the fact that these sockets are not normal sockets,
> I have to pass certain information (basically memory references, guest
> domain info, etc.) to other peers once the sockets have been created. I
There's ORTE, the runtime environment. It employs OOB/tcp to provide a
so-called out-of-band channel. ORTE also provides a general-purpose
registry (GPR).
Once a TCP connection between the head node process and all other peers
is established, you can store your required information in the GPR.
> understand that mca_pml_base_modex_send and recv (or simply using
> mca_btl_tcp_component_exchange) can be used to exchange information,
Use mca_pml_base_modex_send (now ompi_modex_send) and encode your
required information; it gets stored in the GPR. Read it back with
mca_pml_base_modex_recv (ompi_modex_recv), as is done in
mca_btl_tcp_component_exchange and mca_btl_tcp_proc_create.
> but I cannot seem to get them to communicate. So to put my question in
> a very simple way... I want to create a socket structure containing the
> necessary information, and then pass it to all other peers before the
> start of actual MPI communication. What is the easiest way to do it?
Quite the same way: mca_btl_tcp_component_exchange assembles the
required information and stores it in the GPR by calling
ompi_modex_send.
mca_btl_tcp_proc_create (think of "the other peers") reads this
information into the local context.
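
As an aside, btl/tcp's modex payload is not a single fixed struct: the
component publishes an array of per-interface address records, and the
receiver recovers the record count from the blob length. A compressed
sketch of that shape (the struct, field names, and helper functions
below are stand-ins, not the real tcp symbols; same headers as in the
earlier sketch):

    /* Stand-in record type: one entry per exported interface/module
     * (btl/tcp uses its own mca_btl_tcp_addr_t for this). */
    struct my_addr_t {
        uint32_t addr;    /* address/identifier, network byte order */
        uint16_t port;    /* port/channel, network byte order */
    };

    /* Send side: publish count records as one blob.  The modex keeps
     * its own copy, so the caller may free addrs afterwards. */
    static int publish_addrs(mca_base_component_t *comp,
                             struct my_addr_t *addrs, size_t count)
    {
        return ompi_modex_send(comp, addrs, count * sizeof(*addrs));
    }

    /* Receive side (the proc_create analogue): how many records the
     * peer exported falls out of the returned blob size. */
    static int fetch_addrs(mca_base_component_t *comp, ompi_proc_t *proc,
                           struct my_addr_t **addrs, size_t *count)
    {
        size_t size;
        int rc = ompi_modex_recv(comp, proc, (void **) addrs, &size);
        if (OMPI_SUCCESS == rc) {
            *count = size / sizeof(struct my_addr_t);
        }
        return rc;
    }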
I guess you might want to copy btl/tcp to, let's say, btl/xen, so you
can modify the internal structures if required. Perhaps xensockets
don't need IP addresses, as they are actually memory sockets.
However, you'll still need TCP communication between the Xen guests
for the OOB channel.
As mentioned above, I'm not sure it's reasonable to use Xen with MPI
at all. Virtualization overhead might decrease your performance, and
that's usually the last thing you want when using MPI ;)
HTH
--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
private: http://adi.thur.de
--
Jeff Squyres
Cisco Systems