Hi Al

Al Chu wrote:
Hey Jeff,

On Wed, 2008-06-11 at 09:43 -0700, Jeff Becker wrote:
Basically, we have an Altix ICE cluster connected by a pair of hypercube InfiniBand fabrics. External to that, we have some Lustre nodes connected into the cluster with InfiniBand. Our goal is to keep Lustre traffic separate from compute (MPI) traffic. Ideally, we'd have 2 subnets and an IB router between the Lustre fabric and the compute fabric to accomplish this.

I see.  In your environment, the Lustre storage servers are on the same
fabric as your compute nodes?
Right.
Barring that, I thought we could use partitions as follows: compute HCAs and switch ports are in both partitions, with full membership in the compute partition and limited membership in the I/O partition. The Lustre nodes and switches would only be in the I/O partition (full membership). That way, inter-compute-node (MPI) traffic would be disallowed from using routes through the I/O fabric (by partition membership), and I/O traffic could not interfere with compute (via separate partitions). Is this scheme feasible?
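
To illustrate the membership rules this scheme leans on, here is a minimal Python sketch; it is not OpenSM code. It assumes the usual InfiniBand P_Key convention: the top bit of the 16-bit P_Key marks full membership, and two P_Keys match only when their lower 15 bits agree and at least one side is a full member. The partition numbers 0x0001 and 0x0002 and all function names are hypothetical.

```python
FULL_MEMBER_BIT = 0x8000   # top bit of a 16-bit P_Key: set means full member
BASE_MASK = 0x7FFF         # lower 15 bits identify the partition

# Hypothetical partition numbers, chosen only for this illustration.
COMPUTE = 0x0001
IO = 0x0002


def pkeys_match(pkey_a, pkey_b):
    """Return True if two P_Keys allow communication with each other."""
    same_partition = (pkey_a & BASE_MASK) == (pkey_b & BASE_MASK)
    at_least_one_full = bool((pkey_a | pkey_b) & FULL_MEMBER_BIT)
    return same_partition and at_least_one_full


def shared_partitions(table_a, table_b):
    """Return the set of partition numbers over which two ports may talk."""
    return {
        a & BASE_MASK
        for a in table_a
        for b in table_b
        if pkeys_match(a, b)
    }


# Compute ports: full member of COMPUTE, limited member of IO.
compute_port = [COMPUTE | FULL_MEMBER_BIT, IO]
# Lustre ports: full member of IO only.
lustre_port = [IO | FULL_MEMBER_BIT]

compute_to_compute = shared_partitions(compute_port, compute_port)
compute_to_lustre = shared_partitions(compute_port, lustre_port)

print("compute <-> compute:", sorted(compute_to_compute))
print("compute <-> lustre: ", sorted(compute_to_lustre))
```

Under these assumptions, two compute ports share only the compute partition (both are limited members of the I/O partition, so that pairing does not match), while a compute port and a Lustre port share only the I/O partition. The sketch models who may talk to whom, not which links a path takes; the Phase 1 description quoted later in this thread says partition management currently has no effect on routing.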

If that's not possible, the next idea is to modify OpenSM to assign large weights to the links between the compute and I/O fabrics, so that the MinHop algorithm would never consider using these links for inter-compute node traffic.
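
The effect of the large link weights can be sketched with a toy shortest-path search in Python. This is not OpenSM's MinHop implementation, only a plain Dijkstra search over a hypothetical topology: four compute switches c1..c4 in a chain, plus one I/O switch io1 cabled to c1 and c4. The switch names, the weight value 100, and the function names are all assumptions made for the example.

```python
import heapq


def build_links(io_weight):
    """Return a dict of undirected links -> weight for the toy topology."""
    return {
        ("c1", "c2"): 1,
        ("c2", "c3"): 1,
        ("c3", "c4"): 1,
        ("c1", "io1"): io_weight,   # links between compute and I/O fabric
        ("c4", "io1"): io_weight,
    }


def cheapest_path(links, src, dst):
    """Dijkstra search; returns (total_cost, path) or None if unreachable."""
    graph = {}
    for (a, b), weight in links.items():
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))

    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None


# With uniform weights the shortest c1 -> c4 route cuts through the I/O switch.
uniform = cheapest_path(build_links(io_weight=1), "c1", "c4")
# With heavy I/O links the compute-only chain becomes the cheapest route.
weighted = cheapest_path(build_links(io_weight=100), "c1", "c4")
# The I/O switch itself stays reachable over its direct (heavy) link.
to_io = cheapest_path(build_links(io_weight=100), "c1", "io1")

print("uniform :", uniform)
print("weighted:", weighted)
print("to io1  :", to_io)
```

In this toy case the weight only has to exceed the cost of the longest compute-only detour for compute-to-compute paths to avoid the I/O links, while compute-to-I/O paths still use them because no cheaper alternative exists.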

So dedicating (for example) X out of Y uplinks for MPI only and the
remaining uplinks for Lustre only?
That works. The compute nodes need to talk to other compute nodes for MPI over one set of links, and they need to talk to the Lustre nodes for I/O, but over a different (disjoint) set of links. Thanks.

-jeff
Al

Thoughts? Thanks.

-jeff

Al Chu wrote:
Hey Jeff,

Out of curiosity, are you just trying to change the routing to
improve job performance, i.e. Lustre nodes get special routing vs.
compute nodes?

Al

On Tue, 2008-06-10 at 15:08 -0700, Jeff Becker wrote:
Hi all. I was looking into doing some subnet partitioning to separate compute nodes from Lustre nodes, and I saw the following in ~sashak/management.git on the OFA server, in opensm/doc/OpenSM_PKey_Mgr.txt:

OpenSM Partition Management
---------------------------

Roadmap:
Phase 1 - provide partition management at the EndPort (HCA, Router and Switch
          Port 0) level with no routing affects.
Phase 2 - routing engine should take partitions into account.
...
Phase 2 functionality:

The partition policy should be considered during the routing such that
links are associated with particular partition or a set of
partitions. Policy should be enhanced to provide hints for how to do
that (correlating to QoS too). The exact algorithm is TBD.


What is the status of Pkey-aware routing? Thanks.

-jeff

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
