Sebastien,

For the most part, we try to match the bandwidth of the disks to the network to the number of routers needed. I will be at the Lustre User Group meeting in Sonoma, CA at the end of this month, giving a talk about Lustre at LLNL, including our network design and router usage, but here is a quick description.
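As a minimal sketch of the bandwidth-matching rule described above (size each tier — disks, file system network links, and compute-cluster routers — to the same aggregate bandwidth), the arithmetic might look like this. The per-link usable rates below are illustrative assumptions, not LLNL's measured numbers:

```python
import math

def links_needed(target_gbps: float, link_gbps: float) -> int:
    """Links (or routers) required so aggregate bandwidth >= target."""
    return math.ceil(target_gbps / link_gbps)

# Hypothetical figures for illustration only:
disk_bw = 20.0     # GB/s the file system can deliver from disks
dual_gige = 0.25   # GB/s, assumed usable rate of a dual-GigE server
ten_gige = 1.25    # GB/s, assumed usable rate of one 10 GigE router

servers = links_needed(disk_bw, dual_gige)  # dual-GigE servers needed
routers = links_needed(disk_bw, ten_gige)   # 10 GigE routers needed
print(servers, routers)
```

With these assumed rates, 80 dual-GigE servers and 16 ten-GigE routers each total 20 GB/s, so no single tier becomes the bottleneck by design.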
We have a large federated Ethernet core. We then have edge switches for each of our clusters, with links up to the core and back down to the routers or TCP-only clients. In a typical situation, if we think one file system can achieve 20 GB/s based on disk bandwidth, we try to make sure that the file system cluster has 20 GB/s of network bandwidth (1 GigE, 10 GigE, etc.), and that the routers for the compute cluster total 20 GB/s as well. So we may have a server cluster with servers having dual GigE links, and routers with 10 GigE links, and we just try to match them up so the numbers are even. Typically, the routers in a cluster are the same node type as the compute nodes, just populated with additional network hardware. In the future, we will likely build a router cluster that bridges our existing federated Ethernet core to a large InfiniBand network, but that is at least one year away.

Most of our routers are rather simple: they have one high-speed interconnect HCA (Quadrics, Mellanox IB) and one network card (dual GigE, or single 10 GigE). I don't think we've hit any bus bandwidth limitation, and I haven't seen any of them really pressed for CPU or memory. We do make sure to turn off irq_affinity when we have a single network interface (the 10 GigE routers), and we've had to tune the buffers and credits on the routers to get better throughput. We have noticed a problem with serialization of checksum processing on a single core (bz #14690).

The beauty of routers, though, is that if you find they are all running at capacity, you can always add a couple more and move the bottleneck to the network or disks. We find we are mostly slowed down by the disks.

-Marc

----
D. Marc Stearman
LC Lustre Administration Lead
[EMAIL PROTECTED]
925.423.9670
Pager: 1.888.203.0641

On Apr 10, 2008, at 1:06 AM, Sébastien Buisson wrote:

> Let's consider that the internal bus of the machine is big enough so
> that it will not be saturated.
> In that case, what will be the limiting factor? Memory? CPU?
> I know that it depends on how many I/B cards are plugged into the machine,
> but generally speaking, is the routing activity CPU or memory hungry?
>
> By the way, are there people on that list who have feedback about
> Lustre router sizing? For instance, I know that Lustre routers have
> been set up at LLNL. What is the throughput obtained via the
> routers, compared to the raw bandwidth of the interconnect?
>
> Thanks,
> Sebastien.
>
> Brian J. Murrell wrote:
>> On Wed, 2008-04-09 at 19:07 +0200, Sébastien Buisson wrote:
>>> I mean, if I have an available bandwidth of 100 on each side of a
>>> router, what will be the max reachable bandwidth from clients on one
>>> side of the router to servers on the other side of the router? Is it
>>> 50? 80? 99? Is the routing process CPU or memory hungry?
>>
>> While I can't answer these things specifically, another important
>> consideration is the bus architecture involved. How many I/B cards can
>> you put on a bus before you saturate the bus?
>>
>> b.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
