On Fri, Sep 24, 2010 at 12:21 PM, Matt Hurd <[email protected]> wrote:
> I'm associated with a somewhat stealthy start-up. Only teaser product
> with some details out so far is a type of packet replicator.

From your description as well as from a quick look at the website, it looks and smells like a hub - I mean a dumb hub, like those which existed in the '90s before switching hubs (now called switches) took over. If so, then HPC might not be a good target for you, as it adopted switches long ago, for good reasons.

> Primarily focused on low-latency
> distribution of market data to multiple users as the port to port

HPC usage is a mixture of point-to-point and collective communications; most (all?) MPI libraries use low-level point-to-point communications to implement collective ones over Ethernet. Another important point is that a collective communication can be started by any of the nodes - it's not one particular node which generates data and then spreads it to the others. It's also relatively common for 2 or more nodes to reach the point of collective communication at the same time, leading to a higher load on the interconnect and possibly congestion.

What might be worth a try is a mixed network config where point-to-point communications go through one NIC connected to a switch, and the collective communications that can use a broadcast go through another NIC connected to your packet replicator. However, IMHO it would only make sense if the packet replicator makes some guarantees about delivery: e.g. that it would accept a packet from node B even if a packet from node A is being broadcast at that time, and that this packet from node B would be broadcast immediately after the previous transmission has finished. This of course means that each NIC-to-replicator link needs to be duplex and that some buffering should be present - this was not the case with the dumb hubs mentioned earlier. I think that such a setup would be enough for MPI_Barrier and MPI_Bcast.

One other HPC-related application that comes to mind is distributed storage. One of the main problems is keeping redundant metadata to prevent the whole storage going down if one of the metadata servers goes down.
With such a packet replicator, the active metadata server can broadcast the metadata to the others in just one operation; with a switched architecture, this would require N-1 operations (N being the total number of metadata servers) and would lose any pretence of atomicity and speed.

> They suggested interest in bigger port counts and mentioned >1000 ports.

Hmmm, if it's only like a dumb hub (no duplex, no buffering), then I have a hard time imagining how it would work at these port counts - the number of collisions would be huge...

Cheers,
Bogdan
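
To put rough numbers on the operation counts mentioned above, here is a minimal Python sketch. It only counts sends and rounds; it does not model latency, buffering or collisions. The function names are hypothetical, and the binomial tree is an assumption: it is one common way of building a broadcast out of point-to-point sends, not something the message above specifies.

```python
def unicast_sends(n_nodes: int) -> int:
    """Sends the source issues when it unicasts to every other node
    through a switch (the N-1 operations mentioned above)."""
    return max(n_nodes - 1, 0)


def binomial_tree_rounds(n_nodes: int) -> int:
    """Rounds needed by a binomial-tree broadcast built from
    point-to-point sends (assumed scheme): ceil(log2(N))."""
    return (n_nodes - 1).bit_length() if n_nodes > 1 else 0


def replicator_sends(n_nodes: int) -> int:
    """Sends the source issues if a replicator copies one packet to
    all ports (assumes duplex links and buffering in the device)."""
    return 1 if n_nodes > 1 else 0


if __name__ == "__main__":
    for n in (4, 64, 1000):
        print(n, unicast_sends(n), binomial_tree_rounds(n), replicator_sends(n))
```

For 1000 nodes this gives 999 unicast sends from the source, 10 rounds for the tree, and a single send with a replicator, which is the gap the metadata-server example relies on.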
