> Yes.  Although I work for a company that prides itself on its cache  
> coherence know-how, I'm very much a believer in networked  
> multiprocessors, even on a chip.   I like Cell better than Opteron,  
> for example.  They are harder to program up front, however, which  
> causes difficulties in adoption.  Flip-side, once you've overcome  
> your startup hurdles the networked model seems to provide more  
> predictable performance management.

tell me about it.  a certain (nameless) vendor makes a pcie ethernet
chipset with its descriptor rings in system memory, not pci space.
it's bizarre watching the performance vs. the number of buffers loaded
into the ring between head ptr updates.  slight tweaks to the algorithm
can result in 35% performance differences.
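
for concreteness, here is a rough sketch of the knob in question.  it is
not either vendor's driver: the descriptor layout, Nring, refill() and the
nupdates counter are all made up for illustration, and the "register" is
just a field in a struct.  batch is the number of buffers loaded between
head ptr updates.

```c
#include <assert.h>
#include <stdint.h>

enum { Nring = 256 };	/* hypothetical ring size; must be a power of two */

typedef struct Desc Desc;
struct Desc {
	uint64_t addr;	/* buffer address (made-up layout) */
	uint32_t len;
	uint32_t flags;
};

typedef struct Ring Ring;
struct Ring {
	Desc d[Nring];		/* stands in for the descriptor ring */
	uint32_t head;		/* next slot the driver will fill */
	uint32_t hwhead;	/* stands in for the device's head register */
	uint32_t nupdates;	/* counts simulated head register writes */
};

/* load n buffers, publishing the head ptr once per batch descriptors */
void
refill(Ring *r, uint32_t n, uint32_t batch)
{
	uint32_t i, pending;

	assert(batch > 0);
	pending = 0;
	for(i = 0; i < n; i++){
		Desc *d = &r->d[r->head & (Nring-1)];

		d->addr = 0;	/* a real driver would store a dma address */
		d->len = 2048;
		d->flags = 0;
		r->head++;
		if(++pending == batch){
			r->hwhead = r->head;	/* the write that crosses the bus */
			r->nupdates++;
			pending = 0;
		}
	}
	if(pending != 0){	/* publish any leftover partial batch */
		r->hwhead = r->head;
		r->nupdates++;
	}
}
```

with batch = 1 every buffer costs a head update; with batch = 16 the same
64 buffers cost 4.  the sketch only counts updates.  it says nothing about
where the real cost lands on either chip.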

surprisingly, another (also nameless) vendor makes a similar chipset with
rings in pci space.  this chipset has very stable performance in the face of
tuning of the reloading loop.  this chip performs just as well as the former
even though each 32-bit write to the ring buffer results in a round trip over
the pcie bus to the card.

- erik
