> Yes. Although I work for a company that prides itself on its cache
> coherence know-how, I'm very much a believer in networked
> multiprocessors, even on a chip. I like Cell better than Opteron,
> for example. They are harder to program up front, however, which
> causes difficulties in adoption. Flip-side, once you've overcome
> your startup hurdles the networked model seems to provide more
> predictable performance management.

tell me about it. a certain (nameless) vendor makes a pcie ethernet chipset with its descriptor rings in system memory, not pci space. it's bizarre watching how performance varies with the number of buffers loaded into the ring between head ptr updates. slight tweaks to the algorithm can result in 35% performance differences.

surprisingly, another (also nameless) vendor makes a similar chipset with rings in pci space. this chipset has very stable performance in the face of tuning of the reloading loop. this chip performs just as well as the former, even though each 32-bit write to the ring buffer results in a round trip over the pcie bus to the card.

- erik
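
for anyone who hasn't written one of these reload loops, here is a minimal sketch in c of the knob being tuned: how many buffers get loaded into the ring between head ptr updates. everything in it is an assumption made for illustration (the descriptor layout, the ring size, the names Desc, Ring, doorbell and refill) and it is not either vendor's interface. the register write is simulated by a counter, so it only shows the shape of the loop, not the timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NDESC = 256 };           /* hypothetical ring size; must be a power of two */

typedef struct {
	uint64_t addr;          /* bus address of the buffer */
	uint32_t len;           /* buffer length in bytes */
	uint32_t status;        /* cleared by the driver, written back by the device */
} Desc;

typedef struct {
	Desc ring[NDESC];       /* the descriptor ring itself */
	uint32_t head;          /* next slot the driver will fill */
	uint32_t hwhead;        /* last head value the device was told about */
	unsigned long doorbells;/* number of simulated register writes */
} Ring;

/* stand-in for the register write that publishes the head ptr to the card */
static void
doorbell(Ring *r)
{
	r->hwhead = r->head;
	r->doorbells++;
}

/*
 * load n buffers into the ring, updating the head ptr once every
 * `batch` descriptors and once more for any remainder.
 * assumes the caller has checked that n slots are free.
 */
static void
refill(Ring *r, const uint64_t *bufs, size_t n, uint32_t len, unsigned batch)
{
	size_t i;
	unsigned pending;

	assert(n < NDESC);
	if(batch == 0)
		batch = 1;
	pending = 0;
	for(i = 0; i < n; i++){
		Desc *d = &r->ring[r->head];

		d->addr = bufs[i];
		d->len = len;
		d->status = 0;
		r->head = (r->head + 1) & (NDESC - 1);
		if(++pending == batch){
			doorbell(r);
			pending = 0;
		}
	}
	if(pending != 0)
		doorbell(r);
}
```

with batch set to 1 every descriptor costs a head ptr update; with batch set to 16, loading 64 buffers costs four. the tuning described above amounts to moving that one parameter, plus deciding when refill gets called at all.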