On 01/09/2023 19:39, Paul Koning via cctalk wrote:
> There's actually a pull in two opposite directions. One is to put more stuff
> within a chip (System On Chip approach) and make the interconnects inside very
> wide, perhaps an entire L1 cache line wide. The Raza/NetLogic/Broadcom XLR and
> its successors are a good example, very nice MIPS-64 SOCs. The other is to do
> off-chip interconnects serially at very high clock rates.
Indeed. Internal interconnects make it quite easy to keep propagation
delay consistent. In a similar fashion (I believe, correct me if I'm
wrong), most memory buses are also parallel. It's easy to keep the
propagation delay consistent when it's essentially baked into the
product. Things like expansion buses and "external" buses like USB can't
guarantee consistent propagation delay, and as such can't run a wide
parallel interface at any significant clock rate.
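To make the skew problem concrete, here is a toy model (made-up round numbers, not any real bus) of why inconsistent propagation delay limits a parallel bus: if one line's delay exceeds the sampling window, the receiver latches a stale bit for that position.

```python
# Toy model: a parallel bus bit is captured correctly only if it arrives
# within the receiver's sampling window. Delays and the window are
# illustrative round numbers, not taken from any real interface.

def sample_word(bit_arrival_ns, window_ns):
    """Return, per bit line, whether the bit arrived in time to be latched."""
    return [arrival <= window_ns for arrival in bit_arrival_ns]

# 8-bit bus where one trace is 1.2 ns longer than the others:
arrivals = [0.5] * 7 + [1.7]
print(sample_word(arrivals, 1.0))  # the skewed line misses the 1 ns window
```

Shortening the window (raising the clock rate) makes ever-smaller skew fatal, which is why unconstrained external cabling pushes designs toward serial links.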
> Of course there are cases where serial isn't fast enough. The fastest
> Ethernets are an example, with their multi-lane transceiver buses. Another is
> the JESD204 standard, used in signal processing to connect A/D and D/A
> converters, where you might be looking at multiple analog data streams, 14-16
> bits wide, at multiple Gsamples/second. That might take 2-8 serial links
> working together. For those, there isn't a requirement for alignment of the
> bits across the wires; instead the data streams are reconstructed serially for
> each lane and then aligned properly to form the words. So within reason the
> lanes may have different propagation delay and still work.
>
> paul
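The per-lane reconstruction Paul describes can be sketched in a few lines. This is a deliberately simplified model: the marker value and nibble-sliced payload are made up for illustration and are not the actual JESD204 alignment characters. The point is only that each lane is aligned independently, and words are formed afterwards.

```python
# Toy model of multi-lane alignment: each lane carries a known framing
# marker followed by its slice of the data words, but arrives with a
# different delay. MARKER is a stand-in value, not a real JESD204 symbol.

MARKER = 0xBC

def align_lane(stream):
    """Discard symbols up to and including the alignment marker."""
    idx = stream.index(MARKER)
    return stream[idx + 1:]

def reconstruct_words(lanes):
    """Align each lane independently, then take one symbol per lane per word."""
    aligned = [align_lane(lane) for lane in lanes]
    return [tuple(symbols) for symbols in zip(*aligned)]

# Two lanes with unequal propagation delay (extra idle symbols on lane1):
lane0 = [0x00, MARKER, 0x1, 0x3, 0x5]
lane1 = [0x00, 0x00, 0x00, MARKER, 0x2, 0x4, 0x6]
print(reconstruct_words([lane0, lane1]))  # → [(1, 2), (3, 4), (5, 6)]
```

Because alignment happens per lane, the lanes tolerate different delays, exactly as described above.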
This is essentially how PCIe works. It's easier to take multiple
high-speed serial streams and reconstruct the data afterwards than it is
to operate those lanes synchronously. The logic is simpler, and the
achievable bandwidths are higher.
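A minimal sketch of the striping idea: bytes are dealt round-robin across lanes on transmit and interleaved back on receive. Real PCIe adds framing, scrambling, and 128b/130b encoding on top; none of that is modelled here.

```python
# Simplified PCIe-style byte striping across N lanes (no framing,
# scrambling, or line coding - transport only).

def stripe(data, nlanes):
    """Distribute bytes round-robin across the lanes."""
    lanes = [[] for _ in range(nlanes)]
    for i, b in enumerate(data):
        lanes[i % nlanes].append(b)
    return lanes

def destripe(lanes):
    """Interleave the lanes back into the original byte order."""
    out = []
    for group in zip(*lanes):
        out.extend(group)
    return bytes(out)

payload = bytes(range(8))
lanes = stripe(payload, 4)       # lane 0 gets bytes 0 and 4, lane 1 gets 1 and 5, ...
assert destripe(lanes) == payload
```

Each lane is then a plain serial stream that can be clock-recovered and deskewed on its own before de-striping.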
Generally, the serial-vs-parallel problem has been solved by running
multiple serial streams in parallel. True parallel buses are really only
useful when latency matters more than throughput: whilst serial buses
can be fast, it still takes a significant amount of time to deserialize
the streams and reconstruct the words, even when each individual step is
very quick. For most uses, that kind of low-latency performance just
isn't needed.
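The latency trade-off is easy to put rough numbers on. The figures below are illustrative round numbers, not a specific product, and ignore encoding and deskew overhead, which only widen the gap.

```python
# Illustrative arithmetic: time to move one 64-bit word over a single
# serial lane vs. one cycle of a 64-bit-wide parallel bus. All rates are
# made-up round figures.

serial_rate_hz = 8e9        # 8 Gb/s serial lane
parallel_clock_hz = 400e6   # 400 MHz bus, 64 bits per cycle

serial_ns = 64 / serial_rate_hz * 1e9      # just shifting the bits in: 8.0 ns
parallel_ns = 1 / parallel_clock_hz * 1e9  # one bus cycle: 2.5 ns

print(f"serial: {serial_ns:.1f} ns, parallel: {parallel_ns:.1f} ns")
```

The serial lane wins on aggregate throughput (add more lanes), but the word-granularity latency favours the wide bus, which is the point made above.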
Josh