On Mon, Jun 17, 2013 at 05:49:25PM -0400, Mark Hahn wrote:
> On Mon, 17 Jun 2013, Eugen Leitl wrote:
> <interesting dreams of nanocomputers elided - Charles Stross's novels are
> entertaining extrapolations based on this kind of thing...>

Well, molecular circuitry is a lot closer now than in the 1970s, especially
now that Moore's law has run into cost limits (and soon physical limits)
and the only way forward is to go up -- into the third dimension, first as
multilayer, and then as real autoassembled crystal from virus capsid-sized
components.

Actually, Moore's cost limits will bite especially for embedded and also
exascale, given that it becomes a matter not just of power dissipation, but
also of cost per unit, if you have millions and billions of SoC nodes.

> >> everywhere. OTOH, the idea of putting processors into memory has always
> >> made a lot of sense to me, though it certainly changes the programming
> >> model. (even in OO, functional models, there is a "self" in the
> >> program...)
> >
> > Memory has been creeping into the CPU for some time. Parallella e.g.
> > has embedded memory in the DSP cores on-die.
>
> well, they have a tiny bit of memory per core - essentially software-managed,
> globally-addressed cache-per-core. it shouldn't make you sit up and go

It doesn't make me sit up, I've been waiting for that kind of thing for
many years. It's still nice that it happens, even if the designers still
think it's a gamble. I don't think it's a gamble long-term: silicon real
estate is limited, and we *will* have to move the data and instructions to
where the actual processing happens. By eliminating coherent cache and
going for a memory-mapped space, you get something that sits somewhere
between register and cache in access latency, and that can have considerable
native bus and ALU width since it is on-die. You don't have quite this
leverage with TSV stacked memories.

The lack of hardware management is a plus, actually, since the OS knows at
all times what's up, and what needs to be done. There needs to be awareness
at the compiler level, as few compilers assume several k or M of registers,
especially really wide ones. Looks like a good fit for OpenCL here.

> "Hmm!".
> I think it's more interesting to ponder the fact that there have
> always been some small experiments with putting (highly data-parallel)
> processing onto the dram chip itself. I mean, dram is fundamental: chips

Yes, I'm aware, and I think it's an intermediate stage on the way to real
cellular hardware.

> will be planar for a long time, therefore density demands a 2D storage array.
> so a row decoder will read out a few Kb. why not perform some data-parallel
> operations row-wise, on the dram chip itself: you've got the row there anyway.

You can move at least some wide-bus nonfloat (but integer vector)
processing into DRAM, at very little additional cost. Integrated
corrective integrity checking would be nice, built-in BitBlt-like
processing would be nice; parallel searches, distributed GC, some GPU-like
processing, all these things are doable, assuming the processing model and
standard APIs follow.

> > Hybrid memory cube is
> > about putting memory on top of your CPU.
>
> this is just a slight power optimization: drive shorter wires.
> I'm looking forward to 2.5D integration, but it's evolutionary...

Technically, current CPUs are already multilayer enough that they almost
qualify for 2.5D; that's the reason it's hard to make really brilliant
memories.

> > is mixing memory/CPU, even though that is currently problematic in
> > the current fabrication processes.
>
> I'm not sure how much blame can be attributed to the nature of processes
> specialized to cpu vs dram. at one time this was obvious: cpus on fast but
> high-leakage process being almost the perfect opposite of low-leakage dram.

I understand there are still considerable, and growing, process complexity
differences between DRAM and CPU production.

> but leakage has been a cpu issue for a long time now. there even appears
> to be some interesting convergence, with 3d/finfet transistor tech being
> used for dram arrays.
> my guess is that preferences for say, doping levels

So they're increasing complexity in DRAM as well, due to space
constraints. Interesting. I wonder how complex the APU processes are
getting.

> or oxide thickness do *not* form permanently conflicting fab constraints.

I would really like to see an MRAM/CPU hybrid, with fully reconfigurable
logic, even at runtime.

> > The next step is something like
> > a cellular FPGA,
>
> yeah, no. I don't actually think things will go in that direction, at least
> not for a long time, mainstream-wise. but will we see systems that look like
> big grids of dimm-like pieces? yes: processor-in-memory, not merely memory
> organs supporting a distant, separate processor "brain"...

We've got FPGA with attached ARM cores in SoCs already (Parallella again),
but we still haven't got smart memories shipping. The mainstream at times
takes decades to follow up on a promising path.

> in some sense, the real question is how much of your system state is active
> at any time. computers are traditionally based on the assumption that most
> data is passively stored most of the time, and that we occasionally take out
> some bits, mutate them, possibly store new versions. Eugen is talking about

I think that model will become less important in future, simply because
you can no longer grow your number of switches at will as was once
possible, so you need a larger fraction of them cranking in order to make
faster systems. Ideally, all switches can flip the next instant, which is
where you arrive in the crystalline hardware model. However, that would
probably be power-dissipation prohibitive in CMOS, so we have to wait for
spintronics for that (which doesn't burn energy until you need a bit
flipped, and it's a great long way to the Landauer limit yet).

> more of a stream-processing model, where there is limited passive state -
> ie, other than the state interlock between pipeline/cellular stages.
> I think
> we'll continue to have lots of passive, non-dynamic state, so our
> architectures will still be based on random access to big arrays.
> (dram, disk, flash, whatever.)

I hope disks will die. I still wonder why we're not getting cheap PCIe
flash memory directly mmapped into the address space. That SATA/SAS thing
is no longer helping us there.
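
To make the mmap point concrete, here is a minimal sketch of what the
programming model looks like from user space. It is only an illustration:
an ordinary temporary file stands in for the flash device, so the data
still travels through the usual page cache and block layer rather than a
direct PCIe mapping, and the names open_mapped, demo and flash.img are
made up for this example.

```python
import mmap
import os
import tempfile


def open_mapped(path, size):
    """Map `size` bytes of `path` read/write into the process address space."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)       # make the backing file large enough
        return mmap.mmap(fd, size)   # shared, read/write mapping by default
    finally:
        os.close(fd)                 # the mapping outlives the descriptor


def demo():
    """Store bytes through the mapping, then read them back via the file."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for a flash device node.
        path = os.path.join(d, "flash.img")
        m = open_mapped(path, 4096)
        m[0:5] = b"hello"            # a plain memory store, no write() call
        m.flush()                    # ask the OS to write dirty pages back
        m.close()
        with open(path, "rb") as f:
            return f.read(5)


if __name__ == "__main__":
    print(demo())
```

The interesting part is the slice assignment: once the mapping exists, the
program touches storage with ordinary loads and stores, which is the
interface one would want from flash sitting directly in the address space.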
