On Wednesday, 16 September 2015 at 19:21:59 UTC, deadalnix wrote:
> No, you don't, because the streamer still needs to load the unums one by one. Maybe two by two, with a fair amount of hardware speculation (which means you are already trading energy for performance, so the energy argument is weak). There is no way you can feed 256+ cores that way.

You can continuously load 64 bytes in a stream, decode them to your internal format and push them into the scratchpads of other cores. You could even do this in hardware.
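
Something like this, as a rough D sketch. The fixed 4-byte records, the field layout and the round-robin fan-out are all made-up assumptions for illustration; a real unum decoder would read variable-width, self-describing fields:

// Hypothetical sketch: stream 64-byte blocks of packed values, decode
// each to a fixed-width internal form, and round-robin the results to
// per-core work queues. The Internal struct and the 4-byte records are
// invented; a real unum decoder reads variable-width fields.
import std.stdio;

struct Internal { bool sign; int exponent; ulong fraction; bool ubit; }

Internal decode(ref const(ubyte)[] stream)
{
    // Placeholder: a real decoder would read the es/fs size fields
    // first, then pull a variable number of bits. We just consume 4 bytes.
    Internal r;
    r.sign = (stream[0] & 0x80) != 0;
    r.exponent = stream[0] & 0x7f;
    r.fraction = (cast(ulong) stream[1] << 16) | (stream[2] << 8) | stream[3];
    r.ubit = false;
    stream = stream[4 .. $];
    return r;
}

void main()
{
    auto block = new ubyte[](64);        // one 64-byte streamed block
    Internal[][4] queues;                // scratchpad queues for 4 cores
    const(ubyte)[] cursor = block;
    size_t core = 0;
    while (cursor.length >= 4)
    {
        queues[core] ~= decode(cursor);  // fan decoded values out
        core = (core + 1) % queues.length;
    }
    writefln("queued %s values per core", queues[0].length);
}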

If you look at the ubox brute-forcing method, you run many calculations over the same data, because you solve spatially, not by timesteps. So you can run many, many parallel computations over the same data.
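
A toy illustration of that "same data, many workers" shape; f, the interval and the tolerance are invented here, and this is plain floating point, not the actual ubox/unum machinery:

// Illustrative sketch only: the spatial approach tests many candidate
// boxes against the same equation, so each box is an independent task
// over shared data.
import std.algorithm : count;
import std.math : abs;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio;

double f(double x) { return x * x - 2.0; } // solve f(x) = 0 on [0, 2]

void main()
{
    enum n = 1 << 16;                  // many small boxes, one spatial pass
    auto keep = new bool[](n);
    foreach (i; parallel(iota(n)))     // same data, many parallel workers
    {
        immutable lo = 2.0 * i / n, hi = 2.0 * (i + 1) / n;
        // Keep the box if f changes sign (or is already tiny) inside it.
        keep[i] = f(lo) * f(hi) <= 0 || abs(f(lo)) < 1e-6;
    }
    writefln("%s of %s boxes survive", keep.count(true), n);
}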

> To give you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 has decreased rather than increased over the past decade, because you simply can't decode fast enough to feed them. Yet x86 CPUs have 64-way speculative decoding as a first stage.

That's because we use dumb compilers that don't prefetch intelligently. If you are writing for a tile-based VLIW CPU, you preload. These calculations are highly iterative, so I'd rather think of it as a co-processor solving a single equation repeatedly than as something running the whole program. You can run the larger program on a regular CPU or a few of the cores.
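
The schedule I have in mind is plain double buffering. A minimal sketch, where loadTile stands in for whatever DMA/preload mechanism the hardware would expose (here it is just a copy, so the overlap is only in the schedule, not actual concurrency):

// Double-buffering sketch: while the core crunches one tile, the next
// tile is already being fetched, so the loader never stalls the ALUs.
// loadTile and compute are stand-ins, not a real runtime API.
import std.stdio;

enum tileSize = 256;

void loadTile(const(double)[] src, double[] dst) { dst[] = src[]; }

double compute(const(double)[] tile)
{
    double acc = 0;
    foreach (v; tile) acc += v * v;      // stand-in inner kernel
    return acc;
}

void main()
{
    auto data = new double[](tileSize * 8);  // 8 tiles of input
    foreach (i, ref v; data) v = i;

    double[][2] buf = [new double[](tileSize), new double[](tileSize)];
    loadTile(data[0 .. tileSize], buf[0]);   // prime the pipeline

    double total = 0;
    foreach (t; 0 .. 8)
    {
        if (t + 1 < 8)                       // preload the next tile
            loadTile(data[(t + 1) * tileSize .. (t + 2) * tileSize],
                     buf[(t + 1) & 1]);      // ...while this one computes
        total += compute(buf[t & 1]);
    }
    writefln("total = %s", total);
}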

> The problem is not transistors, it is wires. Because the damn thing is variable-width in every way, pretty much every input bit can end up anywhere in the functional unit. That is a LOT of wire.

I haven't seen a design, so I cannot comment. But keep in mind that the CPU does not have to work with the storage format directly; it can use a different format internally.
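
For instance, unpack once at the boundary into fixed-width fields that the functional units handle directly. The 16-bit layout below is one I made up for the example, not the real self-describing unum encoding:

// Sketch of the "different format internally" idea: variable-width in
// storage, widened to fixed fields once on-chip.
import std.stdio;

struct Unpacked
{
    bool  sign;
    bool  ubit;     // inexact flag: value lies in an open interval
    int   exponent; // widened to a fixed size
    ulong fraction; // left-justified in a fixed 64-bit field
}

// Unpack a toy 16-bit format: 1 sign, 4 exponent, 10 fraction, 1 ubit.
Unpacked unpack(ushort u)
{
    Unpacked r;
    r.sign     = ((u >> 15) & 1) != 0;
    r.exponent = (u >> 11) & 0xF;
    r.fraction = cast(ulong)((u >> 1) & 0x3FF) << 54; // left-justify
    r.ubit     = (u & 1) != 0;
    return r;
}

void main()
{
    auto v = unpack(0b1_0110_1010101010_1);
    writefln("sign=%s exp=%s frac=%016x ubit=%s",
             v.sign, v.exponent, v.fraction, v.ubit);
}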

We'll probably see FPGA implementations that can be run on FPGA cards for PCs within a few years. I read somewhere that a group in Singapore was working on it.
