On Wednesday, 16 September 2015 at 23:28:23 UTC, H. S. Teoh wrote:
I'm not so sure how well this will work in practice, though, unless we have a working prototype that proves the benefits. What if you have a 10*10 unum matrix, and during some operation the size of the unums in the matrix changes? Assuming the worst case, you could have started out with 10*10 unums with small exponent/mantissa, maybe fitting in 2-3 cache lines, but after the operation most of the entries expand to 7-bit exponent and 31-bit mantissa, so now your matrix doesn't fit into the allocated memory anymore. So now your hardware has to talk to druntime to have it allocate new memory for storing the resulting unum matrix?
Let's not make it so complicated. The internal CPU format could just be 32 and 64 bits. The key concept is recording closed/open interval endpoints and the precision. If you spend 16 cores of a 256-core tiled coprocessor on I/O, you still have 240 cores left.
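Something along these lines, as a rough sketch of mine (not from the slides; the names are made up and outward rounding of the endpoints is hand-waved):

// Fixed-width in-core value that records an interval with open/closed
// endpoints. The packed unum in memory may grow or shrink, but the working
// representation stays two doubles plus two flags.
struct Bound
{
    double value;
    bool   open; // true = endpoint excluded (open), false = included (closed)
}

struct UInterval
{
    Bound lo;
    Bound hi;

    // Interval addition: [a,b] + [c,d] = [a+c, b+d]; a result endpoint is
    // open if either contributing endpoint is open.
    UInterval opBinary(string op : "+")(UInterval rhs) const
    {
        return UInterval(
            Bound(lo.value + rhs.lo.value, lo.open || rhs.lo.open),
            Bound(hi.value + rhs.hi.value, hi.open || rhs.hi.open));
    }
}

unittest
{
    // (1, 2] + [3, 4) = (4, 6)
    auto a = UInterval(Bound(1.0, true),  Bound(2.0, false));
    auto b = UInterval(Bound(3.0, false), Bound(4.0, true));
    auto c = a + b;
    assert(c.lo.value == 4.0 && c.lo.open);
    assert(c.hi.value == 6.0 && c.hi.open);
}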
The external format depends on your algorithm. If you are using map-reduce, you load/unload working sets, let the coprocessor do most of the work, and combine the results, like an actor-based pipeline.
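Roughly like this sketch of mine, where std.parallelism stands in for the coprocessor offload and processWorkingSet is just a placeholder:

// Split the data into working sets, "ship" each one to a worker (here a
// thread, standing in for a coprocessor tile), combine the partials on the host.
import std.algorithm : sum;
import std.array : array;
import std.parallelism : taskPool;
import std.range : chunks;

double processWorkingSet(const(double)[] chunk)
{
    // Placeholder for whatever the coprocessor does with one working set.
    double acc = 0;
    foreach (x; chunk)
        acc += x * x;
    return acc;
}

double pipeline(const(double)[] data, size_t chunkSize)
{
    // Map: process each working set in parallel; reduce: combine the results.
    auto partials = taskPool.amap!processWorkingSet(data.chunks(chunkSize).array);
    return partials.sum;
}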
The problem is more that average programmers will have real trouble making good use of it, since the know-how isn't there.
The author proposed GC, but I have a hard time imagining a GC implemented in the *CPU*, no less, colliding with the rest of the world where it's the *software* that controls DRAM allocation. (GC too slow for your application? Too bad, gotta upgrade your CPU...)
That's a bit into the future, isn't it? But local memory is probably less than 256 KB and designed for the core, so… who knows what extras you could build in? If you did it, the effect would be local, but it sounds too complicated to be worth it.
But avoid thinking that the programmer addresses memory directly. CPU + compiler is one package: your interface is the compiler, not the CPU as such.
The way I see it from reading the PDF slides is that what the author is proposing would work well as a *software* library, perhaps backed up by hardware support for some of the lower-level primitives. I'm a bit skeptical of the claims of […]
First you would need to establish that there are numerical advantages that scientists require in some specific fields.
Then you would need to build it into scientific software and accelerate it. For desktop CPUs, nah… most people don't care that much about accuracy.
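As an illustration of mine (plain IEEE, not a unum result), this is the kind of silent accuracy loss a bounded representation is meant to expose:

// With IEEE doubles the "+ 1" below is rounded away and the error is
// invisible to the program. An interval/unum-style computation with outward
// rounding would instead return a bound such as [0, 2], which still contains
// the true answer 1 and makes the lost precision visible.
import std.stdio : writeln;

void main()
{
    double big = 1e16;
    double r = (big + 1.0) - big; // exact answer: 1
    writeln(r);                   // prints 0
}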
Standards like IEEE 1788 (interval arithmetic) might also make adoption of unums less likely.
