On Sunday, December 13, 2015 at 9:51:31 AM UTC, Tamas Papp wrote:

> AFAIK typical SBC CPUs are not heavily optimized for floating point; 
> there is an order of magnitude difference compared to an x86. I don't 
> understand how a cluster would make economic sense, even for tasks that 
> parallelize well (and then there is the network overhead). 
>

I'm sure you are right about ARM's floating point not being as good as x86's, 
especially for ARMv6, but is this changing with ARMv7 and the 64/32-bit 
ARMv8-A?
 
Even Intel's latest Xeon Phi uses Atom cores (the same as in smartphones, or 
possibly modified; I haven't read too closely yet), just more of them, AND 
yes, lots of cache. That, and the better memory hierarchy in general, of 
course helps with floating point/HPC. It helps integer code just as much, 
though, and ARM isn't slow for integer anymore, at least.

I just saw the other day that on some benchmark from three months ago (I 
forget which, and I'm not sure whether it exercises the FPU), the A9[X] in 
the iPhone 6s/iPad slightly beats this year's 1.2 GHz MacBook, which uses a 
non-Atom ("mobile") Core M CPU ("laptop"; wouldn't you call that kind 
"mobile" too?).


Anyway, for the sake of argument, let's say the FPU is slow[er]. You also 
commonly have GPUs (are you ignoring those?), and I see the latest Adreno in 
Qualcomm chips has unified "[virtual] memory" with the CPU (not sure why they 
put in "virtual"). GPGPU works well on at least some GPUs, such as Nvidia's 
(and Adreno says its latest is 40% faster).

Sadly, the Mali-400 MP2 in the PINE64, or at least Utgard, the 
microarchitecture it's based on, is listed under "Graphics" rather than 
"Graphics & Compute" in the table here: 
https://en.wikipedia.org/wiki/Mali_(GPU)

And it doesn't have fused multiply-add (while some do), and "Some Malis 
support cache coherency <https://en.wikipedia.org/wiki/Cache_coherence> for 
the L2 cache with the CPU".


GPUs commonly use single precision (double precision isn't really available 
on consumer cards, and maybe not at all on mobile?), but single might do, 
with tricks, or maybe not.
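One such trick is "double-single" arithmetic: carry each value as an unevaluated sum of two float32s, built from error-free transformations so the rounding error is kept rather than dropped. A minimal sketch in Python (my own illustration; NumPy's float32 stands in for the GPU's single precision, and the function name is mine):

```python
import numpy as np

def two_sum(a, b):
    # Knuth's error-free transformation: s is the rounded float32 sum,
    # e is the exact rounding error, so s + e == a + b exactly.
    a, b = np.float32(a), np.float32(b)
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

# 2^-30 is far below float32's epsilon (~1.2e-7), so a plain float32
# add of 1 + 2^-30 would silently drop it; the (hi, lo) pair keeps it.
hi, lo = two_sum(1.0, 2.0 ** -30)
assert float(hi) + float(lo) == 1.0 + 2.0 ** -30
```

Full double-single add/multiply builds on this plus a matching error-free product; if I recall correctly, libraries using this technique were common on early GPUs back when consumer cards were single-precision only.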


[About clusters: look into Unum (universal number), which is meant to replace 
traditional floating point, not only for speed/energy (while perhaps needing 
better hardware) but also for more correct answers, AND to turn 
"embarrassingly serial" code into data-parallel code ("the easiest kind"). I 
may write a post here in a separate thread; we shouldn't hijack this one. 
Maybe I'll add my thoughts to the Flexnum one?]


> AFAIK Julia is available in Raspbian. May not be the most recent 
> version though, but 3.2 looks like it is there: 
>
> http://archive.raspbian.org/raspbian/pool/main/j/julia/ 
 


Doesn't this need to be fixed? Isn't 0.4.x even more important on those 
slower computers? Or is the overhead of compiling going up? Is a fast-compile 
/low-optimization/"almost interpreted" mode needed, or already available? Or 
would that even be useful, or counter-productive at runtime? Could there be a 
way to optimize different functions differently? Is that already done, say 
with inlined (leaf) functions optimized more, and others compiled for code 
density and compile speed?

P.S. Was Blackberry Pi 2 just a typo or did I miss something?

 

> Best, 
>
> Tamas 
>
> On Sun, Dec 13 2015, cdm <[email protected]> wrote: 
>
> > while this SBC would represent a substantial improvement 
> > over the Pi systems currently in market, i suspect that the 
> > most notable aspect is the price ... 
> > 
> > generally, as the price points come down, clusters become 
> > much more feasible ... 
> > 
> > parallelism is next. 
>
