> On Aug 15, 2015, at 10:33 AM, Logan Streondj <[email protected]> wrote:
> In pure topological parallel algorithms, the resource you are trying to 
> maximally exploit is bandwidth. You never want an idle wire that can carry 
> bits between compute elements. Designing code that ensures that all the wires 
> in your system are efficiently saturated all the time is a pretty foreign 
> concept to most programmers. Doing this well is an extremely rare skill, even 
> among people that write software for supercomputers (and hence why a lot of 
> supercomputing codes scale relatively poorly in practice).
> 
> 
> The best I've come up with for it is to minimize loaded program size.
> Map/reduce/expand is good at doing that by breaking arrays up into little 
> bits. 
> 
> What design patterns do you have for these bandwidth limits? 


It generally involves three design characteristics: (1) using data structures 
that never require reorganization of your data, i.e. no hash tables or range 
partitioning; (2) logically generating uniform “work units” of compute that 
exceed the number of compute elements by at least a couple of orders of 
magnitude; and (3) having every compute element independently and dynamically 
schedule its own work sequence, using game theory to orchestrate instead of 
explicit coordination. #2 is a somewhat trivial but important design element; 
#1 and #3 are rarely seen in parallel software systems.
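Points #2 and #3 can be sketched in miniature: decompose the problem into far 
more uniform work units than workers, and let each worker claim its next unit 
independently rather than through a central scheduler. This is only a toy 
Python sketch with made-up names; a shared counter stands in for the 
game-theoretic orchestration described above, which is considerably more 
sophisticated.

```python
import itertools
import threading

NUM_WORKERS = 4
NUM_UNITS = NUM_WORKERS * 100  # work units exceed workers by ~2 orders of magnitude

# itertools.count's next() is atomic under CPython's GIL; it stands in
# here for a hardware fetch-and-add.
counter = itertools.count()
results = [0] * NUM_UNITS

def worker():
    # Each worker independently and dynamically pulls its next work unit;
    # there is no central scheduler handing out assignments.
    while True:
        i = next(counter)
        if i >= NUM_UNITS:
            return
        results[i] = i * i  # a uniform unit of compute

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every unit is uniform and small relative to the total, the workers 
stay busy regardless of how the units happen to interleave.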

Unlike map/reduce-style models, which tend to produce pathological amounts of 
data motion between steps, this allows you to consistently achieve close to the 
theoretical minimum data motion for the computation at hand *and* ensure that 
the data motion that does occur is approximately uniform over the wires in the 
system. 



> I'm pretty sure ASICs are faster for all static algorithms.
> The reason FPGAs are nice is for "dynamic" algorithms, or ones which 
> weren't predicted to be needed at time of computer construction. Any 
> algorithm which becomes thoroughly entrenched, or is often used in an 
> FPGA, would make sense to migrate onto ASICs. Though in terms of 
> evolving algorithms, and adapting to new conditions as an AGI would have to, 
> FPGAs are much better than pure software algorithms. 


By “commodity fixed silicon”, I mean CPUs and similar. From a code standpoint, 
these are as dynamic as FPGAs. FPGAs are only efficient for implementing 
transforms on data streams. Think encryption, compression, codecs, etc. As 
common complex bitwise transform building blocks are added to CPU instruction 
sets, turning long idiomatic sequences of and/or/shift/mask/etc into a 
single-cycle instruction, it is becoming increasingly possible to compose 
transform implementations that are faster than an FPGA implementation due to 
the much higher clock rates of CPUs.  
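Population count is a concrete instance of that trend: a long and/shift/mask 
idiom that modern CPUs now execute as a single POPCNT instruction. The Python 
below only illustrates the transform itself (the classic SWAR popcount for a 
64-bit word); on hardware the whole sequence collapses into one cycle.

```python
def popcount_idiom(x: int) -> int:
    """The long idiomatic and/shift/mask sequence for counting set bits
    in a 64-bit word. Modern CPUs fold this entire sequence into a
    single POPCNT instruction."""
    x = x - ((x >> 1) & 0x5555555555555555)                      # pairwise counts
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)  # nibble counts
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                      # byte counts
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56  # sum of bytes
```

In Python 3.10+ the equivalent single operation is exposed as 
`int.bit_count()`.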

Even the reprogrammability of FPGAs has no advantages. Modern software 
increasingly uses dynamic runtime code generation and compilation to implement 
core functionality. I’ve designed commercial software that is continuously 
compiling parts of itself thousands of times per second in response to user 
interaction.
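As a hedged illustration of that kind of runtime code generation (a toy 
sketch, not the commercial system described above), the Python below compiles 
a specialized predicate on the fly instead of interpreting a generic one on 
every call; the function and field names are invented for the example.

```python
def specialize_filter(field: str, threshold: float):
    """Generate, compile, and return a predicate specialized to one field
    and one constant, rather than re-interpreting a generic comparison on
    every call. A miniature stand-in for runtime code generation."""
    src = f"def pred(row): return row[{field!r}] > {threshold!r}"
    ns = {}
    exec(compile(src, "<generated>", "exec"), ns)
    return ns["pred"]

# Usage: build the specialized predicate once, then apply it cheaply.
pred = specialize_filter("latency", 2.5)
slow_rows = [r for r in [{"latency": 1.0}, {"latency": 3.0}] if pred(r)]
```

A system that does this continuously is, in effect, recompiling parts of 
itself in response to its inputs.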


>> IMHO, barrel processors are the most efficient modern computing architecture 
>> in the abstract. The problem is that there seem to be a dozen people in the 
>> world who know how to properly design codes for them. This is ironic because 
>> they are much easier to design for than GPUs or similar, but no one wants to 
>> market them as not being CPU-like, and they fake it well at first blush.
> 
> So please enlighten us, as a member of the dozen.
> 
> If not map/reduce/expand then what design patterns do you use? 


That is far beyond the scope of this mailing list and a very long discussion. A 
number of simple design patterns that are (correctly) assumed to be expensive 
and non-scalable on CPUs, and therefore avoided, are approximately free on 
barrel processors. People tend to design code for them as though the hardware 
were a CPU on which it is simply much easier to write multithreaded code that 
works well. However, the standard set of data structures and algorithms every 
programmer uses assumes a CPU architecture and is substantially suboptimal on 
a barrel processor. If you want to use barrel processors well, you need 
different data structures and algorithms for your basic computing tasks. 
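For readers unfamiliar with the architecture: a barrel processor issues from a 
different hardware thread every cycle, strict round-robin, so a thread stalled 
on memory never blocks the pipeline. That is why patterns that are expensive 
on CPUs, such as fine-grained pointer chasing, can be approximately free. A 
purely illustrative toy model of that issue policy:

```python
from collections import deque

def barrel_run(threads):
    """Toy model of barrel-processor issue: one step from each live
    thread per cycle, round-robin. A thread 'waiting on memory' just
    yields its slot, and the other threads keep the pipeline full, so
    its latency is hidden rather than paid for."""
    ready = deque(enumerate(threads))
    trace = []  # which thread issued on each cycle
    while ready:
        tid, gen = ready.popleft()
        try:
            next(gen)            # issue one "instruction"
            trace.append(tid)
            ready.append((tid, gen))
        except StopIteration:    # thread finished; drop it from rotation
            pass
    return trace

def thread(steps):
    # Each yield models one issued instruction (or a tolerated stall).
    for _ in range(steps):
        yield

# Usage: three 2-step threads interleave perfectly, one issue per cycle.
trace = barrel_run([thread(2), thread(2), thread(2)])
```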

Anyone can read the silicon reference documentation. Most programmers have 
little ability to design approximately optimal data structures and algorithms 
for an arbitrary silicon architecture from first principles. In the specific 
case of barrel processors, there is a dearth of literature you can look at 
where people have already figured it out.

This has been the problem with supercomputers generally. No one designs 
algorithms that match the hardware. 




-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now