> On Aug 15, 2015, at 10:33 AM, Logan Streondj <[email protected]> wrote:
>
> In pure topological parallel algorithms, the resource you are trying to
> maximally exploit is bandwidth. You never want an idle wire that can carry
> bits between compute elements. Designing code that ensures that all the wires
> in your system are efficiently saturated all the time is a pretty foreign
> concept to most programmers. Doing this well is an extremely rare skill, even
> among people that write software for supercomputers (and hence why a lot of
> supercomputing codes scale relatively poorly in practice).
>
> The best I've come up with for it is to minimize loaded program size.
> Map/reduce/expand is good at doing that by breaking arrays up into little
> bits.
>
> What design patterns do you have for these bandwidth limits?
It generally involves three design characteristics: (1) using data structures that never require reorganization of your data, i.e. no hash tables or range partitioning; (2) logically generating uniform “work units” of compute that exceed the number of compute elements by at least a couple orders of magnitude; and (3) having every compute element independently and dynamically schedule its own work sequence, using game theory to orchestrate instead of explicit coordination. #2 is somewhat trivial but an important design element; #1 and #3 are rarely seen in parallel software systems.

Unlike map/reduce-style models, which tend to produce pathological amounts of data motion between steps, this allows you to consistently achieve close to the theoretical minimum data motion for the computation at hand *and* ensure that the data motion that does occur is approximately uniform over the wires in the system.

> I'm pretty sure ASICs are faster for all static algorithms.
> The reason FPGAs are nice is for "dynamic" algorithms, or ones which
> weren't predicted to be needed at the time of computer construction. Any
> algorithm which becomes thoroughly entrenched, or is often used in an
> FPGA, would make sense to migrate onto ASICs. Though in terms of
> evolving algorithms, and adapting to new conditions as an AGI would have to,
> FPGAs are much better than pure software algorithms.

By “commodity fixed silicon”, I mean CPUs and similar. From a code standpoint, these are as dynamic as FPGAs. FPGAs are only efficient for implementing transforms on data streams: think encryption, compression, codecs, etc. As common complex bitwise transform building blocks are added to CPU instruction sets, turning long idiomatic sequences of and/or/shift/mask/etc. into a single-cycle instruction, it is becoming increasingly possible to compose transform implementations that are faster than an FPGA implementation due to the much higher clock rates of CPUs.
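As a concrete illustration of that instruction-fusion point, population count is a widely cited case of a long and/shift/mask idiom that modern CPUs collapse into a single POPCNT instruction. This sketch is mine, not from the thread; the 64-bit bit-twiddling sequence is the classic textbook form:

```python
def popcount_idiomatic(x: int) -> int:
    """The long and/shift/mask idiom programmers once spelled out by hand,
    shown for a 64-bit word. Modern ISAs do all of this in one instruction."""
    x = x - ((x >> 1) & 0x5555555555555555)            # 2-bit partial sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F            # 8-bit partial sums
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56

# The whole sequence is equivalent to a single built-in count:
assert popcount_idiomatic(0b1011) == bin(0b1011).count("1") == 3
```

On hardware, the right-hand side of that final check corresponds to one POPCNT (or equivalent) instruction, which is the kind of fusion the paragraph above describes.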
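Going back to the three design characteristics at the top of this reply, points (2) and (3) can be sketched in a few lines. This is a single-process toy of my own, with a greedy pull policy standing in for the game-theoretic orchestration described (which is not modeled here); it only shows why heavy oversubscription with uniform work units keeps per-element load nearly equal without any central assignment:

```python
import random
from collections import defaultdict

NUM_ELEMENTS = 8
NUM_UNITS = NUM_ELEMENTS * 100   # ~two orders of magnitude oversubscription

units = list(range(NUM_UNITS))   # uniform work units, point (2)
completed = defaultdict(int)     # units finished per compute element
busy_until = [0.0] * NUM_ELEMENTS
rng = random.Random(42)

# Point (3), approximated: no static partitioning; whichever element is
# idle soonest pulls its own next unit.
while units:
    units.pop()
    element = min(range(NUM_ELEMENTS), key=busy_until.__getitem__)
    busy_until[element] += rng.uniform(0.9, 1.1)  # near-uniform unit cost
    completed[element] += 1

loads = sorted(completed.values())
print(loads)  # with 100x oversubscription, loads stay nearly equal
```

With only a handful of large, uneven units instead, the same pull policy would leave elements (and the wires feeding them) idle at the tail, which is the failure mode the oversubscription rule avoids.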
Even the reprogrammability of FPGAs has no advantages. Modern software increasingly uses dynamic runtime code generation and compilation to implement core functionality. I’ve designed commercial software that is continuously compiling parts of itself thousands of times per second in response to user interaction.

> IMHO, barrel processors are the most efficient modern computing architecture
> in the abstract. The problem is that there seem to be a dozen people in the
> world that know how to properly design codes for them. This is ironic because
> they are much easier to design for than GPUs or similar, but no one wants to
> market them as not being CPU-like, and they fake it well at first blush.
>
> So please enlighten us, as a member of the dozen.
>
> If not map/reduce/expand, then what design patterns do you use?

That is far beyond the scope of this mailing list and a very long discussion. A number of simple design patterns that are (correctly) assumed to be expensive and non-scalable on CPUs, and therefore avoided, are approximately free on barrel processors. People tend to design code for a barrel processor as though it were a CPU, when it is actually much easier to design multithreaded code that works well on one. However, the standard set of data structures and algorithms every programmer uses assumes a CPU architecture and is substantially suboptimal on a barrel processor. If you want to use barrel processors well, you need different data structures and algorithms for your basic computing tasks.

Anyone can read the silicon reference documentation. Most programmers have little ability to design approximately optimal data structures and algorithms for an arbitrary silicon architecture from first principles. In the specific case of barrel processors, there is a dearth of literature where people have already figured it out. This has been the problem with supercomputers generally: no one designs algorithms that match the hardware.
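The runtime code generation mentioned earlier in this reply, software compiling parts of itself in response to conditions, can be sketched minimally. The polynomial specializer below is a hypothetical illustration of mine, not the commercial system described; it generates source for a fully unrolled evaluator and compiles it once, where a real system would regenerate such hot paths continuously:

```python
def specialize_polynomial(coeffs):
    """Generate and compile a function evaluating one fixed polynomial.

    Rather than looping over coefficients at call time, emit source for
    the unrolled Horner form and compile it. Runtime-codegen systems do
    this kind of specialization over and over as inputs change.
    """
    body = "0"
    for c in coeffs:                      # Horner's rule, unrolled
        body = f"({body}) * x + {c}"
    src = f"def poly(x):\n    return {body}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["poly"]

poly = specialize_polynomial([2, 0, 1])   # 2*x**2 + 0*x + 1
assert poly(3) == 19
```

The same pattern, with a real compiler backend instead of `exec`, is what lets fixed silicon match the "reprogrammability" that is usually claimed as an FPGA advantage.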
-------------------------------------------
AGI Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription: https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com
