This is a good idea (and, interestingly, was a common programming style in the 
first Simula) ...

Cheers,

Alan




>________________________________
> From: David Barbour <dmbarb...@gmail.com>
>To: Fundamentals of New Computing <fonc@vpri.org> 
>Sent: Friday, April 13, 2012 11:03 PM
>Subject: Re: [fonc] Ask For Forgiveness Programming - Or How We'll Program 
>1000 Cores
> 
>
>Another option would be to introduce slack in the propagation delay itself. 
>E.g. if I send you a message indicating the meeting has been moved to 1:30pm, 
>it would be a good idea to send it a good bit in advance - perhaps at 10:30am 
>- so you can schedule and prepare. 
>
>
>With computers, the popular solution - sending a message *when* we want 
>something to happen - seems analogous to sending the message at 1:29:58pm, 
>leaving the computer always on the edge with little time to prepare even for 
>highly predictable events, and making the system much more vulnerable to 
>variations in communication latency.
>
>
>On Fri, Apr 13, 2012 at 8:34 AM, David Goehrig <d...@nexttolast.com> wrote:
>
>>There's a very simple concept that most of the world embraces in everything
>>from supply chain management, to personnel allocations, to personal
>>relationships.
>>
>>
>>We call it *slack*
>>
>>
>>What Dan is talking about amounts to introducing slack into distributed 
>>models.  Particularly this version of the definition of slack:
>>
>>
>>": lacking in completeness, finish, or perfection <a very slack piece of 
>>work>"
>>
>>
>>Which is a more realistic version of computation in a universe with 
>>propagation delay (finite speed of light). But it also introduces a concept
>>familiar to anyone who has handled rope: you can't tie a knot without some
>>slack (computation being an exercise in binary sequence knot making).
>>Finishing a computation is analogous to pulling the rope taut.
>>
>>
>>Dave
>>
>>
>>-=-=- d...@nexttolast.com -=-=-
>>
>>On Apr 13, 2012, at 5:53 AM, Eugen Leitl <eu...@leitl.org> wrote:
>>
>>
>>
>>>http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html
>>>
>>>Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
>>>
>>>Tuesday, March 6, 2012 at 9:15AM
>>>
>>>The argument for a massively multicore future is now familiar: while clock
>>>speeds have leveled off, device density is increasing, so the future is cheap
>>>chips with hundreds and thousands of cores. That’s the inexorable logic
>>>behind our multicore future.
>>>
>>>The unsolved question that lurks deep in the dark part of a programmer’s mind
>>>is: how on earth are we to program these things? For problems that aren’t
>>>embarrassingly parallel, we really have no idea. IBM Research’s David Ungar
>>>has an idea. And it’s radical in the extreme...
>>>
>>>Grace Hopper once advised “It's easier to ask for forgiveness than it is to
>>>get permission.” I wonder if she had any idea that her strategy for dealing
>>>with human bureaucracy would be the same strategy David Ungar thinks will help
>>>us tame the technological bureaucracy of 1000+ core systems?
>>>
>>>You may recognize David as the co-creator of the Self programming language,
>>>inspiration for the HotSpot technology in the JVM and the prototype model
>>>used by JavaScript. He’s also the innovator behind using cartoon animation
>>>techniques to build user interfaces. Now he’s applying that same creative
>>>zeal to solving the multicore problem.
>>>
>>>During a talk on his research, Everything You Know (about Parallel
>>>Programming) Is Wrong! A Wild Screed about the Future, he called his approach
>>>“anti-lock” or “race and repair” because the core idea is that the only way
>>>we’re going to be able to program the new multicore chips of the future is to
>>>sidestep Amdahl’s Law and program without serialization, without locks,
>>>embracing non-determinism. Without locks, calculations will obviously be
>>>wrong, but correct answers can be approached over time using techniques like
>>>fresheners:
>>>
>>>   A thread that, instead of responding to user requests, repeatedly selects
>>>a cached value according to some strategy, and recomputes that value from its
>>>inputs, in case the value had been inconsistent. Experimentation with a
>>>prototype showed that on a 16-core system with a 50/50 split between workers
>>>and fresheners, fewer than 2% of the queries would return an answer that had
>>>been stale for at least eight mean query times. These results suggest that
>>>tolerance of inconsistency can be an effective strategy in circumventing
>>>Amdahl’s law.
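>>>
>>>As a rough, hedged sketch (hypothetical names, not Ungar's actual prototype),
>>>a freshener is just a loop that keeps overwriting cached cells with freshly
>>>recomputed values, taking no locks at all:
>>>
>>>   // Hypothetical sketch of a freshener thread: pick a cached cell and
>>>   // recompute it from its inputs, repairing stale or racy values over time.
>>>   final class Cell {
>>>       volatile long value;                              // cached, possibly stale
>>>       final java.util.function.LongSupplier recompute;  // derives the value from its inputs
>>>       Cell(java.util.function.LongSupplier recompute) { this.recompute = recompute; }
>>>   }
>>>
>>>   final class Freshener implements Runnable {
>>>       private final Cell[] cells;
>>>       private final java.util.Random rnd = new java.util.Random();
>>>       Freshener(Cell[] cells) { this.cells = cells; }
>>>       public void run() {
>>>           while (!Thread.currentThread().isInterrupted()) {
>>>               Cell c = cells[rnd.nextInt(cells.length)]; // "some strategy": a random pick
>>>               c.value = c.recompute.getAsLong();         // no lock; may race with workers
>>>           }
>>>       }
>>>   }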
>>>
>>>During his talk David mentioned that he’s trying to find a better name than
>>>“anti-lock” or “race and repair” for this line of thinking. Throwing my hat
>>>into the name game, I want to call it Ask For Forgiveness Programming (AFFP),
>>>based on the idea that using locks is “asking for permission” programming, so
>>>not using locks along with fresheners is really “asking for forgiveness.” I
>>>think it works, but it’s just a thought.
>>>
>>>No Shared Lock Goes Unpunished
>>>
>>>Amdahl’s Law is used to understand why simply having more cores won’t save us
>>>for a large class of problems. The idea is that any program is made up of a
>>>serial fraction and a parallel fraction. More cores only helps you with the
>>>parallel portion. If an operation takes 10 seconds, for example, and one
>>>second of it is serial, then infinitely many cores can only make the
>>>parallelizable part faster; the serial code will always take one second.
>>>Amdahl says you can never do better than that serial 10%. As long as your
>>>code has a serial portion, that portion bounds how fast you can go.
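>>>
>>>In symbols (the standard statement of the law, nothing specific to this post):
>>>with serial fraction s, the best speedup on N cores is 1 / (s + (1 - s)/N),
>>>which can never exceed 1/s no matter how large N gets.
>>>
>>>   // Amdahl's law for the 10-second example above: s = 0.1 (1s serial, 9s parallel).
>>>   static double speedup(double serialFraction, int cores) {
>>>       return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
>>>   }
>>>   // speedup(0.1, 16)   ~= 6.4x
>>>   // speedup(0.1, 1000) ~= 9.9x  -- bounded by 1/0.1 = 10x, i.e. never under one second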
>>>
>>>Jakob Engblom recounts a similar line of thought in his blog:
>>>
>>>   They also had the opportunity to test their solution [for parallel
>>>Erlang] on a Tilera 64-core machine. This mercilessly exposed any
>>>scalability limitations in their system, and proved the conventional wisdom
>>>that going beyond 10+ cores is quite different from scaling from 1 to 8… The
>>>two key lessons they learned were that no shared lock goes unpunished, and
>>>data has to be distributed as well as code.
>>>
>>>   It seems that for all big systems, parallelization efforts turn into a hunt
>>>for locks and the splitting up of code and data into units that can run in
>>>parallel without having to synchronize. The “upper limit” of this process is
>>>clearly a system with no synchronization points at all.
>>>
>>>Sherlock Holmes says that when you have eliminated the impossible, whatever
>>>remains, however improbable, must be the truth, so the truth is: removing
>>>serialization is the only way to use all these cores. Since synchronization
>>>is the core serial component in applications, we must get rid of
>>>synchronization.
>>>
>>>A Transaction View Isn’t as Strong as it Once Was
>>>
>>>Getting rid of locks/synchronization may not be the radical notion it once
>>>was. Developments over the last few years have conditioned us to deal with
>>>more ambiguity, at least for distributed systems.
>>>
>>>Through many discussions about CAP, we’ve come to accept a non-transactional
>>>view of databases as a way to preserve availability in the face of
>>>partitions. Even if a read doesn’t return the last write, we know it will
>>>eventually and that some merge logic will make them consistent once again.
>>>
>>>The idea of compensating transactions as a way around the performance
>>>problems due to distributed coordination has also become more familiar.
>>>
>>>Another related idea comes from the realm of optimistic concurrency control
>>>that lets multiple transactions proceed in parallel without locking and then
>>>checks at commit time if a conflict requires a rollback. The application then
>>>gets to try again.
>>>
>>>And for some time now memcache has supported a compare-and-set operation that
>>>allows multiple clients to avoid writing over each other by comparing time
>>>stamps.
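>>>
>>>The shape of that pattern, roughly (a hedged sketch; VersionedStore is a
>>>made-up stand-in, not the actual memcache client API): read the value together
>>>with a version token, compute the update, and only write if the token is
>>>unchanged, otherwise retry.
>>>
>>>   // Hypothetical compare-and-set retry loop over a memcache-style store.
>>>   interface VersionedStore {
>>>       long[] gets(String key);                             // returns { value, casToken }
>>>       boolean cas(String key, long value, long casToken);  // true iff the token is still current
>>>   }
>>>
>>>   static void increment(VersionedStore store, String key) {
>>>       while (true) {
>>>           long[] vt = store.gets(key);
>>>           if (store.cas(key, vt[0] + 1, vt[1])) return;    // success; otherwise we lost the race, retry
>>>       }
>>>   }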
>>>
>>>As relaxed as these methods are, they all still require that old enemy:
>>>synchronization.
>>>
>>>Your Answers are Already Wrong
>>>
>>>The core difficulty with abandoning synchronization is coming to terms with
>>>the notion that results may not be correct at all times. It’s a certainty that
>>>we love certainty, so even considering that we could get wrong answers for any
>>>window of time is heresy.
>>>
>>>David says we need to change our way of thinking. The idea is to get results
>>>that are “good enough, soon enough”: get answers quickly, even if wrong, as
>>>long as they are still right enough to be useful. Perfect is the enemy of the
>>>good, and perfect answers simply take too long at scale.
>>>
>>>David emphasises repeatedly that there’s a fundamental trade-off between
>>>correctness and performance. Without locks, operations happen out of order,
>>>which gives the wrong answer. Race conditions happen. To get correct answers
>>>we effectively add in delays, making one thing wait for another, which kills
>>>performance, and we don’t want to kill performance, we want to use all those
>>>cores.
>>>
>>>But consider an aspect of working on distributed systems that people don’t
>>>like to think about: your answers are already always wrong.
>>>
>>>Unless you are querying a read-only corpus or using a global lock, in
>>>distributed systems any answer to any query is potentially wrong, always.
>>>This is the hardest idea to get into your brain when working in distributed
>>>systems with many cores that experience simultaneous updates. The world never
>>>stops. It’s always in flux. You can never assume for a single moment there’s
>>>a stable frame of reference. The answer from a query you just made could be
>>>wrong in the next instant. A query to see if a host is up, for example, can’t
>>>ever be assumed right. That host may be up or down in the next instant, and
>>>your code won’t know about it until it finds a conflict.
>>>
>>>So are we really that far away from accepting that all answers to queries
>>>could be wrong?
>>>
>>>Lessons from Nature
>>>
>>>One strategy for dealing with many cores is to move towards biological models
>>>instead of mathematical models, where complicated behaviours emerge without
>>>global determinism. Bird flocks, for example, emerge from three simple rules:
>>>avoid crowding, steer towards the average heading of neighbors, and steer
>>>towards the average position of neighbors. No pi-calculus required: it works
>>>without synchronization or coordination. Each bird is essentially its own
>>>thread; it simply looks around and makes local decisions. This is a more
>>>cellular automaton view of the world.
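>>>
>>>Sketched very loosely (made-up names and weights, purely illustrative), each
>>>bird's update uses only what it can see nearby:
>>>
>>>   // Purely local flocking update, one per bird, no shared state and no locks.
>>>   record Vec2(double x, double y) {
>>>       Vec2 plus(Vec2 o)    { return new Vec2(x + o.x, y + o.y); }
>>>       Vec2 minus(Vec2 o)   { return new Vec2(x - o.x, y - o.y); }
>>>       Vec2 scale(double k) { return new Vec2(x * k, y * k); }
>>>   }
>>>
>>>   final class Bird {
>>>       Vec2 pos, vel;
>>>       Bird(Vec2 pos, Vec2 vel) { this.pos = pos; this.vel = vel; }
>>>       void step(java.util.List<Bird> neighbors) {
>>>           Vec2 separation = new Vec2(0, 0), heading = new Vec2(0, 0), center = new Vec2(0, 0);
>>>           for (Bird b : neighbors) {
>>>               separation = separation.plus(pos.minus(b.pos)); // 1. avoid crowding
>>>               heading = heading.plus(b.vel);                  // 2. average heading of neighbors
>>>               center = center.plus(b.pos);                    // 3. average position of neighbors
>>>           }
>>>           int n = Math.max(1, neighbors.size());
>>>           vel = vel.plus(separation.scale(0.05))
>>>                    .plus(heading.scale(1.0 / n).minus(vel).scale(0.1))
>>>                    .plus(center.scale(1.0 / n).minus(pos).scale(0.01));
>>>           pos = pos.plus(vel);
>>>       }
>>>   }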
>>>
>>>Race and Repair - Mitigating Wrong Results
>>>
>>>The idea is that errors created by data races won’t be prevented; they will
>>>be repaired. A calculation made without locks will be wrong under concurrent
>>>updates. So why not use some of our many cores to calculate the right answers
>>>in the background, in parallel, and update the values? This approach:
>>>
>>>   Uses no synchronization.
>>>
>>>   Tolerates some wrong answers.
>>>
>>>   Probabilistically fixes the answers over time.
>>>
>>>Some obvious questions: how many background threads do you need? In what order
>>>should values be recalculated? And how wrong will your answers be?
>>>
>>>To figure this out, an experiment was run and described in Inconsistency
>>>Robustness for Scalability in Interactive Concurrent-Update In-Memory MOLAP
>>>Cubes, which tests updates on a complicated spreadsheet.
>>>
>>>With locking the results were correct, but scalability was limited. Without
>>>locking, results were usually wrong. Both results are as one might expect.
>>>And when they added freshener threads they found:
>>>
>>>   Combining the frequency with the duration data suggests that in a 16-core
>>>system with a 50/50 split between workers and fresheners, fewer than 2% of
>>>the queries would return an answer that had been stale for at least eight
>>>mean query times.
>>>
>>>I found this result quite surprising. I would have expected the answers to be
>>>wrong more of the time. Could this actually work?
>>>
>>>Breadcrumbs
>>>
>>>The simple idea of Race and Repair is open to a lot of clever innovations.
>>>Breadcrumbs are one such innovation that attempts to be smarter about which
>>>values need recalculating. Meta-data is attached to a value indicating that an
>>>entity is recalculating the value or has changed a dependent value such that
>>>this value is now out of date. Any entity that might want to use this data
>>>can wait until a “valid” value is calculated and/or decline to insert a
>>>calculated value if it is out of date. This narrows the window of time in
>>>which errors are introduced. It’s a mitigation.
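>>>
>>>A minimal sketch of what such a breadcrumb could look like (an assumed shape,
>>>not taken from the paper):
>>>
>>>   // Hypothetical breadcrumb: flags riding alongside a cached value so that
>>>   // readers and writers can steer around the obviously-stale windows.
>>>   final class Breadcrumbed<T> {
>>>       volatile T value;
>>>       volatile boolean recomputing;   // someone is recalculating this value
>>>       volatile boolean inputChanged;  // a dependent value changed; this one may be stale
>>>
>>>       T read() {
>>>           // Optionally wait a bit for a "valid" value instead of using a flagged one.
>>>           for (int i = 0; i < 1000 && (recomputing || inputChanged); i++) Thread.onSpinWait();
>>>           return value;
>>>       }
>>>
>>>       void publish(T computed) {
>>>           if (!inputChanged) value = computed; // don't cache a result already known to be stale
>>>       }
>>>   }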
>>>
>>>There are endless variations of this. I can imagine remembering calculations
>>>that used a value and then publishing updated values so those calculations
>>>can be rerun. The result is a roiling, event-driven sea that is constantly
>>>bubbling with updates that are trying to bring values towards correctness,
>>>but probably never quite getting there.
>>>
>>>Probabilistic Data Structures
>>>
>>>Another area David has researched is hashtables that can be inserted into
>>>without synchronization. This would allow entries to be added to a hashtable
>>>from any number of cores without slowing down the entire system with a lock.
>>>Naive insertion into a hashtable in parallel will mess up pointers, which can
>>>either result in the loss of values or the insertion of values, but it’s
>>>possible to work around these issues and create a lock-free hashtable.
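>>>
>>>One way to picture such a structure (an illustration only, not the design from
>>>his talk): a fixed, open-addressed table where racing inserts may occasionally
>>>overwrite one another - losing a value, never corrupting a pointer - with the
>>>loss left for race-and-repair to fix later.
>>>
>>>   // Illustrative only: unsynchronized open-addressed insert, no locks, no CAS.
>>>   final class RacyTable {
>>>       private final long[] keys;  // key 0 is reserved to mean "empty"
>>>       private final long[] vals;
>>>       RacyTable(int capacity) { keys = new long[capacity]; vals = new long[capacity]; }
>>>
>>>       void put(long key, long val) {
>>>           int i = (Long.hashCode(key) & 0x7fffffff) % keys.length;
>>>           for (int probes = 0; probes < keys.length; probes++, i = (i + 1) % keys.length) {
>>>               long k = keys[i];              // racy read: another thread may be claiming slot i
>>>               if (k == 0 || k == key) {
>>>                   keys[i] = key;             // possible lost update under contention; tolerated
>>>                   vals[i] = val;
>>>                   return;
>>>               }
>>>           }
>>>           // table full: the insert is dropped, another tolerated, repairable error
>>>       }
>>>   }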
>>>
>>>His presentation goes into a lot of detail on how this might work. He rejects
>>>lightweight locking schemes like CAS because these are still a choke point
>>>and there’s a penalty for atomic instructions under contention. They won’t
>>>scale.
>>>
>>>He thinks there’s a big research opportunity in probabilistic data structures
>>>that work without synchronization and that work with mitigation.
>>>
>>>Is this the Future?
>>>
>>>This is just a lightweight introduction. For more details, please read all
>>>the papers and watch all the videos. But I think it’s important to talk about
>>>and think about how we might make use of all these cores for traditional
>>>programming tasks, though the result may be anything but traditional.
>>>
>>>A bad experience on one project makes me somewhat skeptical that human nature
>>>will ever be comfortable in accepting wrong answers as the norm. My
>>>experience report was on an event-driven system with a large number of nodes
>>>that could generate events so fast that events had to be dropped. Imagine a
>>>city undergoing an earthquake and a huge sensor net spewing out change events
>>>to tell the world what’s going on in real-time.
>>>
>>>The original requirement was events could never be dropped, ever, which made
>>>a certain amount of sense when the number of sensors was small. As the number
>>>of sensors expands, it’s simply not possible. My proposal was to drop events
>>>and have background processes query the actual sensors so that the database
>>>would be synced over time. Very much like the proposals here.
>>>
>>>It was a no go. A vehement no go. Titanic arguments ensued. Management and
>>>everyone on down simply could not accept the idea that their models would be
>>>out of sync with the sensors (which of course they were anyway). My eventual
>>>solution took a year to implement and radically changed everything, but that
>>>was simpler than trying to convince people to deal with uncertainty.
>>>
>>>So there might be some convincing to do.
>>>
>>>Related Articles
>>>
>>>   Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed
>>>About the Future (slides, video)
>>>
>>>   Renaissance Project
>>>
>>>   On Hacker News
>>>
>>>   David Ungar: It is Good to be Wrong
>>>
>>>   We Really Don't Know How To Compute!
>>>
>>>   Harnessing emergence for manycore programming: early experience
>>>integrating ensembles, adverbs, and object-based inheritance
>>>
>>>   Inconsistency Robustness for Scalability in Interactive Concurrent‑Update
>>>In-Memory MOLAP Cubes
>>>
>>>
>
>
>
>-- 
>bringing s-words to a pen fight
>
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
