There's a very simple concept that most of the world embraces in everything from supply chain management to personnel allocation to personal relationships.
We call it *slack*. What Dan is talking about amounts to introducing slack into distributed models, particularly in this sense of the word: "lacking in completeness, finish, or perfection <a very slack piece of work>". That is a more realistic model of computation in a universe with propagation delay (a finite speed of light). It also suggests an idea familiar to anyone who has worked with ropes: you can't tie a knot without some slack (computation being an exercise in binary-sequence knot making). Finishing a computation is analogous to pulling the rope taut.

Dave
-=-=- [email protected] -=-=-

On Apr 13, 2012, at 5:53 AM, Eugen Leitl <[email protected]> wrote:

>
> http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html
>
> Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
>
> Tuesday, March 6, 2012 at 9:15AM
>
> The argument for a massively multicore future is now familiar: while clock speeds have leveled off, device density is increasing, so the future is cheap chips with hundreds and thousands of cores. That’s the inexorable logic behind our multicore future.
>
> The unsolved question that lurks deep in the dark part of a programmer’s mind is: how on earth are we to program these things? For problems that aren’t embarrassingly parallel, we really have no idea. IBM Research’s David Ungar has an idea. And it’s radical in the extreme...
>
> Grace Hopper once advised “It's easier to ask for forgiveness than it is to get permission.” I wonder if she had any idea that her strategy for dealing with human bureaucracy would be the same strategy David Ungar thinks will help us tame the technological bureaucracy of 1000+ core systems?
>
> You may recognize David as the co-creator of the Self programming language, inspiration for the HotSpot technology in the JVM and the prototype model used by Javascript. He’s also the innovator behind using cartoon animation techniques to build user interfaces. Now he’s applying that same creative zeal to solving the multicore problem.
>
> During a talk on his research, Everything You Know (about Parallel Programming) Is Wrong! A Wild Screed about the Future, he called his approach “anti-lock” or “race and repair”, because the core idea is that the only way we’re going to be able to program the new multicore chips of the future is to sidestep Amdahl’s Law and program without serialization, without locks, embracing non-determinism. Without locks, calculations will obviously be wrong, but correct answers can be approached over time using techniques like fresheners:
>
> A thread that, instead of responding to user requests, repeatedly selects a cached value according to some strategy, and recomputes that value from its inputs, in case the value had been inconsistent. Experimentation with a prototype showed that on a 16-core system with a 50/50 split between workers and fresheners, fewer than 2% of the queries would return an answer that had been stale for at least eight mean query times. These results suggest that tolerance of inconsistency can be an effective strategy in circumventing Amdahl’s law.
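As an interjection from me: here is a minimal sketch of what such a freshener loop might look like. It is my own illustration in Java, not code from Ungar's prototype; the names (inputs, cache, recompute) and the toy derived value are assumptions made only to keep it runnable.

    // FreshenerSketch.java -- a toy illustration (mine, not Ungar's) of the
    // freshener idea: workers answer queries from a possibly stale cache
    // while freshener threads race, with no locks at all, to recompute it.
    import java.util.concurrent.ThreadLocalRandom;

    public class FreshenerSketch {
        static final int N = 1024;
        static final int[] inputs = new int[N]; // mutated concurrently, no locks
        static final int[] cache  = new int[N]; // derived values, may be stale

        // The "correct" derived value for slot i (a stand-in for a real query).
        static int recompute(int i) { return inputs[i] * 2 + 1; }

        static void spawn(Runnable body) {
            Thread t = new Thread(body);
            t.setDaemon(true); // let the demo exit; real fresheners run forever
            t.start();
        }

        public static void main(String[] args) throws InterruptedException {
            // Updaters: mutate inputs, silently invalidating cached answers.
            for (int k = 0; k < 4; k++) spawn(() -> {
                ThreadLocalRandom r = ThreadLocalRandom.current();
                while (true) inputs[r.nextInt(N)] = r.nextInt(1000);
            });
            // Fresheners: forever pick a slot and recompute it from its inputs.
            // No volatile, no fences, no CAS: a read may be stale, and that is
            // the point; staleness is repaired probabilistically over time.
            for (int k = 0; k < 4; k++) spawn(() -> {
                ThreadLocalRandom r = ThreadLocalRandom.current();
                while (true) { int i = r.nextInt(N); cache[i] = recompute(i); }
            });
            // A worker answering queries from the cache, tolerating staleness.
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            for (int q = 0; q < 10; q++) {
                Thread.sleep(10);
                System.out.println("query -> " + cache[rnd.nextInt(N)]);
            }
        }
    }

The 50/50 worker/freshener split from the quoted experiment corresponds to the spawn counts here; how stale an answer can get depends on that ratio and on the strategy the fresheners use to pick slots.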
> During his talk David mentioned that he’s trying to find a better name than “anti-lock” or “race and repair” for this line of thinking. Throwing my hat into the name game, I want to call it Ask For Forgiveness Programming (AFFP), based on the idea that using locks is “asking for permission” programming, so not using locks along with fresheners is really “asking for forgiveness.” I think it works, but it’s just a thought.
>
> No Shared Lock Goes Unpunished
>
> Amdahl’s Law is used to understand why simply having more cores won’t save us for a large class of problems. The idea is that any program is made up of a serial fraction and a parallel fraction, and more cores only help with the parallel portion. If an operation takes 10 seconds, for example, and one second of it is serial, then infinitely many cores can only shrink the parallelizable nine seconds; the serial second always remains. Amdahl says you can never run faster than that serial 10%: total time is bounded below by one second, for a maximum speedup of 10x. As long as your code has a serial portion, it’s impossible to beat that bound.
>
> Jakob Engblom recounts a similar line of thought in his blog:
>
> They also had the opportunity to test their solution [for parallel Erlang] on a Tilera 64-core machine. This mercilessly exposed any scalability limitations in their system, and proved the conventional wisdom that going beyond 10+ cores is quite different from scaling from 1 to 8… The two key lessons they learned were that no shared lock goes unpunished, and that data has to be distributed as well as code.
>
> It seems that all big system parallelization efforts turn into a hunt for locks and the splitting up of code and data into units that can run in parallel without having to synchronize. The “upper limit” of this process is clearly a system with no synchronization points at all.
>
> Sherlock Holmes says that when you have eliminated the impossible, whatever remains, however improbable, must be the truth. So the truth is: removing serialization is the only way to use all these cores. Since synchronization is the core serial component of applications, we must get rid of synchronization.
>
> A Transaction View Isn’t as Strong as it Once Was
>
> Getting rid of locks/synchronization may not be the radical notion it once was. Developments over the last few years have conditioned us to deal with more ambiguity, at least for distributed systems.
>
> Through many discussions about CAP, we’ve come to accept a non-transactional view of databases as a way to preserve availability in the face of partitions. Even if a read doesn’t return the last write, we know it eventually will, and that some merge logic will make the replicas consistent once again.
>
> The idea of compensating transactions as a way around the performance problems of distributed coordination has also become more familiar.
>
> Another related idea comes from the realm of optimistic concurrency control, which lets multiple transactions proceed in parallel without locking and then checks at commit time whether a conflict requires a rollback. The application then gets to try again.
>
> And for some time now memcache has supported a compare-and-set operation that allows multiple clients to avoid writing over each other by comparing timestamps.
>
> As relaxed as these methods are, they all still require that old enemy: synchronization.
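For contrast with what follows, here is my sketch of the retry-on-conflict shape that optimistic concurrency control and memcache-style compare-and-set share. It uses java.util.concurrent.atomic so it is self-contained; this is the conventional lock-free baseline, the very kind of CAS choke point Ungar rejects later in the article.

    // CasCounter.java -- optimistic "ask forgiveness" updates: read, compute,
    // then commit only if nobody raced us; on conflict, apologize and retry.
    import java.util.concurrent.atomic.AtomicInteger;

    public class CasCounter {
        static final AtomicInteger balance = new AtomicInteger(100);

        // Apply a delta without a lock: loop until our compareAndSet wins,
        // i.e. the value is still the one our computation was based on.
        static int addOptimistically(int delta) {
            while (true) {
                int seen = balance.get();      // optimistic read, no lock
                int next = seen + delta;       // compute against the snapshot
                if (balance.compareAndSet(seen, next))
                    return next;               // commit succeeded
                // else: someone else wrote first; re-read and retry
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) addOptimistically(+1); });
            Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) addOptimistically(-1); });
            a.start(); b.start(); a.join(); b.join();
            System.out.println(balance.get()); // always 100: conflicts were retried
        }
    }

Even this lock-free loop is synchronization in Ungar's sense: the atomic instruction is still a contention point, which is exactly the objection raised under Probabilistic Data Structures below.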
> Your Answers are Already Wrong
>
> The core difficulty with abandoning synchronization is coming to terms with the notion that results may not be correct at all times. It’s a certainty that we love certainty, so even considering that we could get wrong answers for any window of time is heresy.
>
> David says we need to change our way of thinking. The idea is to get results that are “good enough, soon enough”: get wrong answers quickly, but answers that are still right enough to be useful. Perfect is the enemy of the good, and perfect answers simply take too long at scale.
>
> David emphasizes repeatedly that there’s a fundamental trade-off between correctness and performance. Without locks, operations happen out of order, which gives wrong answers. Race conditions happen. To get correct answers we effectively add delays, making one thing wait for another, which kills performance; and we don’t want to kill performance, we want to use all those cores.
>
> But consider an aspect of working on distributed systems that people don’t like to think about: your answers are already always wrong.
>
> Unless you are querying a read-only corpus or using a global lock, in distributed systems any answer to any query is potentially wrong, always. This is the hardest idea to get into your brain when working in distributed systems with many cores that experience simultaneous updates. The world never stops. It’s always in flux. You can never assume for a single moment that there’s a stable frame of reference. The answer from a query you just made could be wrong in the next instant. A query to see if a host is up, for example, can’t ever be assumed right. That host may be up or down in the next instant, and your code won’t know about it until it finds a conflict.
>
> So are we really that far away from accepting that all answers to queries could be wrong?
>
> Lessons from Nature
>
> One strategy for dealing with many cores is to move towards biological models instead of mathematical models, where complicated behaviours emerge without global determinism. Bird flocks, for example, emerge from three simple rules: avoid crowding, steer towards the average heading of neighbors, and steer towards the average position of neighbors. No pi-calculus required; it works without synchronization or coordination. Each bird is essentially its own thread: it simply looks around and makes local decisions. This is a more cellular-automaton view of the world.
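Those three rules fit in a few lines. Here is my own minimal sketch of one flocking step; the neighbor radius and steering weights are invented, and the point is only that each bird reads its neighbors and decides locally, with no global plan anywhere.

    // BoidsSketch.java -- a toy version (mine) of the three flocking rules:
    // separation (avoid crowding), alignment (match neighbors' heading),
    // cohesion (steer toward neighbors' average position).
    public class BoidsSketch {
        static final int N = 50;
        static final double[] px = new double[N], py = new double[N]; // positions
        static final double[] vx = new double[N], vy = new double[N]; // velocities

        // One decision for bird i, from a purely local look at its neighbors.
        static void step(int i) {
            double cx = 0, cy = 0, ax = 0, ay = 0, sx = 0, sy = 0;
            int near = 0;
            for (int j = 0; j < N; j++) {
                if (j == i) continue;
                double dx = px[j] - px[i], dy = py[j] - py[i];
                double d2 = dx * dx + dy * dy;
                if (d2 > 100) continue;                 // outside radius 10
                near++;
                cx += px[j]; cy += py[j];               // cohesion: positions
                ax += vx[j]; ay += vy[j];               // alignment: headings
                if (d2 < 4) { sx -= dx; sy -= dy; }     // separation: too close
            }
            if (near > 0) {
                vx[i] += 0.01 * (cx / near - px[i]) + 0.05 * (ax / near - vx[i]) + 0.1 * sx;
                vy[i] += 0.01 * (cy / near - py[i]) + 0.05 * (ay / near - vy[i]) + 0.1 * sy;
            }
            px[i] += vx[i]; py[i] += vy[i];
        }

        public static void main(String[] args) {
            java.util.Random rnd = new java.util.Random(42);
            for (int i = 0; i < N; i++) { px[i] = rnd.nextDouble() * 20; py[i] = rnd.nextDouble() * 20; }
            for (int t = 0; t < 100; t++)
                for (int i = 0; i < N; i++) step(i);    // "one thread per bird" in spirit
            System.out.printf("bird 0 at (%.1f, %.1f)%n", px[0], py[0]);
        }
    }

In the one-thread-per-bird reading, step(i) would run concurrently and read neighbors' positions racily; a slightly stale position just makes the steering slightly off, and the next step repairs it. That is the same tolerate-and-repair shape as the fresheners.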
> Race and Repair - Mitigating Wrong Results
>
> The idea is that errors created by data races won’t be prevented; they will be repaired. A calculation made without locks will be wrong under concurrent updates. So why not use some of our many cores to calculate the right answers in the background, in parallel, and update the values? This approach:
>
> Uses no synchronization.
>
> Tolerates some wrong answers.
>
> Probabilistically fixes the answers over time.
>
> Some obvious questions: how many background threads do you need? In what order should values be recalculated? And how wrong will your answers be?
>
> To figure this out, an experiment was run and described in Inconsistency Robustness for Scalability in Interactive Concurrent-Update In-Memory MOLAP Cubes, which tests updates on a complicated spreadsheet.
>
> With locking the results were correct, but scalability was limited. Without locking, results were usually wrong. Both results are as might be expected. And when they added freshener threads they found:
>
> Combining the frequency with the duration data suggests that in a 16-core system with a 50/50 split between workers and fresheners, fewer than 2% of the queries would return an answer that had been stale for at least eight mean query times.
>
> I found this result quite surprising. I would have expected the answers to be wrong more of the time. Could this actually work?
>
> Breadcrumbs
>
> The simple idea of Race and Repair is open to a lot of clever innovations. Breadcrumbs are one such innovation, an attempt to be smarter about which values need recalculating. Meta-data attached to a value indicates that some entity is recalculating the value, or has changed a dependent value such that this value is now out of date. Any entity that wants to use the data can wait until a “valid” value is calculated, and/or decline to insert a calculated value that is out of date. This narrows the window of time in which errors are introduced. It’s a mitigation.
>
> There are endless variations on this. I can imagine remembering which calculations used a value and then publishing updated values so those calculations can be rerun. The result is a roiling, event-driven sea that is constantly bubbling with updates trying to bring values towards correctness, but probably never quite getting there.
>
> Probabilistic Data Structures
>
> Another area David has researched is hashtables that can be inserted into without synchronization. This would allow entries to be added to a hashtable from any number of cores without slowing down the entire system with a lock. Naive parallel insertion into a hashtable will mess up pointers, which can result in lost inserts or corrupted links, but it’s possible to work around these issues and create a lock-free hashtable.
>
> His presentation goes into a lot of detail on how this might work. He rejects light-weight locking schemes like CAS, because these are still a choke point and there’s a penalty for atomic instructions under contention. They won’t scale.
>
> He thinks there’s a big research opportunity in probabilistic data structures that work without synchronization and that work with mitigation.
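I can only guess at the details of his structures, so here is my own cartoon of the flavor: a fixed-size open-addressing table where inserts are plain, unsynchronized writes. Two racing inserts can claim the same empty slot, losing one value, and a background repair pass re-inserts whatever went missing. All names and parameters are invented.

    // RacyTableSketch.java -- an invented cartoon (mine, not Ungar's) of a
    // synchronization-free hashtable with mitigation: plain writes only, no
    // locks and no CAS, so racing inserts can be lost; a repair pass then
    // re-inserts anything that vanished.
    public class RacyTableSketch {
        static final int SIZE = 1 << 14;
        static final String[] slots = new String[SIZE]; // open addressing, racy

        // Linear-probe insert with no synchronization. Two threads can read
        // the same null slot and both write it; one value silently vanishes.
        static void put(String key) {
            int i = (key.hashCode() & 0x7fffffff) % SIZE;
            for (int probe = 0; probe < SIZE; probe++, i = (i + 1) % SIZE) {
                if (key.equals(slots[i])) return;                  // already present
                if (slots[i] == null) { slots[i] = key; return; }  // racy claim
            }
        }

        static boolean contains(String key) {
            int i = (key.hashCode() & 0x7fffffff) % SIZE;
            for (int probe = 0; probe < SIZE; probe++, i = (i + 1) % SIZE) {
                if (key.equals(slots[i])) return true;
                if (slots[i] == null) return false; // hit a hole: not present
            }
            return false;
        }

        public static void main(String[] args) throws InterruptedException {
            // Two cores racing to insert the same keys, no locks anywhere.
            Runnable w = () -> { for (int k = 0; k < 5000; k++) put("key" + k); };
            Thread a = new Thread(w), b = new Thread(w);
            a.start(); b.start(); a.join(); b.join();

            // The mitigation: a repair pass re-checks and re-inserts. Losses
            // are rare and probabilistic; the pass puts them back.
            int lost = 0;
            for (int k = 0; k < 5000; k++)
                if (!contains("key" + k)) { lost++; put("key" + k); }
            System.out.println("lost to races, then repaired: " + lost);
        }
    }

Ungar's real structures are surely subtler, but the trade is visible even here: no atomic instructions on the insert path, at the price of after-the-fact, probabilistic correctness.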
> Is this the Future?
>
> This is just a lightweight introduction. For more details please read all the papers and watch all the videos. But I think it’s important to talk and think about how we might make use of all these cores for traditional programming tasks, though the result may be anything but traditional.
>
> A bad experience on one project makes me somewhat skeptical that human nature will ever be comfortable accepting wrong answers as the norm. My experience was with an event-driven system with a large number of nodes that could generate events so fast that events had to be dropped. Imagine a city undergoing an earthquake and a huge sensor net spewing out change events to tell the world what’s going on in real time.
>
> The original requirement was that events could never be dropped, ever, which made a certain amount of sense when the number of sensors was small. As the number of sensors expands, that’s simply not possible. My proposal was to drop events and have background processes query the actual sensors so that the database would be synced over time. Very much like the proposals here.
>
> It was a no go. A vehement no go. Titanic arguments ensued. Management and everyone on down simply could not accept the idea that their models would be out of sync with the sensors (which of course they were anyway). My eventual solution took a year to implement and radically changed everything, but that was simpler than trying to convince people to deal with uncertainty.
>
> So there might be some convincing to do.
>
> Related Articles
>
> Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed About the Future (slides, video)
>
> Renaissance Project
>
> On Hacker News
>
> David Ungar: It is Good to be Wrong
>
> We Really Don't Know How To Compute!
>
> Harnessing emergence for manycore programming: early experience integrating ensembles, adverbs, and object-based inheritance
>
> Inconsistency Robustness for Scalability in Interactive Concurrent-Update In-Memory MOLAP Cubes
>
>
> _______________________________________________
> fonc mailing list
> [email protected]
> http://vpri.org/mailman/listinfo/fonc
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc
