Re: [fonc] Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

David Barbour Fri, 13 Apr 2012 23:04:16 -0700

Another option would be to introduce slack in the propagation delay itself.
E.g. if I send you a message indicating the meeting has been moved to
1:30pm, it would be a good idea to send it a good bit in advance - perhaps
at 10:30am - so you can schedule and prepare.


With computers, the popular solution - sending a message *when* we want
something to happen - seems analogous to sending the message at 1:29:58pm,
leaving the computer always on the edge with little time to prepare even
for highly predictable events, and making the system much more vulnerable
to variations in communication latency.

On Fri, Apr 13, 2012 at 8:34 AM, David Goehrig <[email protected]> wrote:

> There's a very simple concept that most of the world embraces in
> everything from supply chain management, to personnel allocations, to
> personal relationships.
>
> We call it *slack*
>
> What Dan is talking about amounts to introducing slack into distributed
> models.  Particularly this version of the definition of slack:
>
> "*:* lacking in completeness, finish, or perfection <a very slack piece
> of work>"
>
> Which is a more realistic version of computation in a universe with
> propagation delay (finite speed of light). But it also introduces a concept
> similar to anyone familiar with ropes. You can't tie a knot without some
> slack. (computation being an exercise in binary sequence knot making).
> Finishing a computation is analogous to pulling the rope taunt.
>
> Dave
>
>
>
>
>
> -=-=- [email protected] -=-=-
>
> On Apr 13, 2012, at 5:53 AM, Eugen Leitl <[email protected]> wrote:
>
>
>
> http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html
>
> Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
>
> Tuesday, March 6, 2012 at 9:15AM
>
> The argument for a massively multicore future is now familiar: while clock
> speeds have leveled off, device density is increasing, so the future is
> cheap
> chips with hundreds and thousands of cores. That’s the inexorable logic
> behind our multicore future.
>
> The unsolved question that lurks deep in the dark part of a programmer’s
> mind
> is: how on earth are we to program these things? For problems that aren’t
> embarrassingly parallel, we really have no idea. IBM Research’s David Ungar
> has an idea. And it’s radical in the extreme...
>
> Grace Hopper once advised “It's easier to ask for forgiveness than it is to
> get permission.” I wonder if she had any idea that her strategy for dealing
> with human bureaucracy would the same strategy David Ungar thinks will help
> us tame  the technological bureaucracy of 1000+ core systems?
>
> You may recognize David as the co-creator of the Self programming language,
> inspiration for the HotSpot technology in the JVM and the prototype model
> used by Javascript. He’s also the innovator behind using cartoon animation
> techniques to build user interfaces. Now he’s applying that same creative
> zeal to solving the multicore problem.
>
> During a talk on his research, Everything You Know (about Parallel
> Programming) Is Wrong! A Wild Screed about the Future, he called his
> approach
> “anti-lock or “race and repair” because the core idea is that the only way
> we’re going to be able to program the new multicore chips of the future is
> to
> sidestep Amdhal’s Law and program without serialization, without locks,
> embracing non-determinism. Without locks calculations will obviously be
> wrong, but correct answers can be approached over time using techniques
> like
> fresheners:
>
>    A thread that, instead of responding to user requests, repeatedly
> selects
> a cached value according to some strategy, and recomputes that value from
> its
> inputs, in case the value had been inconsistent. Experimentation with a
> prototype showed that on a 16-core system with a 50/50 split between
> workers
> and fresheners, fewer than 2% of the queries would return an answer that
> had
> been stale for at least eight mean query times. These results suggest that
> tolerance of inconsistency can be an effective strategy in circumventing
> Amdahl’s law.
>
> During his talk David mentioned that he’s trying  to find a better name
> than
> “anti-lock or “race and repair” for this line of thinking. Throwing my hat
> into the name game, I want to call it Ask For Forgiveness Programming
> (AFFP),
> based on the idea that using locks is “asking for permission” programming,
> so
> not using locks along with fresheners is really “asking for forgiveness.” I
> think it works, but it’s just a thought.
>
> No Shared Lock Goes Unpunished
>
> Amdahl’s Law is used to understand why simply having more cores won’t save
> us
> for a large class of problems. The idea is that any program is made up of a
> serial fraction and a parallel fraction. More cores only helps you with the
> parallel portion. If an operation takes 10 seconds, for example, and one
> second of it is serial, then having infinitely many cores will only help
> you
> make the parallelizable part faster, the serial code will always take  one
> second. Amdahl says you can never go faster than that 10%. As long as your
> code has a serial portion it’s impossible to go faster.
>
> Jakob Engblom recounts a similar line of thought in his blog:
>
>    They also had the opportunity to test their solution [for parallel
> Erlang] on a Tilera 64-core machines. This mercilessly exposed any
> scalability limitations in their system, and proved the conventional wisdom
> that going beyond 10+ cores is quite different from scaling from 1 to 8…
> The
> two key lessons they learned was that no shared lock goes unpunished, and
> data has to be distributed as well as code.
>
>    It seems that for all big system parallelization efforts turn into a
> hunt
> for locks and the splitting up of code and data into units that can run in
> parallel without having to synchronize. The “upper limit” of this process
> is
> clearly a system with no synchronization points at all.
>
> Sherlock Holmes says that when you have eliminated the impossible, whatever
> remains, however improbable, must be the truth, so the truth is: removing
> serialization is the only way to use all these cores. Since synchronization
> is the core serial component to applications, we must get rid of
> synchronization.
>
> A Transaction View Isn’t as Strong as it Once Was
>
> Getting rid of locks/synchronization may not be the radical notion it once
> was. Developments over the last few years have conditioned us to deal with
> more ambiguity, at least for distributed systems.
>
> Through many discussions about CAP, we’ve come to accept a
> non-transactional
> view of databases as a way to preserve availability in the face of
> partitions. Even if a read doesn’t return the last write, we know it will
> eventually and that some merge logic will make them consistent once again.
>
> The idea of compensating transactions as a way around the performance
> problems due to distributed coordination has also become more familiar.
>
> Another related idea comes from the realm of optimistic concurrency control
> that lets multiple transactions proceed in parallel without locking and
> then
> checks at commit time if a conflict requires a rollback. The application
> then
> gets to try again.
>
> And for some time now memcache has supported a compare-and-set operation
> that
> allows multiple clients to avoid writing over each other by comparing time
> stamps.
>
> As relaxed as these methods are, they all still require that old enemy:
> synchronization.
>
> Your Answers are Already Wrong
>
> The core difficulty with abandoning synchronization is coming to terms with
> the notion that results may not be correct at all times. It’s a certainty
> we
> love certainty, so even considering we could get wrong answers for any
> window
> of time is heresy.
>
> David says we need to change our way of thinking. The idea is to get
> results
> that are “good enough, soon enough.” Get wrong answers quickly, but that
> are
> still right enough to be useful. Perfect is the enemy of the good and
> perfect
> answers simply take too long at scale.
>
> David emphasises repeatedly that there’s a fundamental trade-off between
> correctness and performance. Without locks operations happen out of order,
> which gives the wrong answer. Race conditions happen. To get correct
> answers
> we effectively add in delays, making one thing wait for another, which
> kills
> performance, and we don’t want to kill performance, we want to use all
> those
> cores.
>
> But consider an aspect of working on distributed systems that people don’t
> like to think about: your answers are already always wrong.
>
> Unless you are querying a read-only corpus or use a global lock, in
> distributed systems any answer to any query is potentially wrong, always.
> This is the hardest idea to get into your brain when working in distributed
> systems with many cores that experience simultaneous updates. The world
> never
> stops. It’s always in flux. You can never assume for a single moment
> there’s
> a stable frame of reference. The answer from a query you just made could be
> wrong in the next instant. A query to see if a host is up, for example,
> can’t
> ever be assumed right. That host maybe up or down in the next instant and
> your code won’t know about it until it finds a conflict.
>
> So are we really that far away from accepting that all answers to queries
> could be wrong?
>
> Lessons from Nature
>
> One strategy for dealing with many cores is to move towards biological
> models
> instead of mathematical models, where complicated behaviours emerge without
> global determinism. Bird flocks, for example, emerge from three simple
> rules:
> avoid crowding, steer towards average heading of neighbors,  steer towards
> average position of neighbors. No pi-calculus required, it works without
> synchronization or coordination. Each bird is essentially its own thread,
> it
> simply looks around and makes local decisions. This is a more cellular
> automaton view of the world.
>
> Race and Repair - Mitigating Wrong Results
>
> The idea is that errors created by data races won’t be prevented, they will
> be repaired. A calculation made without locks will be wrong under
> concurrent
> updates. So why not use some of our many cores to calculate the right
> answers
> in the background, in parallel, and update the values? This approach:
>
>    Uses no synchronization.
>
>    Tolerates some wrong answers.
>
>    Probabilistically fixes the answers over time.
>
> Some obvious questions: how many background threads do you need? What order
> should values be recalculated? And how wrong will your answers be?
>
> To figure this out an experiment was run and described in Inconsistency
> Robustness for Scalability in Interactive Concurrent‑Update In-Memory MOLAP
> Cubes, which test updates on a complicated spreadsheet.
>
> With locking the results were correct, but scalability was limited. Without
> locking, results were usually wrong. Both results are as might be expected.
> And when they added freshener threads they found:
>
>    Combining the frequency with the duration data suggests that in a
> 16-core
> system with a 50/50 split between workers and fresheners, fewer than 2% of
> the queries would return an answer that had been stale for at least eight
> mean query times.
>
> I found this result quite surprising. I would have expected the answers to
> be
> wrong more of the time. Could this actually work?
>
> Breadcrumbs
>
> The simple idea of Race and Repair is open to a lot of clever innovations.
> Breadcrumbs are one such innovation that attempts to be smarter about which
> values need recalculating. Meta-data is attached on a value indicating
> that a
> entity is recalculating the value or has changed a dependent value such
> that
> this value is now out of date. Any entity that might want to use this data
> can wait until a “valid” value is calculated and/or not to insert a
> calculated value if it is out of data. This narrows the window of time in
> which errors are introduced. It’s a mitigation.
>
> There are endless variations of this. I can imagine remembering
> calculations
> that used a value and then publishing updated values so those calculations
> can be rerun. The result is a roiling  event driven sea that is constantly
> bubbling with updates that are trying to bring values towards correctness,
> but probably never quite getting there.
>
> Probabilistic Data Structures
>
> Another area David has researched are hashtables that can be inserted into
> without synchronization. This would allow entries to be added to a
> hashtable
> from any number of cores without slowing down the entire system with a
> lock.
> Naive insertion into a hashtable in parallel will mess up pointers, which
> can
> either result in the loss of values or the insertion of values, but it’s
> possible to work around these issues and create a lock free hashtable.
>
> His presentation goes into a lot of detail on how this might work. He
> rejects
> light-weight locking schmes like CAS because these are still a choke point
> and there’s a penalty for atomic instructions under contention. They won’t
> scale.
>
> He thinks there’s a big research opportunity in probabilistic data
> structures
> that work without synchronization and that work with mitigation.
>
> Is this the Future?
>
> This is just a light weight introduction. For more details please read all
> the papers and watch all the videos. But I think it’s important to talk
> about
> and think about how we might make use of all these cores for traditional
> programming tasks, though the result may be anything but traditional.
>
> A bad experience on one project makes me somewhat skeptical that human
> nature
> will ever be comfortable in accepting wrong answers as the norm. My
> experience report was on an event driven system with a large number of
> nodes
> that could generate events so fast that events had to be dropped. Imagine a
> city undergoing an earthquake and a huge sensor net spewing out change
> events
> to tell the world what’s going on in real-time.
>
> The original requirement was events could never be dropped, ever, which
> made
> a certain amount of sense when the number of sensors was small. As the
> number
> of sensors expands it’s simply not possible. My proposal was to drop events
> and have background processes query the actual sensors in the background so
> that the database would be synced over time. Very much like the proposals
> here.
>
> It was a no go. A vehement no go. Titanic arguments ensued.  Management and
> everyone on down simply could not accept the idea that their models would
> be
> out of sync with the sensors (which of course they were anyway). My
> eventual
> solution took a year to implement and radically changed everything, but
> that
> was simpler than trying to convince people to deal with uncertainty.
>
> So there might be some convincing to do.
>
> Related Articles
>
>    Everything You Know (About Parallel Programming) Is Wrong!: A Wild
> Screed
> About the Future (slides, video)
>
>    Renaissance Project
>
>    On Hacker News
>
>    David Ungar: It is Good to be Wrong
>
>    We Really Don't Know How To Compute!
>
>    Harnessing emergence for manycore programming: early experience
> integrating ensembles, adverbs, and object-based inheritance
>
>    Inconsistency Robustness for Scalability in Interactive
> Concurrent‑Update
> In-Memory MOLAP Cubes
>
>
> _______________________________________________
> fonc mailing list
> [email protected]
> http://vpri.org/mailman/listinfo/fonc
>
>
> _______________________________________________
> fonc mailing list
> [email protected]
> http://vpri.org/mailman/listinfo/fonc
>
>


-- 
bringing s-words to a pen fight

_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc

Re: [fonc] Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Reply via email to