Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

Gil Tene Thu, 29 Oct 2015 00:53:59 -0700

[Sorry for the 4 day delay in response. JavaOne sort of got in the way]

I think we are looking at two separate and almost opposite motivations, each of 
which is potentially independently valid. Each can be characterized by 
answering the question: "How does adding this to an empty while(!ready) {} spin 
loop change things?".

Putting name selection aside, one motivation can be characterized with "if I 
add this to a spinning loop, keep spinning hard and don't relinquish resources 
any more than the empty loop would, but try to leave the spin as fast as 
possible. And it would be nice if power was conserved as a side effect.". The 
other motivation can be characterized with "If I add this to a spin loop, I am 
indicating that I can't make useful progress unless stuff happens or some 
internal time limit is reached, and that it is ok to try and make better use of 
resources (including my CPU), relinquishing them more aggressively than the 
empty loop would. And it would be nice if reaction time was faster most of the 
time too". 

The two motivations are diametrically opposed in their expected effect when 
compared to the behavior of an empty spin loop that does not contain them. Both 
can be validly implemented as a nop, but they "hint" in opposite directions. 
The former is what I have been calling a spin loop hint (in the "keep spinning 
and don't let go" sense), and the latter is a "spin/yield" (in the "ok to let 
go" sense). They have different uses.

> On Oct 24, 2015, at 11:09 AM, Doug Lea wrote:
> 
> 
> Here's one more attempt to explain why it would be a good idea
> to place, name, and specify this method in a way that is more
> general than "call this method only if you want a PAUSE instruction
> on a dedicated multicore x86":

I agree with the goal of not aiming at a processor specific behavior, and 
focusing on documenting intent and expectation. But I think that the intent 
suggested in the spinLoopHint() JavaDoc does that. As noted later in this 
e-mail, there are other things that the JVM can choose to do to work in the 
hint's intended direction.

> 
> On 10/15/2015 01:23 PM, Gil Tene wrote:
> ...
>> 
>> As noted in my proposed JavaDoc, I see the primary indication of the hint to
>> be that the reaction time to events that would cause the loop to exit (e.g.
>> in nanosecond units) is more important to the caller than the speed at which
>> the loop is executing (e.g. in "number of loop iterations per second" units).
> 
> Sure. This can also be stated:
> 
> class Thread { ...
> /**
>  * A hint to the platform that the current thread is momentarily
>  * unable to progress until the occurrence of one or more actions of
>  * one or more other threads (or that its containing loop is
>  * otherwise terminated).  The method is mainly applicable in
>  * spin-then-block constructions entailing a bounded number of
>  * re-checks of a condition, separated by spinYield(), followed if
>  * necessary with use of a blocking synchronization mechanism.  A
>  * spin-loop that invokes this method on each iteration is likely to
>  * be more responsive than it would otherwise be.
>  */
>  public static void spinYield();
> }

I like the "more responsive than it would otherwise be" part. That certainly 
describes how this is different than an empty loop. But the choice of "mainly 
applicable" in spinYield() is exactly opposite from the main use case 
spinLoopHint() is intended for (which is somewhere between "indefinite 
spinning" and "I don't care what kind of spinning"). This JavaDoc looks like a 
good description of spinYield() and its intended main use cases, but the 
stated intent and expectations (when compared to just doing an empty spin loop) 
work in the opposite direction of what spinLoopHint's intent and expectations 
need to be for its common use cases.

> 
>> Anyone running indefinite spin loops on a uniprocessor deserves whatever they
>> get. Yielding in order to help them out is not mercy. Let Darwin take care of
>> them instead.
>> 
>> But indefinite user-mode spinning on many-core systems is a valid and common
>> use case (see the disruptor link in my previous e-mail).
> 
>> In such situations the spinning loop should just be calling yield(), or
>> looping for a very short count (like your magic 64) and then yielding. A
>> "magically choose for me whether reaction time or throughput or being nice to
>> others is more important" call is not a useful hint IMO.
>> 
>> Like in my uniprocessor comment above, any program spinning indefinitely (or
>> for a non-trivial amount of time) with load > # cpus deserves what it gets.
> 
> The main problem here is that there are no APIs reporting whether
> load > # cpus, and no good prospects for them either, especially
> considering the use of hypervisors (that may intentionally mis-report)
> and tightly packed cloud nodes where the number of cpus currently
> available to a program may depend on random transient effects of
> co-placement with other services running on that node.

Since a simple empty spinning loop ( while(!ready){} ) is valid, even if/when 
stupid, on any such setup, I don't see how a hint needs to carry a higher 
burden of being able to know these things. Such empty loops are already being 
used in both indefinite and backing-off spinning situations, along with the 
risks, responsibilities, and sensitivities that performing such spinning carries. 
It is hard to argue against the obvious and very real benefits that indefinite 
spinning loops provide on well provisioned many-core systems, in terms of 
latency behavior and reaction time when compared with back-off variants. Yes, 
they come with extra risks of performance degradation when control is lacking, 
but they are so useful that their existence proof probably trumps the "people 
shouldn't do this" argument.

So let's look at what each call would do compared to just having an empty loop: 
The starting point of a pure empty loop obviously does not imply or hint that 
the JVM should take extra steps to yield resources in the loop. The 
JVM/OS/Hypervisor certainly MAY do that, but there is no declaration of this 
intent, and probably no expectation that such yielding would be more likely in 
the loop than anywhere else in the code.

In the case where a spin hint is added:

while(!ready){ spinLoopHint(); };

The *only* intent declared [in my suggested JavaDoc for the hint] (above the 
empty loop implementation) is the wish to improve the speed of reacting to "ready" 
becoming true, and the willingness to sacrifice the "speed" of iteration 
(number of times/sec around the loop) in service of that wish. This would be 
the common case in indefinite spinning situations that are prevalent in 
many-core latency sensitive stacks today. [e.g. I would expect 
https://github.com/LMAX-Exchange/disruptor/blob/f29b3148c2eef3aa2dc5d5f570d7dde92b2f98ba/src/main/java/com/lmax/disruptor/BusySpinWaitStrategy.java#L28
 to elect to use the spinLoopHint() ]. It does no harm, and can only help the 
cause.
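
As a minimal sketch of the indefinite-spin case described above: the class name BusySpinSketch and the awaitReady() helper are illustrative assumptions, and spinLoopHint() is a hypothetical local stand-in for the proposed method, implemented as a nop (which the proposal treats as a valid implementation).

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class BusySpinSketch {
    // Hypothetical stand-in for the proposed hint. A nop is a valid
    // implementation; a JVM could instead emit something like PAUSE on x86.
    static void spinLoopHint() { }

    // Spins until 'ready' becomes true, never backing off or blocking.
    // Returns the number of loop iterations that saw 'ready' still false.
    static long awaitReady(AtomicBoolean ready) {
        long spins = 0;
        while (!ready.get()) {
            spinLoopHint(); // intent: react faster, even if each iteration is slower
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread waiter = new Thread(
            () -> System.out.println("spins before ready: " + awaitReady(ready)));
        waiter.start();
        ready.set(true); // the event the spinning thread is waiting for
        waiter.join();
    }
}
```

The loop body is otherwise identical to the empty loop, so the hint changes only how the platform treats the spin, not what the program does.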

[Note that there is currently no way to achieve this hint in Javadom, leaving 
such busy spinning strategies written in Java at a disadvantage when compared 
to their C cousins executing on identical platforms. That's the gap that this 
proposed spinLoopHint() JEP is intended to close.]

In contrast:

while(!ready){ spinYield(); };

Would declare (per the JavaDoc suggested for spinYield()) a very different 
intent: I.e. an intent to spin but eventually back off, and a wish to 
relinquish resources (including the cpu itself) more aggressively than the 
empty loop would.
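
For comparison, here is a rough sketch of the spin-then-block construction that the suggested spinYield() JavaDoc describes: a bounded number of re-checks separated by spinYield(), followed by a blocking mechanism. Everything here is an assumption made for illustration: the class name, the MAX_SPINS bound, the use of CountDownLatch as the blocking mechanism, and the choice of Thread.yield() as one possible behavior of a hypothetical spinYield() (a nop would be equally valid).

```java
import java.util.concurrent.CountDownLatch;

final class SpinThenBlockSketch {
    // Assumed bound on re-checks before giving up on spinning.
    static final int MAX_SPINS = 64;

    // Hypothetical stand-in for the suggested spinYield(). Yielding the CPU
    // is only one behavior in the "ok to let go" direction; a nop also works.
    static void spinYield() {
        Thread.yield();
    }

    // Returns true if the latch was seen open during the bounded spin phase,
    // false if the caller had to fall back to blocking.
    static boolean await(CountDownLatch latch) throws InterruptedException {
        for (int i = 0; i < MAX_SPINS; i++) {
            if (latch.getCount() == 0) {
                return true;
            }
            spinYield(); // willing to relinquish resources between re-checks
        }
        latch.await(); // spin budget exhausted: block until released
        return false;
    }
}
```

A caller would invoke SpinThenBlockSketch.await(latch) while another thread calls latch.countDown(). The point of the sketch is that the back-off policy lives in the caller's loop bound and blocking choice, not in the hint itself.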

While I can certainly envision new implementations that may want to use such a 
call, it would be useful to try and find actual places where this call would be 
made in current use cases. Since most of the desired effect can be achieved 
in current Java, there are already multiple implementations of non-indefinite 
spinning out there, and looking at them in this context may be useful.

Having done a cursory scan of a few such loops, I suspect that many current 
spin-then-backoff implementations are likely to avoid using such a fuzzy 
implementation because they would normally desire more control over the backoff 
logic. E.g. various "non-busy" WaitStrategy variants (see implementations of 
WaitStrategy found here: 
https://github.com/LMAX-Exchange/disruptor/tree/f29b3148c2eef3aa2dc5d5f570d7dde92b2f98ba/src/main/java/com/lmax/disruptor
 ) make specific choices about how to not busily-spin. Specific and current 
non-busy-spinning implementation variants include yielding, blocking, sleeping, 
blocking with a timeout, "lite" blocking, and a phased backoff strategy. I 
would expect that none of those would make use of the suggested spinYield() 
because each is making a different choice about backoff behavior. However, 
several of them (e.g. YieldingWaitStrategy and PhasedBackoffWaitStrategy) 
would probably make use of spinLoopHint() [in the spinning parts that have not 
yet decided to back off], even though they don't spin indefinitely. [here too, 
spinLoopHint does no harm, and can only help their cause].
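
To illustrate how a backing-off strategy could still use the hint in its spinning part, here is a sketch loosely modeled on the idea of a spin-then-yield wait. It is not the Disruptor's code: the class name, the SPIN_TRIES value, and the waitFor() signature are assumptions, and spinLoopHint() is again a hypothetical nop stand-in.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SpinThenYieldSketch {
    // Assumed number of hinted spins before the loop starts yielding.
    static final int SPIN_TRIES = 100;

    // Hypothetical stand-in for the proposed hint; a nop is valid.
    static void spinLoopHint() { }

    // Waits until 'cursor' reaches at least 'sequence' and returns the
    // cursor value that satisfied the wait.
    static long waitFor(long sequence, AtomicLong cursor) {
        int counter = SPIN_TRIES;
        long available;
        while ((available = cursor.get()) < sequence) {
            if (counter > 0) {
                counter--;
                spinLoopHint(); // still in the "keep spinning" phase
            } else {
                Thread.yield(); // the strategy's own explicit back-off choice
            }
        }
        return available;
    }
}
```

The back-off decision stays explicit in the strategy (here, Thread.yield()), while the hint applies only to iterations that have not yet decided to back off.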

It does feel like letting the JVM (and underlying platform) implement spinning 
logic may be more desirable and portable for some things than specific strategy 
implementations written in Java, but evolving the proper API for such 
spins-with-backoff will probably entail studying the various things that may be 
expected of them (the richness of the Disruptor's not-entirely-busy strategies 
alone suggests that there is a need to indicate intent more clearly and richly 
than a single no-args call can do). I would submit that developing this API is 
orthogonal to the intent and purpose of the proposed spinLoopHint() JEP, and 
that we should work on it separately.

> And given that programmers cannot portably comply, the method must
> allow implementations that take the best course of action known to the JVM.

I agree, but in the sense of "best course of action for achieving the implied 
intent compared to an empty spin loop with no hint". We agree that a nop 
implementation is valid. Things that "do more than a nop" should strive to move 
the behavior in the indicated direction compared to that. Agreeing on what that 
direction is for each call is key.

There is a lot more than PAUSE that can be done in a spinLoopHint(), BTW. E.g. 
since a spinLoopHint() (in my suggested JavaDoc intent) indicates higher 
responsiveness as a priority, it would be valid and useful (but in no way 
required) for the JVM to work to *reduce* the likelihood of yielding the CPU in 
a loop that contained the hint. E.g. if there was some way for the JVM to 
communicate this preference to the underlying scheduling levels (OS, 
hypervisor, and even BIOS and HW power management), that would work to improve 
the behavior in the desired direction. I can envision interesting choices 
around isolcpus, tasksets, and weight decisions in cpu load balancing 
decisions, or even priorities. But I really have no desire to implement any of 
those at this time…

> Despite all of the above, I agree that an OK initial hotspot implementation
> is just to emit PAUSE if on x86 else no-op. It might be worth then
> experimenting with randomized branching etc on other platforms, and
> someday further exploring some cheap form of load detection, perhaps
> kicking in only upon repeated invocation.
> 
> -Doug
>
