Re: API and terms: idle-time-out and heartbeat intervals.

Alan Conway Wed, 28 Sep 2016 13:25:56 -0700

Seems there's some dispute over the meaning of idle-time-out in the
AMQP spec, not just how proton treats it. Let me try to convince you.


Forget proton for a moment, here's the spec text with my comments.
Remember: the spec defines the AMQP *protocol*, the important text is
where it talks specifically about protocol fields and constructs.

The key problem is that "idle timeout" is ambiguous: Does it mean the
time the frame receiver will wait before closing, or the max time the
frame sender is allowed to wait before sending? I claim that while the
English words "idle timeout" are used sloppily in the spec, the
statements about the formal protocol value idle-time-out are clear:
idle-time-out is the frame interval, not the connection close
threshold.

====

[spec] Connections are subject to an idle timeout threshold. The
timeout is triggered by a local peer when no frames are received after
a threshold value is exceeded. The idle timeout is measured in
milliseconds, and starts from the time the last frame is received. If
the threshold is exceeded, then a peer SHOULD try to gracefully close
the connection using a close frame with an error explaining why. If the
remote peer does not respond gracefully within a threshold to this,
then the peer MAY close the TCP socket.

[ac] This paragraph is sloppy: it uses "idle timeout", "idle timeout
threshold" and "threshold" without defining any of them. However it has
not mentioned any formal AMQP constructs as yet, so I claim this is
non-normative discussion.

[spec] Each peer has its own (independent) idle timeout. At connection
open each peer communicates the maximum period between activity
(frames) on the connection that it desires from its partner.

[ac] This is clear: the *communicated* value (on the open frame) is the
maximum period *between frames*, it is *not* the time threshold before
a connection is closed.

[spec] The open frame carries the idle-time-out field for this purpose.
To avoid spurious timeouts, the value in idle-time-out SHOULD be half
the peer’s actual timeout threshold.

[ac] Here there is a clear distinction between the formal idle-time-out
(frame interval) and the informal "timeout threshold" at which the
connection is closed. Since the idle-time-out SHOULD be half the
threshold, clearly it *is not* the threshold. It would have been
clearer to say "the connection close threshold should be twice the
idle-time-out".
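
A minimal sketch of that relationship, in case the arithmetic helps.
The function names are hypothetical and not part of proton or any other
AMQP library; the only thing taken from the spec is "idle-time-out
SHOULD be half the threshold", with 0 assumed to mean "no timeout".

```c
#include <stdint.h>

/* Hypothetical helper: the value to advertise in the open frame's
 * idle-time-out field, given the local close threshold (milliseconds).
 * 0 is assumed to mean "no timeout" and is passed through unchanged. */
uint32_t advertised_idle_time_out(uint32_t close_threshold_ms) {
    if (close_threshold_ms == 0) return 0;
    uint32_t half = close_threshold_ms / 2;
    /* Avoid rounding a tiny threshold down to 0, i.e. "no timeout". */
    return half == 0 ? 1 : half;
}

/* Hypothetical helper: the close threshold implied by an advertised
 * idle-time-out, i.e. twice the frame interval, saturating on overflow. */
uint32_t close_threshold_for(uint32_t idle_time_out_ms) {
    if (idle_time_out_ms > UINT32_MAX / 2) return UINT32_MAX;
    return idle_time_out_ms * 2;
}
```

So a peer that wants to close after 60000 ms of silence would advertise
30000, and a peer that advertises 30000 should wait 60000 ms before
closing.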

[snip irrelevant stuff]

[spec] If during operation a peer exceeds the remote peer’s idle
timeout’s threshold, e.g., because it is heavily loaded, it SHOULD
gracefully close the connection by using a close frame with an error
explaining why.

[ac] Sloppy again, but does not make any statement about the formal
meaning of idle-time-out. It talks about "thresholds" which refer to
the close threshold.

====

With one exception (in paragraph 1) the word "threshold" is always used
(sometimes in "idle timeout threshold") when referring to the
connection close threshold. I agree that the first and last paragraphs
are unclear, and the English words "idle timeout" are over-used in
confusing ways. However I think all the text that refers directly to
the formal idle-time-out value is clear: idle-time-out is the inter-
frame interval, NOT the connection close threshold.
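
To make the two roles concrete, here is a rough sketch of the checks
each side would make under this reading. The struct and function names
are made up for illustration (this is not proton code), times are in
milliseconds, and an idle-time-out of 0 is assumed to mean "no timeout".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection timing state; all times in milliseconds. */
typedef struct {
    uint64_t last_received;        /* when we last received any frame */
    uint64_t last_sent;            /* when we last sent any frame */
    uint32_t local_idle_time_out;  /* value we put in our own open frame */
    uint32_t remote_idle_time_out; /* value from the peer's open frame */
} idle_state;

/* Receiver role: the close threshold is twice the interval we
 * advertised, so only give up once that much time has passed. */
bool should_close(const idle_state *s, uint64_t now) {
    if (s->local_idle_time_out == 0) return false;
    return now - s->last_received > 2 * (uint64_t)s->local_idle_time_out;
}

/* Sender role: the peer's idle-time-out is the maximum gap between our
 * frames, so an (empty) frame is due once that gap has been reached.
 * A real implementation would send somewhat earlier to allow for
 * latency and processing delay. */
bool must_send_frame(const idle_state *s, uint64_t now) {
    if (s->remote_idle_time_out == 0) return false;
    return now - s->last_sent >= s->remote_idle_time_out;
}
```

Note that the two checks use different fields: the value I advertised
governs when I close, the value you advertised governs when I send.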

On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> I've had a hand in the way Proton/C interprets the meaning of
> 'idle-timeout' and I've never liked the solution.  I think Proton/C's
> behavior is not 'pessimistic' as much as it is 'conservative' for the
> sake of interoperability.  Unfortunately, this results in needless
> idle-frame chattiness when both ends are Proton-based.
> 
> ----- Original Message -----
> > 
> > From: "Rob Godfrey" <[email protected]>
> > To: "qpid" <[email protected]>
> > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > Subject: Re: API and terms: idle-time-out and heartbeat intervals.
> > 
> > I agree that specifying that the communicated figure should be
> > "half" the "actual" timeout was a mistake.
> > 
> > What the spec should have tried to communicate is that the sender
> > should communicate a value somewhat less than the period it uses to
> > determine that the connection has actually timed out, to allow for
> > the receiver to process and emit a heartbeat frame.
> 
> 
> Wouldn't it be much clearer to simply send the _actual_ idle timeout
> value?  Having the spec suggest "communicating a value *somewhat
> less*" [emphasis mine] leaves the implementation open for
> interpretation - which is exactly how we got into this mess in the
> first place.  Developers are a smart bunch - they know that
> keep-alive traffic will have to be sent frequently enough to prevent
> idle timeout.
> 
> 
> > 
> >  Similarly the sender
> > should ensure that a frame has been emitted well within the timeout
> > period to allow for any communication / processing delay.
> 
> Agreed - perfectly acceptable for the spec to point this out.
> 
> > 
> >  In practice
> > these "wiggle room" factors should not be determined by the
> > application level timeout setting but by sensible calculations on
> > transport delay variance / processing time, etc...  these
> > calculations may differ between different use-cases / environments
> > (for example in a low latency / real-time environment you may be
> > able to make hard guarantees about the number of milliseconds that
> > communication / processing delay will take... on the other hand if
> > you are using an interpreted language with stop-the-world garbage
> > collection you may not be able to say much better than the delay
> > should be less than 30s or whatever).
> > 
> 
> Yes - very important things to keep in mind when implementing
> this.  But the spec shouldn't be making these suggestions for
> different implementation options. The spec should be as concise as
> possible about the mandated behavior, and leave the implementation to
> the developers.
> 
> > 
> > I think application level APIs should be in terms of the timeouts
> > that will affect the application.  The AMQP library should be
> > massaging those numbers in such a way that they can fulfil the
> > application requirements.
> > 
> 
> Agreed.  Now, is there _any_ way we can suggest an update to the
> spec?  Perhaps an erratum, etc?
> 
> > 
> > -- Rob
> > 
> > On 28 September 2016 at 10:42, Robbie Gemmell
> > <robbie.gemmell@gmail.com> wrote:
> > > 
> > > On 27 September 2016 at 22:24, Alan Conway <[email protected]>
> > > wrote:
> > > > 
> > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway wrote:
> > > > > 
> > > > > I want to clarify and document the meaning of these terms
> > > > > for our APIs; presently I can't find anywhere where they are
> > > > > documented clearly.
> > > > > 
> > > > > The AMQP spec says: "Each peer has its own (independent) idle
> > > > > timeout. At connection open each peer communicates the
> > > > > maximum period between activity (frames) on the connection
> > > > > that it desires from its partner. The open frame carries the
> > > > > idle-time-out field for this purpose. To avoid spurious
> > > > > timeouts, the value in idle-time-out SHOULD be half the
> > > > > peer’s actual timeout threshold."
> > > > > 
> > > > > In other words: if I send you an "open" frame with
> > > > > idle-time-out=N that means *you* should not wait for longer
> > > > > than N milliseconds to send a frame to me. It does not mean
> > > > > *I* will close the connection after N milliseconds, I SHOULD
> > > > > be more patient and wait for N*2 ms to avoid closing
> > > > > prematurely due to minor timing wobbles.
> > > > > 
> > > > > I think the choice of name is slightly ambiguous but the spec
> > > > > is clear on the semantics, so it's important to document it
> > > > > to remove the ambiguity.
> > > > > 
> > > > > Anybody disagree?
> > > > > 
> > > > 
> > > > Sigh. Sadly proton-C interprets "idle-timeout" differently
> > > > depending on which end of the connection you are on:
> > > > 
> > > >       // as per the recommendation in the spec, advertise half our
> > > >       // actual timeout to the remote
> > > >       const pn_millis_t idle_timeout = transport->local_idle_timeout
> > > >           ? (transport->local_idle_timeout/2)
> > > >           : 0;
> > > > 
> > > > So in proton, pn_set_idle_timeout does NOT mean set the AMQP
> > > > idle-timeout value, it means set the local "receive timeout"
> > > > value and send half that as the AMQP "send timeout" for the
> > > > peer.
> > > > 
> > > > I'm tempted to use a new term in the Go API: "heartbeat". To me
> > > > that clearly means the "send timeout" (hearts beat, they don't
> > > > listen for beats) so it coincides with the meaning of the AMQP
> > > > "idle-timeout", but without the ambiguity that is exacerbated
> > > > by proton interpreting it both ways.
> > > > 
> > > > 
> > > 
> > > Proton may seem to behave differently on each end, but I don't
> > > think it's necessarily a bad thing that it does, and it is also I
> > > think largely just reflecting an annoying bit in the spec around
> > > this where different behaviours are allowed for, whereas it would
> > > be easier if it had less wiggle room.
> > > 
> > > The transport setter/getter for the local timeout takes the
> > > 'actual timeout' and then sends half of it as the advertised
> > > value in the Open sent. This makes a certain amount of sense
> > > since it ensures that appropriate behaviour is actually
> > > satisfied, rather than expecting the user to ensure they only
> > > give half the value they really want for their actual timeout.
> > > The getter for the remote timeout value on the other hand
> > > returns the advertised value from the Open that is received. I
> > > expect it does that since it can't actually ever return the
> > > remote's 'actual timeout' without making an assumption, i.e.
> > > that they did in fact advertise half (or less) of their actual
> > > timeout, which the spec only says that they SHOULD do.
> > > 
> > > Yes the local setter taking the advertised value may have been
> > > better for method consistency with the remote getter. On the
> > > other hand, sending of necessary heartbeats is handled directly
> > > by the transport during the tick process, so users may not
> > > necessarily even use the getter themselves, and proton uses that
> > > remote value internally by pessimistically halving it to account
> > > for the case that folks on the other end did not advertise half
> > > their actual timeout (since the spec doesn't require that they
> > > do). Side note: proton could arguably be less pessimistic here
> > > and go for say a percentage much nearer the full advertised
> > > value, but then you'd probably need to start gauging how close
> > > is too close.
> > > 
> > > I think ensuring the documentation on the methods is clear about
> > > what they do is sufficient here. I actually prefer idle-timeout
> > > as a name rather than heartbeat due to the way this all works.
> > > Since you only tell the other side [half] your timeout, you
> > > don't actually have direct control over when they send any
> > > needed empty frames to satisfy it (as the above shows, we might
> > > send them more often than they require) and 'heartbeat' might
> > > seem to imply that you do, and possibly even that they need be
> > > sent at that period all the time even despite regular traffic,
> > > which is not the case.
> > > 
> > > Robbie
> > > 
> > > ---------------------------------------------------------------
> > > ------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > > 
> > 
> > -----------------------------------------------------------------
> > ----
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> > 
> > 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
