Re: API and terms: idle-time-out and heartbeat intervals.

Robbie Gemmell Wed, 28 Sep 2016 13:41:07 -0700

In case it was unclear, I'm fine with it doing what it does here given
the spec wording; 'pessimistic' was just a word choice (though I'd say
it fits the situation as well as 'conservative'). I was mainly just
noting that every time this comes up (this is at least the third
time I've had this discussion), essentially everyone says
they dislike it sending twice as fast as it actually needs to when
proton or another similarly-behaving peer is on the other end. To avoid
quite as much chattiness, it could be less conservative and run closer
to the peer's advertised value than the current 50%.
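
To put rough numbers on the chattiness, here is a small sketch of the
arithmetic, assuming both ends behave the way proton does today
(advertise half the local timeout, then send at half of whatever the
peer advertised). The function names are made up for illustration and
are not proton API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not proton API. All values are in milliseconds. */

/* Value placed in the Open frame: half the local (actual) timeout;
 * 0 means no idle timeout is requested. */
static uint32_t advertised_timeout(uint32_t local_timeout_ms)
{
    return local_timeout_ms ? local_timeout_ms / 2 : 0;
}

/* Interval at which keepalive frames are sent: half of the value the
 * peer advertised in its Open frame. */
static uint32_t send_interval(uint32_t remote_advertised_ms)
{
    return remote_advertised_ms / 2;
}

/* Worked example: the peer's actual timeout is 60 s. */
static void example(void)
{
    uint32_t advertised = advertised_timeout(60000);
    uint32_t interval = send_interval(advertised);
    assert(advertised == 30000); /* what goes in the peer's Open frame */
    assert(interval == 15000);   /* how often we would send empty frames */
}
```

So under these assumptions frames go out at half the advertised value,
which is already only half of the peer's actual timeout.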


That would be done on the grounds that people should already be
advertising less than their actual timeout, but even if they weren't we
would still be sending frames faster than they have requested, and
thus still hopefully avoid any spurious timeouts. Perhaps it could use
a higher percentage, but put a lower bound on the actual time
difference from the advertised value, below which it simply goes back to
using 50% instead. The main issue with that is that, as Rob outlined, the
amount of time needed for a frame to be fully
processed as sent+received will vary depending on the situation,
making it not so simple to judge what would be appropriate.
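
Purely as a sketch of that idea, and not anything proton does today,
something like the following could work. The function name and the
`percent` / `min_margin_ms` knobs are hypothetical, and picking good
values for them is exactly the hard part noted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not proton code. Returns the interval (ms) at
 * which to send keepalive frames to a peer that advertised
 * `advertised_ms` as its idle-time-out. `percent` is the share of the
 * advertised value to aim for, and `min_margin_ms` is the smallest
 * acceptable gap between that interval and the advertised value.
 */
static uint32_t keepalive_interval(uint32_t advertised_ms,
                                   uint32_t percent,
                                   uint32_t min_margin_ms)
{
    assert(percent >= 50 && percent <= 100);

    if (advertised_ms == 0) {
        return 0; /* peer did not request an idle timeout */
    }

    /* Widen to 64 bits so the multiplication cannot overflow. */
    uint64_t interval = (uint64_t)advertised_ms * percent / 100;
    uint32_t margin = advertised_ms - (uint32_t)interval;

    if (margin < min_margin_ms) {
        /* Too close to the advertised value: fall back to 50%. */
        return advertised_ms / 2;
    }
    return (uint32_t)interval;
}
```

For example, with an advertised value of 30000 ms, 80% and a 2000 ms
minimum margin, this would send at 24000 ms; with an advertised value of
5000 ms the margin would only be 1000 ms, so it drops back to 2500 ms.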

Robbie

On 28 September 2016 at 15:13, Ken Giusti <kgiu...@redhat.com> wrote:
> I've had a hand in the way Proton/C interprets the meaning of 'idle-timeout' 
> and I've never liked the solution.  I think Proton/C's behavior is not 
> 'pessimistic' as much as it is 'conservative' for the sake of 
> interoperability.  This unfortunately ends up causing needless idle frame 
> chattiness when both ends are Proton-based.
>
> ----- Original Message -----
>> From: "Rob Godfrey" <rob.j.godf...@gmail.com>
>> To: "qpid" <dev@qpid.apache.org>
>> Sent: Wednesday, September 28, 2016 6:19:05 AM
>> Subject: Re: API and terms: idle-time-out and heartbeat intervals.
>>
>> I agree that specifying that the communicated figure should be "half"
>> the "actual" timeout was a mistake.
>>
>> What the spec should have tried to communicate is that the sender
>> should communicate a value somewhat less than the period it uses to
>> determine that the connection has actually timed-out to allow for the
>> receiver to process and emit a heartbeat frame.
>
>
> Wouldn't it be much clearer to simply send the _actual_ idle timeout value?  
> Having the spec suggest "communicating a value *somewhat less*" [emphasis 
> mine] leaves the implementation open for interpretation - which is exactly 
> how we got into this mess in the first place.  Developers are a smart bunch - 
> they know that keep alive traffic will have to be sent frequently enough to 
> prevent idle timeout.
>
>
>>  Similarly the sender
>> should ensure that a frame has been emitted well within the timeout
>> period to allow for any communication / processing delay.
>
> Agreed - perfectly acceptable for the spec to point this out.
>
>>  In practice
>> these "wiggle room" factors should not be determined by the
>> application level timeout setting but by sensible calculations on
>> transport delay variance / processing time, etc...  these calculation
>> may differ between different use-cases / environments (for example in
>> a low latency / real-time environment you may be able to make hard
>> guarantees about the number of milliseconds that communication /
>> processing delay will take... on the other hand if you are using an
>> interpreted language with stop-the-world garbage collection you may
>> not be able to say much better than the delay should be less than 30s
>> or whatever).
>>
>
> Yes - very important things to keep in mind when implementing this.  But the 
> spec shouldn't be making these suggestions for different implementation 
> options. The spec should be as concise as possible about the mandated 
> behavior, and leave the implementation to the developers.
>
>> I think application level APIs should be in terms of the timeouts that
>> will affect the application.  The AMQP library should be massaging
>> those numbers in such a way that they can fulfil the application
>> requirements.
>>
>
> Agreed.  Now, is there _any_ way we can suggest an update to the spec?  
> Perhaps an erratum, etc?
>
>> -- Rob
>>
>> On 28 September 2016 at 10:42, Robbie Gemmell <robbie.gemm...@gmail.com>
>> wrote:
>> > On 27 September 2016 at 22:24, Alan Conway <acon...@redhat.com> wrote:
>> >> On Tue, 2016-09-27 at 15:37 -0400, Alan Conway wrote:
>> >>> I want to clarify and document the meaning of these terms for our
>> >>> APIs,
>> >>> presently I can't find anywhere where they are documented clearly.
>> >>>
>> >>> The AMQP spec says: "Each peer has its own (independent) idle timeout.
>> >>> At connection open each peer communicates the maximum period between
>> >>> activity (frames) on the connection that it desires from its partner.
>> >>> The open frame carries the idle-time-out field for this purpose. To
>> >>> avoid spurious timeouts, the value in idle-time-out SHOULD be half the
>> >>> peer’s actual timeout threshold."
>> >>>
>> >>> In other words: if I send you an "open" frame with idle-time-out=N
>> >>> that means *you* should not wait for longer than N milliseconds to
>> >>> send a frame to me. It does not mean *I* will close the connection
>> >>> after N milliseconds, I SHOULD be more patient and wait for N*2 ms to
>> >>> avoid closing prematurely due to minor timing wobbles.
>> >>>
>> >>> I think the choice of name is slightly ambiguous but the spec is
>> >>> clear on the semantics, so it's important to document it to remove
>> >>> the ambiguity.
>> >>>
>> >>> Anybody disagree?
>> >>>
>> >>
>> >> Sigh. Sadly proton-C interprets "idle-timeout" differently depending on
>> >> which end of the connection you are on:
>> >>
>> >>       // as per the recommendation in the spec, advertise half our
>> >>       // actual timeout to the remote
>> >>       const pn_millis_t idle_timeout = transport->local_idle_timeout
>> >>           ? (transport->local_idle_timeout/2)
>> >>           : 0;
>> >>
>> >> So in proton, pn_set_idle_timeout does NOT mean set the AMQP idle-
>> >> timeout value, it means set the local "receive timeout" value and send
>> >> half that as the AMQP "send timeout" for the peer.
>> >>
>> >> I'm tempted to use a new term in the Go API: "heartbeat". To me that
>> >> clearly means the "send timeout" (hearts beat, they don't listen for
>> >> beats) so it coincides with the meaning of the AMQP "idle-timeout", but
>> >> without the ambiguity that is exacerbated by proton interpreting it
>> >> both ways.
>> >>
>> >>
>> >
>> > Proton may seem to behave differently on each end, but I don't think
>> > it's necessarily a bad thing that it does, and it is also I think
>> > largely just reflecting an annoying bit in the spec around this where
>> > different behaviours are allowed for, whereas it would be easier if it
>> > had less wiggle room.
>> >
>> > The transport setter/getter for the local timeout takes the 'actual
>> > timeout' and then sends half of it as the advertised value in the Open
>> > sent. This makes a certain amount of sense since it ensures that
>> > appropriate behaviour is actually satisfied, rather than expecting the
>> > user to ensure they only give half the value they really want for
>> > their actual timeout. The getter for the remote timeout value on the
>> > other hand returns the advertised value from the Open that is
>> > received. I expect it does that since it can't actually ever return the
>> > remote's 'actual timeout' without making an assumption, i.e. that they
>> > did in fact advertise half (or less) of their actual timeout, which
>> > the spec only says that they SHOULD do.
>> >
>> > Yes the local setter taking the advertised value may have been better
>> > for method consistency with the remote getter. On the other hand,
>> > sending of necessary heartbeats is handled directly by the transport
>> > during the tick process, so users may not necessarily even use the
>> > getter themselves, and proton uses that remote value internally by
>> > pessimistically halving it to account for the case that folks on the
>> > other end did not advertise half their actual timeout (since the spec
>> > doesn't require that they do). Side note: proton could arguably be less
>> > pessimistic here and go for say a percentage much nearer the full
>> > advertised value, but then you'd probably need to start gauging how
>> > close is too close.
>> >
>> > I think ensuring the documentation on the methods is clear about what
>> > they do is sufficient here. I actually prefer idle-timeout as a
>> > name rather than heartbeat due to the way this all works. Since you
>> > only tell the other side [half] your timeout, you don't actually have
>> > direct control over when they send any needed empty frames to satisfy
>> > it (as the above shows, we might send them more often than they
>> > require) and 'heartbeat' might seem to imply that you do, and possibly
>> > even that they need be sent at that period all the time even despite
>> > regular traffic, which is not the case.
>> >
>> > Robbie
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
>> > For additional commands, e-mail: dev-h...@qpid.apache.org
>> >
>>
>
> --
> -K
>

