[Linux-IrDA]Re: [patch] IrLAP disconnect timeout fix

Jean Tourrilhes Fri, 11 May 2001 15:42:03 -0700

On Wed, Apr 18, 2001 at 05:16:07AM -0700, Dag Brattli wrote:
> Hi,

        Hi,

        Sorry for the delay, I was busy on other things, and I need to
take my time for those stuff.

> I've been looking for a bug which sometimes makes IrLAP use a very
> long time to disconnect the link if the remote device is suddenly
> removed. When talking to my mobile phone which has max turn time of
> 500 ms for tx and 100 ms for rx, it used 5 * 12 secs to disconnect
> the link instead of just 12 secs.
> 
> I found a problem in irlap.c where N1 and N2 where calculated from
> qos_tx, but the final timer where calculated from qos_rx. It's the
> final timer which ticks the retry_count in irlap_event.c, so I think
> that the following patch should be correct. It fixes the problem for
> me, so please try it and check that I haven't overlooked something.
> 
> -- Dag

        Yes you did ;-)
        The final timer is calculated from qos_rx. So, N1 and N2
should be as well. However, N1 and N2 applies also to the wd
timer. And the wd timer is calculated from the qos_tx.
        I spend a bit of time reading the spec and the code. My
conclusion is that your fix is correct, but wd_timer should also be
calculated from the qos_rx.
        As all of this is a bit tricky, I've added some lengthly
explanations so that people can understand the logic. I've also put
the timer before the retry counts (N1/N2) so that it's more logical
(N1/N2 depend on the timer value and not the other way round).

        So, I did a bit of patch. This is against 2.4.4-pre1 which
already include your previous patch. See attached.

        Also, I've fixed two other problems. First one is obvious, if
saddr is DEV_ADDR_ANY we should pick any link layer. I'm a bit
confused about conventions and the address space.
        The spec says only that it's a random 32 bits number without
reserving any values. If you look *very* carefully in details,
5.7.1.4.1.4.1.1 and 5.7.1.4.1.4.1.2 defines FFFFFFFF as a broadcast
address and 00000000 as reserved.
        Most of the stack uses DEV_ADDR_ANY to define "any" (any
destination, any link). However, in this instance, it was 0 that was
used for that. A bit confusing.

        The second one is a recurring bug... It's tricky one...
        Let's assume self->qos_rx.max_turn_time.value = 200ms. So, we
have self->N1 = 15.
        Now, when is the test ((self->retry_count * 2) == self->N1)
true ? Never !!! Ouch !
        On the other hand, (self->retry_count == (self->N1 / 2)) is
true only once, when self->retry_count = 7...

        Now, last thing...
        I sent two patches to workaround the "socket stuck in
setup_pend state". You have put the wrong one in the kernel.
        This is a case of "it should never happen". Looking at the
state machines, a socket can't be stuck in this state. Just
impossible. However, I do see it happening. I don't know if it's a
race conditions due to the USB dongle, a genuine Linux bug or a
problem in the spec.
        Anyway, the fix you have in doesn't work. It does nothing. The
CONNECT_CMD at the LMP layer is not retried, but sent only
once. Therefore, you will never get a second
LM_CONNECT_INDICATION. And, as the socket is no longer in the
unconnected pool, it other sockets won't ever try to connect to
it. So, it stays dead.
        The other fix use the watchdog. It's less elegant, but does
work. Of course, the spec doesn't mention it, but it still
correct... This second patch was called ir242_conn_pend_stuck_2.diff.

        Finally, I need to look in my archive, but it seems that a lot
of my patches didn't made it in the kernel. Not mentionning the new
version of the USB driver.
        Bah...

        Have fun...

        Jean

diff -u -p linux/net/irda/irlmp.d8.c linux/net/irda/irlmp.c
--- linux/net/irda/irlmp.d8.c   Thu May 10 18:47:54 2001
+++ linux/net/irda/irlmp.c      Thu May 10 18:57:08 2001
@@ -376,7 +376,7 @@ int irlmp_connect_request(struct lsap_cb
         * discovery log and check if any of the links has discovered a
         * device with the given daddr 
         */
-       if (!saddr) {
+       if ((!saddr) || (saddr == DEV_ADDR_ANY)) {
                if (daddr != DEV_ADDR_ANY)
                        discovery = hashbin_find(irlmp->cachelog, daddr, NULL);
                else {
diff -u -p linux/net/irda/irlap.d8.c linux/net/irda/irlap.c
--- linux/net/irda/irlap.d8.c   Fri May 11 13:56:08 2001
+++ linux/net/irda/irlap.c      Fri May 11 14:55:18 2001
@@ -1085,42 +1085,63 @@ void irlap_apply_connection_parameters(s
        self->line_capacity = 
                irlap_max_line_capacity(self->qos_tx.baud_rate.value,
                                        self->qos_tx.max_turn_time.value);
+
+       /* 
+        *  Initialize timeout values, some of the rules are listed on 
+        *  page 92 in IrLAP.
+        */
+       ASSERT(self->qos_tx.max_turn_time.value != 0, return;);
+       ASSERT(self->qos_rx.max_turn_time.value != 0, return;);
+       /* The poll timeout applies only to the primary station.
+        * It defines the maximum time the primary stay in XMIT mode
+        * before timeout and turning the link around (sending a RR).
+        * Or, this is how much we can keep the pf bit in primary mode.
+        * Therefore, it must be lower or equal than our *OWN* max turn around.
+        * Jean II */
+       self->poll_timeout = self->qos_tx.max_turn_time.value * HZ / 1000;
+       /* The Final timeout applies only to the primary station.
+        * It defines the maximum time the primary wait (mostly in RECV mode)
+        * for an answer from the secondary station before polling it again.
+        * Therefore, it must be greater or equal than our *PARTNER*
+        * max turn around time - Jean II */
+       self->final_timeout = self->qos_rx.max_turn_time.value * HZ / 1000;
+       /* The Watchdog Bit timeout applies only to the secondary station.
+        * It defines the maximum time the secondary wait (mostly in RECV mode)
+        * for poll from the primary station before getting annoyed.
+        * Therefore, it must be greater or equal than our *PARTNER*
+        * max turn around time - Jean II */
+       self->wd_timeout = self->final_timeout * 2;
+
+       /*
+        * N1 and N2 are maximum retry count for *both* the final timer
+        * and the wd timer (with a factor 2) as defined above.
+        * After N1 retry of a timer, we give a warning to the user.
+        * After N2 retry, we consider the link dead and disconnect it.
+        * Jean II
+        */
+
        /*
         *  Set N1 to 0 if Link Disconnect/Threshold Time = 3 and set it to 
         *  3 seconds otherwise. See page 71 in IrLAP for more details.
         *  TODO: these values should be calculated from the final timer
-         *  as well
+         *  as well => They seem to be, what's the beef ? Jean II
         */
-       ASSERT(self->qos_tx.max_turn_time.value != 0, return;);
        if (self->qos_tx.link_disc_time.value == 3)
                /* 
                 * If we set N1 to 0, it will trigger immediately, which is
                 * not what we want. What we really want is to disable it,
                 * Jean II 
                 */
-               self->N1 = -1; /* Disable */
+               self->N1 = -2;  /* Disable - need to be multiple of 2 */
        else
                self->N1 = 3000 / self->qos_rx.max_turn_time.value;
        
        IRDA_DEBUG(4, "Setting N1 = %d\n", self->N1);
        
-       
+       /* Set N2 to match our *OWN* disconnect time */
        self->N2 = self->qos_tx.link_disc_time.value * 1000 / 
                self->qos_rx.max_turn_time.value;
        IRDA_DEBUG(4, "Setting N2 = %d\n", self->N2);
-
-       /* 
-        *  Initialize timeout values, some of the rules are listed on 
-        *  page 92 in IrLAP.
-        */
-       self->poll_timeout = self->qos_tx.max_turn_time.value * HZ / 1000;
-       self->wd_timeout = self->poll_timeout * 2;
-
-       /* 
-        * Be careful to keep our promises to the peer device about how long
-        * time it can keep the pf bit. So here we must use the rx_qos value
-        */
-       self->final_timeout = self->qos_rx.max_turn_time.value * HZ / 1000;
 
 #ifdef CONFIG_IRDA_COMPRESSION
        if (self->qos_tx.compression.value) {
diff -u -p linux/net/irda/irlap_event.d8.c linux/net/irda/irlap_event.c
--- linux/net/irda/irlap_event.d8.c     Fri May 11 13:56:27 2001
+++ linux/net/irda/irlap_event.c        Fri May 11 14:51:46 2001
@@ -1927,24 +1927,23 @@ static int irlap_state_nrm_s(struct irla
                 *  Wait until retry_count * n matches negotiated threshold/
                 *  disconnect time (note 2 in IrLAP p. 82)
                 *
-                * Note : self->wd_timeout = (self->poll_timeout * 2),
-                *   and self->final_timeout == self->poll_timeout,
-                *   which explain why we use (self->retry_count * 2) here !!!
+                * Note : self->wd_timeout = (self->final_timeout * 2),
+                *   which explain why we use (self->N2 / 2) here !!!
                 * Jean II
                 */
                IRDA_DEBUG(1, __FUNCTION__ "(), retry_count = %d\n", 
                           self->retry_count);
 
-               if (((self->retry_count * 2) < self->N2)  && 
-                   ((self->retry_count * 2) != self->N1)) {
+               if ((self->retry_count <  (self->N2 / 2))  && 
+                   (self->retry_count != (self->N1 / 2))) {
                        
                        irlap_start_wd_timer(self, self->wd_timeout);
                        self->retry_count++;
-               } else if ((self->retry_count * 2) == self->N1) {
+               } else if (self->retry_count == (self->N1 / 2)) {
                        irlap_status_indication(self, STATUS_NO_ACTIVITY);
                        irlap_start_wd_timer(self, self->wd_timeout);
                        self->retry_count++;
-               } else if ((self->retry_count * 2) >= self->N2) {
+               } else if (self->retry_count >= (self->N2 / 2)) {
                        irlap_apply_default_connection_parameters(self);
                        
                        /* Always switch state before calling upper layers */

[Linux-IrDA]Re: [patch] IrLAP disconnect timeout fix

Reply via email to