Hi there
 Kieran asked me to look further into the topic
"Deadlocked tcp_retransmit due to exceeded pcb->cwnd" (see
http://lists.gnu.org/archive/html/lwip-users/2008-07/msg00098.html).
[1]
 With some "segment loss emulation" code I was able to reproduce the
deadlock frequently.
 My summary first:
 It is not a problem of high ACK loss. The congestion window test in
tcp_output() fails because the unacked queue gets misordered in the
situation I describe below. In my opinion this is a severe bug, and I
have no idea how to solve it properly (I wrote a workaround, but
that is not a clean solution). I could imagine that some of the
hard-to-explain troubles of other lwIP users are caused by this bug too.
 The "segment loss emulation"
 Gulping (dropping) only one segment is not sufficient to reproduce the
problem frequently. Therefore I decided to also gulp the first
retransmission of every 10th segment. And here we are:
 /* begin code snippet */
 static void
 tcp_output_segment(struct tcp_seg *seg, struct tcp_pcb *pcb)
 {
     //... Some statements ...
     packets_sent++;
     if ((packets_sent % 10) == 0)
     {
         //we enter here on every 10th segment and gulp it
         //(we omit the call to ip_output());
         //we keep the sequence number of the gulped segment
         kept_seqno = seg->tcphdr->seqno;
     }
     else if (kept_seqno == seg->tcphdr->seqno)
     {
         //if the gulped segment gets retransmitted the first
         //time we gulp it once again.
         kept_seqno = 0;
     }
     else
     {
         //in every other case we do normal output
         ip_output(seg->p, &(pcb->local_ip), &(pcb->remote_ip),
                   pcb->ttl, pcb->tos, IP_PROTO_TCP);
     }
     // ... Some statements ...
 }
 /* end of code snippet */
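 The drop rule itself can be checked off-target before it goes into the
stack. The following is only a minimal stand-alone sketch of the rule above,
not lwIP code: emu_should_drop() and emu_demo() are hypothetical names, and
the sequence numbers in emu_demo() are made up. As in the snippet, 0 is used
as the "nothing remembered" value for kept_seqno.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone model of the drop rule; not lwIP code. */
static uint32_t packets_sent = 0;
static uint32_t kept_seqno = 0;    /* 0 means "nothing remembered" */

/* Returns 1 if this transmission would be gulped, 0 if it would go out. */
static int emu_should_drop(uint32_t seqno)
{
    packets_sent++;
    if ((packets_sent % 10) == 0) {
        kept_seqno = seqno;        /* every 10th transmission is gulped */
        return 1;
    }
    if (kept_seqno == seqno) {
        kept_seqno = 0;            /* its first retransmission is gulped too */
        return 1;
    }
    return 0;                      /* everything else goes out normally */
}

/* Usage example with made-up sequence numbers. */
void emu_demo(void)
{
    uint32_t i;
    packets_sent = 0;
    kept_seqno = 0;
    for (i = 1; i <= 9; i++) {
        assert(emu_should_drop(100 * i) == 0);  /* transmissions 1..9 pass */
    }
    assert(emu_should_drop(1000) == 1);  /* 10th transmission: gulped */
    assert(emu_should_drop(1000) == 1);  /* first retransmission: gulped */
    assert(emu_should_drop(1000) == 0);  /* second retransmission passes */
}
```

 So a gulped segment only reaches the peer on its second retransmission,
which is what makes the queue situation described below easy to hit.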
 And this is what's happening on the "ether". In my queue
representation I use the sequence numbers of the segments (not tcp_seg
pointers). The sequence numbers are taken from the last traces I made
on our GPRS system.
 1. Segment 8720:10085 was the last segment acknowledged by our
GPRS remote peer.
 2. Segments 10085 to 14183 get enqueued by the local application. The
unsent queue is as follows:
    unsent->10085->11453->12818->14183
 3. Segment 10085 should be the next in-sequence segment to be sent;
however, the gulp mechanism of our local peer emulates a segment loss.
The queues are as follows:
    unsent->11453->12818->14183
    unacked->10085
 4. Due to the available congestion window (cwnd) segment 11453 is
sent (not gulped).
    unsent->12818->14183
    unacked->10085->11453
 5. Due to the available congestion window (cwnd) segment 12818 is
sent (not gulped).
    unsent->14183
    unacked->10085->11453->12818
 6. Due to the available congestion window (cwnd) segment 14183 is
sent (not gulped).
    unsent->empty
    unacked->10085->11453->12818->14183
 7. Due to the high round trip time in the GPRS network we get the first
dupack for segment 10085 from our remote peer.
 8. We get the second dupack for 10085.
 9. We get the third dupack for 10085. According to RFC 2581 we shall
start a fast retransmission now.
 10. For the fast retransmission, tcp_process() calls tcp_receive(), which
calls tcp_rexmit(), which calls tcp_output().
 11. Because tcp_output() was invoked from within tcp_input(), it
bails out on
     if (tcp_input_pcb == pcb)
     ==> !!! This violates RFC 2581 IMHO !!!
 12. But tcp_rexmit() has already modified our queues by moving the
first unacked segment to the unsent queue.
     unsent->10085
     unacked->11453->12818->14183
 13. The next few output attempts bail out in tcp_output() due to the
Nagle algorithm (tcp_do_output_nagle()). Thus nothing more happens until a
retransmission timeout occurs.
 14. tcp_slowtmr() requires a retransmission (pcb->rtime >=
pcb->rto). This shrinks the congestion window down to the maximum
segment size (1390 in my case).
     BTW: The retransmission is triggered by segment 14183 and not by
10085 in this case, which is an aftereffect of the underlying bug IMHO.
 15. tcp_slowtmr() calls tcp_rexmit_rto(). The rto function moves all
unacked segments to the head of the unsent queue. This is the final step
causing the deadlock in tcp_output(), because the smallest sequence
number is now at the end of the queue.
     unsent->11453->12818->14183->10085
 16. tcp_output() is finally called. Instead of retransmitting 10085
it tries to retransmit 11453, but fails on
     (seg->tcphdr->seqno - pcb->lastack + seg->len > wnd) because
     seg->tcphdr->seqno = 11453
     pcb->lastack = 10085
     seg->len = 12818-11453 = 1365
     wnd = 1390
     11453-10085+1365 = 2733, which is greater than wnd (1390), and
therefore the segment is not sent.
 17. From now on we have a deadlock, because the queue stays misordered
and tcp_output() always fails on this test.
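 Steps 12 to 16 can be replayed off-target with a toy linked list. This is
only a sketch under assumptions, not lwIP code: all toy_* names are
hypothetical, sequence numbers are kept in host byte order, and the length
of segment 14183 (which is not in my trace) is assumed to be 1365.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT lwIP code: only the fields needed for the trace. */
struct toy_seg {
    struct toy_seg *next;
    uint32_t seqno;               /* host byte order in this model */
    uint16_t len;
};

struct toy_pcb {
    struct toy_seg *unsent;
    struct toy_seg *unacked;
    uint32_t lastack;
};

/* State after step 6: everything sent once, nothing acknowledged yet. */
static void toy_setup_trace(struct toy_pcb *pcb, struct toy_seg s[4])
{
    static const uint32_t seqno[4] = { 10085, 11453, 12818, 14183 };
    /* The last length is an assumption; the trace does not contain it. */
    static const uint16_t len[4] = { 1368, 1365, 1365, 1365 };
    int i;
    for (i = 0; i < 4; i++) {
        s[i].seqno = seqno[i];
        s[i].len = len[i];
        s[i].next = (i < 3) ? &s[i + 1] : NULL;
    }
    pcb->unsent = NULL;
    pcb->unacked = &s[0];
    pcb->lastack = 10085;
}

/* Models step 12: the first unacked segment goes to the head of unsent. */
static void toy_rexmit(struct toy_pcb *pcb)
{
    struct toy_seg *seg = pcb->unacked;
    if (seg == NULL) {
        return;
    }
    pcb->unacked = seg->next;
    seg->next = pcb->unsent;
    pcb->unsent = seg;
}

/* Models step 15: all unacked segments are put in front of unsent. */
static void toy_rexmit_rto(struct toy_pcb *pcb)
{
    struct toy_seg *seg = pcb->unacked;
    if (seg == NULL) {
        return;
    }
    while (seg->next != NULL) {
        seg = seg->next;
    }
    seg->next = pcb->unsent;
    pcb->unsent = pcb->unacked;
    pcb->unacked = NULL;
}

/* Window test of step 16: 1 if the head of unsent fits into wnd. */
static int toy_head_fits(const struct toy_pcb *pcb, uint32_t wnd)
{
    const struct toy_seg *seg = pcb->unsent;
    assert(seg != NULL);
    return seg->seqno - pcb->lastack + seg->len <= wnd;
}

/* Replays steps 12 to 16 and checks the resulting state. */
void toy_replay_trace(void)
{
    struct toy_seg s[4];
    struct toy_pcb pcb;
    toy_setup_trace(&pcb, s);
    toy_rexmit(&pcb);                      /* step 12 */
    assert(pcb.unsent == &s[0] && pcb.unacked == &s[1]);
    toy_rexmit_rto(&pcb);                  /* step 15 */
    assert(pcb.unsent == &s[1] && s[3].next == &s[0]);
    assert(!toy_head_fits(&pcb, 1390));    /* step 16: 2733 > 1390 */
}
```

 With 10085 at the head instead, the same test would pass (0 + 1368 <=
1390), so it is the order of the queue and not the window size that blocks
the output.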
 I needed a quick fix for our project and therefore I reorder the
queue in tcp_output() before the while loop. However, this is just a
quick fix that fights the symptoms. Therefore I ask for other
suggestions or perhaps a patch.
 Remark: This situation may be hard to reproduce. Without the
"segment loss emulation" the deadlock only occurred with very particular
memory (pbuf etc.) configurations and when using the GPRS network with
its high round trip delays and relays.
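 /* begin of quick fix */
   /* in tcp_output(), directly before the existing while loop: */
   tcp_reorder_segments(&seg);
   while ((seg != NULL) && (seg->tcphdr->seqno - pcb->lastack + seg->len > wnd))
   {
      //.... Some statements ...//
   }

 /* Single pass over the queue: every segment whose seqno is below the
    seqno of the current head is unlinked and becomes the new head. */
 void tcp_reorder_segments(struct tcp_seg **seg_ptr)
 {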
     struct tcp_seg* left_seg;
     struct tcp_seg* right_seg;
     struct tcp_seg* head_seg;
     if (*seg_ptr == NULL)
     {
         return;
     }
     if ((*seg_ptr)->next == NULL)
     {
         return;
     }
     left_seg = *seg_ptr;
     head_seg = *seg_ptr;
     right_seg = (*seg_ptr)->next;
     while (right_seg != NULL)
     {
         if (right_seg->tcphdr->seqno < head_seg->tcphdr->seqno)
         {
             left_seg->next = right_seg->next;
             right_seg->next = head_seg;
             head_seg = right_seg;
             if (left_seg->next == NULL)
             {
                 break;
             }
         }
         else
         {
             left_seg = right_seg;
         }
         right_seg = left_seg->next;
     }
     *seg_ptr = head_seg;
 }
 /* end of quick fix */
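 One weakness of the quick fix is that it compares sequence numbers with a
plain "<", which gives the wrong order once the 32-bit sequence space wraps
around, and it is a single pass rather than a full sort. A possible variant
is sketched below as a stand-alone toy, not as a patch: ord_seg,
ord_seq_lt(), ord_sort_segments() and ord_demo() are hypothetical names, and
seqno is assumed to be in host byte order (inside lwIP the header field may
need ntohl() first, and as far as I know tcp.h already offers TCP_SEQ_LT()
for this kind of comparison).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy segment, NOT struct tcp_seg: seqno is in host byte order here. */
struct ord_seg {
    struct ord_seg *next;
    uint32_t seqno;
};

/* 1 if a lies before b in sequence space, tolerating 32-bit wraparound
   (relies on the usual two's complement conversion to int32_t). */
static int ord_seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Insertion sort by sequence number; returns the new head of the list. */
static struct ord_seg *ord_sort_segments(struct ord_seg *head)
{
    struct ord_seg *sorted = NULL;
    while (head != NULL) {
        struct ord_seg *seg = head;
        struct ord_seg **pos = &sorted;
        head = head->next;
        /* skip all segments that are not after seg */
        while (*pos != NULL && !ord_seq_lt(seg->seqno, (*pos)->seqno)) {
            pos = &(*pos)->next;
        }
        seg->next = *pos;
        *pos = seg;
    }
    return sorted;
}

/* Usage example: the misordered unsent queue from step 15. */
void ord_demo(void)
{
    struct ord_seg s[4] = {
        { &s[1], 11453 }, { &s[2], 12818 }, { &s[3], 14183 }, { NULL, 10085 }
    };
    struct ord_seg *head = ord_sort_segments(&s[0]);
    assert(head == &s[3] && head->next == &s[0]);  /* 10085 is first again */
}
```

 Like the quick fix, this only fights the symptoms; keeping the queues
ordered where tcp_rexmit() and tcp_rexmit_rto() move the segments would be
the cleaner place to solve it.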
 Kind regards
 Hans-Joerg Wagner B.Sc.EE / PGDip. SE
  

Links:
------
[1] http://lists.gnu.org/archive/html/lwip-users/2008-07/msg00098.html
