Hello,

1. New findings:
  if the kernel runs on a single CPU machine, everything is ok. I transferred 
the 2MB testfile three times and it has been always received ok.
  => I suppose it's an SMP issue.  

2. The affected system is debian wheezy with standard kernel 
   Linux db0fhn 3.2.0-4-amd64 #1 SMP Debian 3.2.73-2+deb7u2 x86_64 GNU/Linux

vy 73,
        - Thomas  dl9sau


On Fri, Feb 12, 2016 at 07:12:12PM +0100, Thomas Osterried wrote:
> Hello,
> 
> this is a bug report for the kernel ax25 outqueue:
> 
> - some packets with different content are transmitted with the same AX.25 
> sequence-number.
> - some data is sent twice; we also observe out-of-order data
> 
> I generated a file with testpatterns, which consists of a line number 
> (aligned to 256 bytes) in the form "007811 deadbeefde...\n". The file size is 
> 2MB.
> 
> I downloaded the file via an AX.25 connection from db0fhn's linux to db0fhn's 
> linux via db0fhn's xnet ( call -r -s dl9sau bpq1 db0fhn-9 db0fhn ).
> 
> I traced the packets with listen -a and stored the trace (for being able to 
> see what happened).
> 
> The files differ (/tmp/testfile.raw is the original file,  
> /var/ax25/testfile.raw is the file downloaded with call):
> diff /tmp/testfile.raw /var/ax25/testfile.raw
> < 000699 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
> dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
> fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> ...
> 
> 
> # grep 000699 /tmp/testfile.raw /var/ax25/testfile.raw
> /tmp/testfile.raw:000699 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
> dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
> fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 
> => Line number 000699 was not received.
> 
> 
> 
> listen -a shows the packets from linux to the xnet program:
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256
> 0000  000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
> 0000  000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
> 0000  000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256
> 0000  000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> ..
> 
> You see the different content (000698 und 00069) in the SAME AX.25 
> sequenze-Number (I51)?
> 
> 
> This came back from the xnet digi:
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I46^ pid=F0(Text) len 256
> 0000  000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I47^ pid=F0(Text) len 256
> 0000  000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I40^ pid=F0(Text) len 256
> 0000  000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I41^ pid=F0(Text) len 256
> 0000  000702 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 
> => Packet 000699 is missing.
> 
> xnet did not complain about the packets with the different data and same 
> sequence number (and silently dropped the second packet).
> 
> 
> A verification if data corruption has happened (loss of characters):
> 
> 
> root@db0fhn:/home/thomas# grep ^0 /var/ax25/testfile.raw  | awk '{print $2}' 
> |sort -u
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> root@db0fhn:/home/thomas# grep ^0 /tmp/testfile.raw  | awk '{print $2}' |sort 
> -u
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> root@db0fhn:/home/thomas# 
> 
> => We do not have shortened lines or wrong characters.
> 
> 
> 
> 
> The losses of our user content lines occur blockwise:
> 
> Looking for missing "lines" in the download (because it had the same sequence 
> number than its predecessor):
> 
> $ for i in $(grep ^0 /tmp/testfile.raw |awk '{print $1}'); do grep -q "^$i " 
> /var/ax25/testfile.raw || echo $i ; done
> 
> [When an error occurs, it mostly occurs more times again, and then becomes 
> healthy again. I make it more readable by writing empty lines after a 
> "block"]:
> 
> 000699
> 000701
> 000703
> 000705
> 
> 001230
> 001232
> 001234
> 001236
> 001238
> 001240
> 001242
> 
> 001783
> 001785
> 
> 002160
> 002162
> 002164
> 002166
> 002168
> 002170
> 002172
> 
> 002952
> 002954
> 002956
> 002958
> 002960
> 002962
> 002964
> 
> 003336
> 003338
> 003340
> 003342
> 003344
> 003346
> 
> 003890
> 003892
> 003894
> 003896
> 
> 004815
> 004817
> 004819
> 004821
> 004823
> 004825
> 004827
> 
> 005901
> 005903
> 005905
> 005907
> 005909
> 005911
> 
> 006091
> 006093
> 006095
> 006097
> 006099
> 006101
> 006103
> 
> 
> The testfile ends with line number 007812.
> 
> 
> One thing is very obvious: after one affected packet, the next one is ok 
> again. Most of the time, this repeats, up to 7 times.
> The largest blocks contain 7 lost lines, which means, that out of 14 packets 
> every second packet was sent with the same sequence-number.
> 
> 
> 
> We also had out-of-order-receiption. line numbers, where the previous packet 
> had a higher line number than the current:
> 
> 
> $ LAST=0; for i in $(grep ^0 /var/ax25/testfile.raw |awk '{print $1}'); do if 
> [ $LAST -gt $i ]; then echo $LAST $i ;fi ; LAST=$i  ; done|less
> 000891 000862
> 001789 001787
> 001799 001787
> 002179 002173
> 002971 002965
> 003909 003897
> $ 
> 
> In the example of 001789 we look at the trace file of listen -a:
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256 
> 0000  001786 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> 
> ^ 001787 is missing
> v
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256 
> 0000  001788 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52+ pid=F0(Text) len 256 
> 0000  001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256 
> 0000  001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> 
> ^ here is 001787, which comes after 1789.
> v [be aware of the next packet for line 001790 has the same sequence-number 
> as the previous-previous-packet]
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256 
> 0000  001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256 
> 0000  001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 
> 
> Unfortunately, it's not enough. Line 001787 was received 2 times: shown here 
> with context +/- one line:
> 
> $ grep -B1 -A1 001787 /var/ax25/testfile.raw 
> 001789 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001787 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001791 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> --
> 001799 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001787 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001790 
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 
> Furthermore, the second occurence is completely out of order ( 001799 , then 
> 001787 a second time, then 001790 ).
> 
> 
> 
> ===========
> 
> Consistency verify of my test environment: axspawn works ok:
> 
> axspwan reads from pty of his child (login->shell->bget) and writes the 
> output to the kernel
> 
> strace -p 8312 -e write -s 2048 2> /tmp/axspawn-write-trace
> 
> Original file:
> $ wc /tmp/testfile.raw 
>    7813   15626 2000128 /tmp/testfile.raw
> 
> $ grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut 
> -d'"' -f2-|awk '{print $1}' |wc
>    7813    7813   54691
> 
> => every line of the file testfile.raw was written to the kernel.
> 
> Every line is a direct successor of the previous line:
> $ p=-1; for i in $(grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 
> EAGAIN (" |cut -d'"' -f2-|awk '{print $1}'); do if [ $(echo $p +1 -$i |bc) != 
> 0 ] ; then echo $i $p ;fi; p=$i ; done
> $ 
> 
> 76 times an -1 EAGAIN occured and the line was written again (because the 
> kernel buffers were full) until it succeeded:
> $ grep 'write(1, "0' /tmp/axspawn-write-trace |grep  "= -1 EAGAIN ("|wc
>      76     760   24472
> $ 
> 
> 
> 
> The testfile was generated with generate-testfile.c :
> 
> #include <stdio.h>
> 
> unsigned long datasize = 2*1000*1000L;
> 
> char * payload="deadbeef";
> 
> int main()
> {
>   unsigned long num = 0;
> 
>   for (num = 0; (num *256) < datasize; num++) {
>     int i;
>     printf("%6.6ld ", num);
>     for (i = 0; i < 31; i++)
>       printf(payload);
>     printf("\n");
>   }
> }
> 
> 
> 
> vy 73,
>       - Thomas  dl9sau
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hams" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-hams" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to