Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-16 Thread John Stultz
On Wed, May 11, 2016 at 3:00 PM, Dean Jenkins  wrote:
>
> Your observations are consistent with missing URBs from the USB host
> controller.
>
> Here is a summary of what I think is happening in your case:
>
> Good case:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> URB #3: 988 octets of 1514 Ethernet frame (C)
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> Therefore, Ethernet frame (C) is spanning URBs #2 and #3.
>
> Bad case, URB #3 is lost:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> Remaining is 988
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> But when URB #4 is analysed the 32-bit Header word is not found after 988
> octets in the URB buffer so "sync lost".
> The end of Ethernet frame (C) is missing so drop the Ethernet frame.
> Now look at the start of the URB #4 buffer and find a 32-bit header word so
> Ethernet frame (D) can be consumed.
>
> So I think the commit is acting as intended and you are suffering from lost
> URBs.

No. I went digging on this for a bit longer, and it looks like its
just that you're calculating the offset wrong in your check.

I was wondering why without your patch we wouldn't see "Bad Header
Length" messages, since if the remaining was 988 and the skb->len was
2048 as seen in my logs, without your patch we should copy the 988
bytes out clear remaining and then continue processing the rest of the
skb, which calculates the header and checks the size. If we really
lost the URB, we should throw an error at that point, since really
we'd be midway through the following frame.  But we just don't see
that with your patch removed.

Looking more closely, in the main loop, we do:
(where offset is zero, or set to "offset += (copy_length + 1) &
0xfffe" in the previous loop)
rx->header = get_unaligned_le32(skb->data +
offset);
offset += sizeof(u32);

But your check calculates:
offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
rx->header = get_unaligned_le32(skb->data + offset);

Adding some debug logic to check those offset calculation used to find
rx->header, the one in your code is always too large by sizeof(u32).

So removing the extra addition in your offset calculation seems to
solve this for me.

I'll send out a patch here shortly.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-16 Thread John Stultz
On Wed, May 11, 2016 at 3:00 PM, Dean Jenkins  wrote:
>
> Your observations are consistent with missing URBs from the USB host
> controller.
>
> Here is a summary of what I think is happening in your case:
>
> Good case:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> URB #3: 988 octets of 1514 Ethernet frame (C)
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> Therefore, Ethernet frame (C) is spanning URBs #2 and #3.
>
> Bad case, URB #3 is lost:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> Remaining is 988
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> But when URB #4 is analysed the 32-bit Header word is not found after 988
> octets in the URB buffer so "sync lost".
> The end of Ethernet frame (C) is missing so drop the Ethernet frame.
> Now look at the start of the URB #4 buffer and find a 32-bit header word so
> Ethernet frame (D) can be consumed.
>
> So I think the commit is acting as intended and you are suffering from lost
> URBs.

No. I went digging on this for a bit longer, and it looks like its
just that you're calculating the offset wrong in your check.

I was wondering why without your patch we wouldn't see "Bad Header
Length" messages, since if the remaining was 988 and the skb->len was
2048 as seen in my logs, without your patch we should copy the 988
bytes out clear remaining and then continue processing the rest of the
skb, which calculates the header and checks the size. If we really
lost the URB, we should throw an error at that point, since really
we'd be midway through the following frame.  But we just don't see
that with your patch removed.

Looking more closely, in the main loop, we do:
(where offset is zero, or set to "offset += (copy_length + 1) &
0xfffe" in the previous loop)
rx->header = get_unaligned_le32(skb->data +
offset);
offset += sizeof(u32);

But your check calculates:
offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
rx->header = get_unaligned_le32(skb->data + offset);

Adding some debug logic to check those offset calculation used to find
rx->header, the one in your code is always too large by sizeof(u32).

So removing the extra addition in your offset calculation seems to
solve this for me.

I'll send out a patch here shortly.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-11 Thread Dean Jenkins

Hi John,

I have purchased a "uGreen" USB Ethernet Adaptor which was reported as 
showing the issue:


lsusb shows:
ID 0b95:772b ASIX Electronics Corp. AX88772B

dmesg shows:
[119591.413298] usb 2-1: new high-speed USB device number 12 using ci_hdrc
[119591.576970] usb 2-1: New USB device found, idVendor=0b95, idProduct=772b
[119591.576994] usb 2-1: New USB device strings: Mfr=1, Product=2, 
SerialNumber=3

[119591.577010] usb 2-1: Product: AX88772C
[119591.577025] usb 2-1: Manufacturer: ASIX Elec. Corp.

Strangely the product string says "AX88772C" and lsusb shows "AX88772B"

I used our ARM (32-bit 2 core) board running our highly customised 3.14 
kernel and ran a ping test that slowly increments the ping payload size 
so forcing the Ethernet frames to slowly extend in length and eventually 
forcing IPv4 fragmentation to occur due to the MTU limit of 1500. In my 
test the ICMP ping payload lengths ranged from 1 to 5000.


During the test run I saw (only 3 errors):
[27455.113010] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 23
[27455.113037] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x77767574, offset 4
[27456.113269] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 27
[27456.113329] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x77767574, offset 4
[27457.113271] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 30
[27457.113328] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x77767574, offset 4


This meets my expectation of "sync lost" followed immediately by "Bad 
Header Length". A close look at the timestamps shows gaps of around 20us 
to 50us which suggests the code is processing the same URB eg. "sync 
lost" and "Bad Header Length" are written from the same instance of 
asix_rx_fixup_internal().


My example suggests that the previous URB went missing so data was lost 
causing a discontinuity in the data stream. This was the intended 
purpose of the commit to prevent bad Ethernet frames being sent up the 
IP stack when an URB went missing. A bad Ethernet frame would otherwise 
be created by having the start of an Ethernet frame appended with data 
from the current URB causing a corrupted Ethernet frame to be generated 
and sent up the IP stack.


Also the failure seems to be independent of the ping payload length but 
longer test periods of specific payload lengths would be needed to allow 
the 32 bit header word to move around relative to the start of the URB 
buffer.


In my example, the "Bad Header Length 0x77767574" is reading the ping 
payload data of 0x74, 0x75, 0x76, 077 which is located at the start of 
the URB buffer. The remaining values are low at 23 to 30 which suggests 
the end of the Ethernet frame was in the missing URB. The ICMP ping data 
of 0x74, 0x75, 0x76, 077 is from the next Ethernet frame meaning the end 
of the current Ethernet frame is missing and the next frame has a 
missing start of Ethernet frame.


Note that due to IPv4 fragmentation "consecutive" Ethernet frames will 
contain payloads of 1500 (MTU size) octets typically followed by a short 
Ethernet frame. The payloads are fragmented IP packets.




So I've been trying to add some print messages here to better
understand whats going on.

Again, I'm a bit new to this code, so forgive me for my lack of
understanding things. Since the remaining value seems to be key, I
tried to look around and figure out where it was being set. It seems
like its only set in this function, is that right?  So this made me
guess something might be happening in a previous iteration that was
causing this to trigger.

I added some debug prints to every time we set the remaining value, or
modify it, as well as to print the value if we enter the fixup
function with a non-zero remaining value.

When we set the remaining value, usually its to 1514, when the skblen is 1518.

1514 is Ethernet header length + payload at MTU size of 1500.
skblen of 1518 is 32-bit header word + maximum Ethernet frame length 
(for your Network).


However, right before we catch the problem, I see this:
I am guessing where your debug is located in the code so I may have 
ms-interpreted your information.



[   84.844337] JDB set remaining to 1514 (skblen: 1518)
This suggests 1 maximum length Ethernet frame of 1514 octets in the URB 
buffer.

[   84.844379] JDB set remaining to 1514 (skblen: 1518)
[   84.844429] JDB set remaining to 1514 (skblen: 1518)
[   84.844458] JDB set remaining to 1514 (skblen: 1518)
[   84.844483] JDB set remaining to 1514 (skblen: 1518)
[   84.844507] JDB set remaining to 1514 (skblen: 1518)
[   84.844559] JDB set remaining to 1514 (skblen: 2048)
This URB probably has 2 Ethernet frames; 1 complete frame plus the start 
of the next Ethernet frame.


I think 2048 could be the maximum URB transfer length for the USB bulk 
transfer. 2048 seems to be a low value so should be investigated.



[   84.844583] JDB set 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-11 Thread Dean Jenkins

Hi John,

I have purchased a "uGreen" USB Ethernet Adaptor which was reported as 
showing the issue:


lsusb shows:
ID 0b95:772b ASIX Electronics Corp. AX88772B

dmesg shows:
[119591.413298] usb 2-1: new high-speed USB device number 12 using ci_hdrc
[119591.576970] usb 2-1: New USB device found, idVendor=0b95, idProduct=772b
[119591.576994] usb 2-1: New USB device strings: Mfr=1, Product=2, 
SerialNumber=3

[119591.577010] usb 2-1: Product: AX88772C
[119591.577025] usb 2-1: Manufacturer: ASIX Elec. Corp.

Strangely the product string says "AX88772C" and lsusb shows "AX88772B"

I used our ARM (32-bit 2 core) board running our highly customised 3.14 
kernel and ran a ping test that slowly increments the ping payload size 
so forcing the Ethernet frames to slowly extend in length and eventually 
forcing IPv4 fragmentation to occur due to the MTU limit of 1500. In my 
test the ICMP ping payload lengths ranged from 1 to 5000.


During the test run I saw (only 3 errors):
[27455.113010] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 23
[27455.113037] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x77767574, offset 4
[27456.113269] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 27
[27456.113329] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x77767574, offset 4
[27457.113271] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 30
[27457.113328] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x77767574, offset 4


This meets my expectation of "sync lost" followed immediately by "Bad 
Header Length". A close look at the timestamps shows gaps of around 20us 
to 50us which suggests the code is processing the same URB eg. "sync 
lost" and "Bad Header Length" are written from the same instance of 
asix_rx_fixup_internal().


My example suggests that the previous URB went missing so data was lost 
causing a discontinuity in the data stream. This was the intended 
purpose of the commit to prevent bad Ethernet frames being sent up the 
IP stack when an URB went missing. A bad Ethernet frame would otherwise 
be created by having the start of an Ethernet frame appended with data 
from the current URB causing a corrupted Ethernet frame to be generated 
and sent up the IP stack.


Also the failure seems to be independent of the ping payload length but 
longer test periods of specific payload lengths would be needed to allow 
the 32 bit header word to move around relative to the start of the URB 
buffer.


In my example, the "Bad Header Length 0x77767574" is reading the ping 
payload data of 0x74, 0x75, 0x76, 077 which is located at the start of 
the URB buffer. The remaining values are low at 23 to 30 which suggests 
the end of the Ethernet frame was in the missing URB. The ICMP ping data 
of 0x74, 0x75, 0x76, 077 is from the next Ethernet frame meaning the end 
of the current Ethernet frame is missing and the next frame has a 
missing start of Ethernet frame.


Note that due to IPv4 fragmentation "consecutive" Ethernet frames will 
contain payloads of 1500 (MTU size) octets typically followed by a short 
Ethernet frame. The payloads are fragmented IP packets.




So I've been trying to add some print messages here to better
understand whats going on.

Again, I'm a bit new to this code, so forgive me for my lack of
understanding things. Since the remaining value seems to be key, I
tried to look around and figure out where it was being set. It seems
like its only set in this function, is that right?  So this made me
guess something might be happening in a previous iteration that was
causing this to trigger.

I added some debug prints to every time we set the remaining value, or
modify it, as well as to print the value if we enter the fixup
function with a non-zero remaining value.

When we set the remaining value, usually its to 1514, when the skblen is 1518.

1514 is Ethernet header length + payload at MTU size of 1500.
skblen of 1518 is 32-bit header word + maximum Ethernet frame length 
(for your Network).


However, right before we catch the problem, I see this:
I am guessing where your debug is located in the code so I may have 
ms-interpreted your information.



[   84.844337] JDB set remaining to 1514 (skblen: 1518)
This suggests 1 maximum length Ethernet frame of 1514 octets in the URB 
buffer.

[   84.844379] JDB set remaining to 1514 (skblen: 1518)
[   84.844429] JDB set remaining to 1514 (skblen: 1518)
[   84.844458] JDB set remaining to 1514 (skblen: 1518)
[   84.844483] JDB set remaining to 1514 (skblen: 1518)
[   84.844507] JDB set remaining to 1514 (skblen: 1518)
[   84.844559] JDB set remaining to 1514 (skblen: 2048)
This URB probably has 2 Ethernet frames; 1 complete frame plus the start 
of the next Ethernet frame.


I think 2048 could be the maximum URB transfer length for the USB bulk 
transfer. 2048 seems to be a low value so should be investigated.



[   84.844583] JDB set 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins  wrote:
>
> [  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
> synchronisation was lost, remaining 988
>
> This error message consistently shows the remaining value to be 988, at
> least for the 3 examples provided by John. This does not suggest a random
> failure unless there are other examples of a non 988 remaining value error
> message. 988 is well within a Ethernet frame length so seems to be valid.
>
> I think a good step would be to add some debug to print the rx->remaining
> value at entry to asix_rx_fixup_internal(). This would generate a lot of
> debug but a pattern of the values might emerge.


So I've been trying to add some print messages here to better
understand whats going on.

Again, I'm a bit new to this code, so forgive me for my lack of
understanding things. Since the remaining value seems to be key, I
tried to look around and figure out where it was being set. It seems
like its only set in this function, is that right?  So this made me
guess something might be happening in a previous iteration that was
causing this to trigger.

I added some debug prints to every time we set the remaining value, or
modify it, as well as to print the value if we enter the fixup
function with a non-zero remaining value.

When we set the remaining value, usually its to 1514, when the skblen is 1518.

However, right before we catch the problem, I see this:


[   84.844337] JDB set remaining to 1514 (skblen: 1518)
[   84.844379] JDB set remaining to 1514 (skblen: 1518)
[   84.844429] JDB set remaining to 1514 (skblen: 1518)
[   84.844458] JDB set remaining to 1514 (skblen: 1518)
[   84.844483] JDB set remaining to 1514 (skblen: 1518)
[   84.844507] JDB set remaining to 1514 (skblen: 1518)
[   84.844559] JDB set remaining to 1514 (skblen: 2048)
[   84.844583] JDB set remaining to 1514 (skblen: 2048)
[   84.844595] JDB: 1514 > 2048 - 1522
[   84.844606] JDB: dropping remaining by 526
[   84.844624] asix_rx_fixup_internal()  remaining: 988,  skb->len: 2048
[   84.844672] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[   84.844945] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xdd5f8f9b, offset 4
[   84.845217] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3ce1ad3c, offset 4
[   84.845451] JDB set remaining to 1514 (skblen: 1518)
[   84.845485] JDB set remaining to 1514 (skblen: 1518)
[   84.845511] JDB set remaining to 1514 (skblen: 1518)
[   84.851003] JDB set remaining to 1514 (skblen: 1518)


So when the issue happens, it seems to be due to an larger then usual
skb (2048). The first time through the wile loop we set the remaining
to 1514, but offset is fairly small, so we set copy_length to 1514,
and clear remaining. The offset is bumped by a little more then the
copy length and we loop again.  Then the second time through we set
remaining to 1514, but since offset is bigger now, the if
(rx->remaining > skb->len - offset)  case is true..

This is where it feels a little strange..

We calculate the copy_length as the difference between the offset and
the skb->len (so how much is left in the skb, which is 526), then
decrement remaining by that amount.  Not really sure what remaining
(now 988) is supposed to represent here. We copy the 526 bytes, and
then exit the loop.

Now the next time we are called, we enter and we have a remaining
value still of 988, which triggers the header synchronization error
path.

Now, I'm not sure if the remainder handling logic is sane, or if the
skb->len being 2048 is problematic, or what.  The skb->lens can vary
in sizes, usually 1518 during high throughput, but I've seen 1588,
1640, and other larger numbers that don't trigger the same problem.

ie:
[  106.946473] JDB set remaining to 1514 (skblen: 1518)
[  106.946525] JDB set remaining to 1514 (skblen: 1640)
[  106.946546] JDB set remaining to 118 (skblen: 1640)
[  106.946586] JDB set remaining to 1514 (skblen: 1518)


So yea.. maybe that will help clue things in a bit? I'm still a bit lost. :)

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins  wrote:
>
> [  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
> synchronisation was lost, remaining 988
>
> This error message consistently shows the remaining value to be 988, at
> least for the 3 examples provided by John. This does not suggest a random
> failure unless there are other examples of a non 988 remaining value error
> message. 988 is well within a Ethernet frame length so seems to be valid.
>
> I think a good step would be to add some debug to print the rx->remaining
> value at entry to asix_rx_fixup_internal(). This would generate a lot of
> debug but a pattern of the values might emerge.


So I've been trying to add some print messages here to better
understand whats going on.

Again, I'm a bit new to this code, so forgive me for my lack of
understanding things. Since the remaining value seems to be key, I
tried to look around and figure out where it was being set. It seems
like its only set in this function, is that right?  So this made me
guess something might be happening in a previous iteration that was
causing this to trigger.

I added some debug prints to every time we set the remaining value, or
modify it, as well as to print the value if we enter the fixup
function with a non-zero remaining value.

When we set the remaining value, usually its to 1514, when the skblen is 1518.

However, right before we catch the problem, I see this:


[   84.844337] JDB set remaining to 1514 (skblen: 1518)
[   84.844379] JDB set remaining to 1514 (skblen: 1518)
[   84.844429] JDB set remaining to 1514 (skblen: 1518)
[   84.844458] JDB set remaining to 1514 (skblen: 1518)
[   84.844483] JDB set remaining to 1514 (skblen: 1518)
[   84.844507] JDB set remaining to 1514 (skblen: 1518)
[   84.844559] JDB set remaining to 1514 (skblen: 2048)
[   84.844583] JDB set remaining to 1514 (skblen: 2048)
[   84.844595] JDB: 1514 > 2048 - 1522
[   84.844606] JDB: dropping remaining by 526
[   84.844624] asix_rx_fixup_internal()  remaining: 988,  skb->len: 2048
[   84.844672] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[   84.844945] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xdd5f8f9b, offset 4
[   84.845217] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3ce1ad3c, offset 4
[   84.845451] JDB set remaining to 1514 (skblen: 1518)
[   84.845485] JDB set remaining to 1514 (skblen: 1518)
[   84.845511] JDB set remaining to 1514 (skblen: 1518)
[   84.851003] JDB set remaining to 1514 (skblen: 1518)


So when the issue happens, it seems to be due to an larger then usual
skb (2048). The first time through the wile loop we set the remaining
to 1514, but offset is fairly small, so we set copy_length to 1514,
and clear remaining. The offset is bumped by a little more then the
copy length and we loop again.  Then the second time through we set
remaining to 1514, but since offset is bigger now, the if
(rx->remaining > skb->len - offset)  case is true..

This is where it feels a little strange..

We calculate the copy_length as the difference between the offset and
the skb->len (so how much is left in the skb, which is 526), then
decrement remaining by that amount.  Not really sure what remaining
(now 988) is supposed to represent here. We copy the 526 bytes, and
then exit the loop.

Now the next time we are called, we enter and we have a remaining
value still of 988, which triggers the header synchronization error
path.

Now, I'm not sure if the remainder handling logic is sane, or if the
skb->len being 2048 is problematic, or what.  The skb->lens can vary
in sizes, usually 1518 during high throughput, but I've seen 1588,
1640, and other larger numbers that don't trigger the same problem.

ie:
[  106.946473] JDB set remaining to 1514 (skblen: 1518)
[  106.946525] JDB set remaining to 1514 (skblen: 1640)
[  106.946546] JDB set remaining to 118 (skblen: 1640)
[  106.946586] JDB set remaining to 1514 (skblen: 1518)


So yea.. maybe that will help clue things in a bit? I'm still a bit lost. :)

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread David B. Robins

On 2016-05-04 03:58, Dean Jenkins wrote:

On 04/05/16 01:28, David B. Robins wrote:




Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the 
Ethernet

frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more 
detailed logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent 
now. I am curious to know how often the device you are using benefits 
from this block of code.


The issue is that the driver should be robust to cope with missing
URBs. Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet
adaptor in our ARM embedded system which runs in hostile environments,
it was noticed that URBs could be lost (probably due to a bug
elsewhere or low memory issue). Without this patch, a missing URB
causes bad Ethernet frames to be passed up to the IP stack because
rx->remaining spans multiple URBs.

In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is
processed and copies the 1st part of the Ethernet frame into the
netdev buffer, for the 2nd URB the remaining part of the Ethernet
frame is copied into the same netdev buffer to complete the Ethernet
frame. The netdev buffer is then sent up to the IP stack.

In the case of a missing URB, a bad Ethernet frame is created as 
follows:

The 1st URB is processed and copies the 1st part of the Ethernet frame
into the netdev buffer, the 2nd URB is lost (somehow),  the 3rd URB is
processed and blindly copies what it thinks is the remaining part of
the Ethernet frame in the same netdev buffer which corrupts the
Ethernet frame. The netdev buffer is then sent up to the IP stack. The
3rd URB and subsequent URBs are processed but synchronisation has been
lost so can misread data as a 32-bit header word. It is likely that
some good Ethernet frames get discarded whilst trying to
resynchronise.

A recovery strategy for regaining lock with the 32-bit header word is
necessary otherwise the driver will have difficulty in recovering from
a lost URB.

In the "olden days", the 32-bit header word was always at the start of
the URB buffer so previous URBs did not influence the current URB. So
no recovery strategy was needed at that time. But now we have to
remember what happened in the previous URB and a lost URB can cause a
discontinuity in the data stream because the data is not always
aligned to the start of the URB buffer.

I agree that your environment may never suffer from lost URBs so
removal of the patch would work OK.

I will try to find some time to setup a test environment.


I got the chance to add some more logging (and add back the header 
synchronization code) to a system which is receiving video data on an 
8-port POE device using AX88772Bs - it replaces the "Data header 
synchronization was lost" log and logs a few more local items ("u32" = 
rx->header):


[3105478.073908] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3105478.073918] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xf05b7c5f, offset 4
[3105478.074022] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074032] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xffaae2a5, offset 4
[3105478.074146] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518
[3105478.074280] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074291] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xe1478400, offset 4

[3105478.074399] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 916
[3105478.074409] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0x9fa84eb5, offset 4
[3105478.074657] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518


Grepping for just the SYNC_LOSS messages shows a similar pattern to that 
seen elsewhere in the thread:


[3104761.056033] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.058035] asix 1-1.4.1.3:1.0 poe2: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread David B. Robins

On 2016-05-04 03:58, Dean Jenkins wrote:

On 04/05/16 01:28, David B. Robins wrote:




Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the 
Ethernet

frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more 
detailed logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent 
now. I am curious to know how often the device you are using benefits 
from this block of code.


The issue is that the driver should be robust to cope with missing
URBs. Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet
adaptor in our ARM embedded system which runs in hostile environments,
it was noticed that URBs could be lost (probably due to a bug
elsewhere or low memory issue). Without this patch, a missing URB
causes bad Ethernet frames to be passed up to the IP stack because
rx->remaining spans multiple URBs.

In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is
processed and copies the 1st part of the Ethernet frame into the
netdev buffer, for the 2nd URB the remaining part of the Ethernet
frame is copied into the same netdev buffer to complete the Ethernet
frame. The netdev buffer is then sent up to the IP stack.

In the case of a missing URB, a bad Ethernet frame is created as 
follows:

The 1st URB is processed and copies the 1st part of the Ethernet frame
into the netdev buffer, the 2nd URB is lost (somehow),  the 3rd URB is
processed and blindly copies what it thinks is the remaining part of
the Ethernet frame in the same netdev buffer which corrupts the
Ethernet frame. The netdev buffer is then sent up to the IP stack. The
3rd URB and subsequent URBs are processed but synchronisation has been
lost so can misread data as a 32-bit header word. It is likely that
some good Ethernet frames get discarded whilst trying to
resynchronise.

A recovery strategy for regaining lock with the 32-bit header word is
necessary otherwise the driver will have difficulty in recovering from
a lost URB.

In the "olden days", the 32-bit header word was always at the start of
the URB buffer so previous URBs did not influence the current URB. So
no recovery strategy was needed at that time. But now we have to
remember what happened in the previous URB and a lost URB can cause a
discontinuity in the data stream because the data is not always
aligned to the start of the URB buffer.

I agree that your environment may never suffer from lost URBs so
removal of the patch would work OK.

I will try to find some time to setup a test environment.


I got the chance to add some more logging (and add back the header 
synchronization code) to a system which is receiving video data on an 
8-port POE device using AX88772Bs - it replaces the "Data header 
synchronization was lost" log and logs a few more local items ("u32" = 
rx->header):


[3105478.073908] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3105478.073918] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xf05b7c5f, offset 4
[3105478.074022] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074032] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xffaae2a5, offset 4
[3105478.074146] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518
[3105478.074280] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074291] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xe1478400, offset 4

[3105478.074399] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 916
[3105478.074409] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0x9fa84eb5, offset 4
[3105478.074657] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518


Grepping for just the SYNC_LOSS messages shows a similar pattern to that 
seen elsewhere in the thread:


[3104761.056033] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.058035] asix 1-1.4.1.3:1.0 poe2: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins  wrote:
> A good test would be to run "ping -c 1 -s $packet_length $ip_address" inside
> a script which has a loop with an increasing payload length $packet_length
> with a small delay between ping calls. This will show whether particular
> packet sizes trigger the failures.
>
> Then try with "ping -f -c 200 -s $packet_length $ip_address" to load up the
> USB link.

I've tried both of these on my x86_64 system.  I can send single pings
up to 65507 without triggering the issue (after which I get errors
sending on the host side as I think I cross a 64k boundary with
headers, not the asix errors).

Then when I try ping -f -c 200 -s 65507 $ip_address, I don't see any
failures. I did it for a count of 2000 as well without any issues.

I'll be adding more debug prints in soon.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins  wrote:
> A good test would be to run "ping -c 1 -s $packet_length $ip_address" inside
> a script which has a loop with an increasing payload length $packet_length
> with a small delay between ping calls. This will show whether particular
> packet sizes trigger the failures.
>
> Then try with "ping -f -c 200 -s $packet_length $ip_address" to load up the
> USB link.

I've tried both of these on my x86_64 system.  I can send single pings
up to 65507 without triggering the issue (after which I get errors
sending on the host side as I think I cross a 64k boundary with
headers, not the asix errors).

Then when I try ping -f -c 200 -s 65507 $ip_address, I don't see any
failures. I did it for a count of 2000 as well without any issues.

I'll be adding more debug prints in soon.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Fri, May 6, 2016 at 8:00 AM, Dean Jenkins  wrote:
> My conclusion is that your USB to Ethernet Adaptor is not running at high
> speed (480Mbps) mode which is causing a partial loss (corruption) of
> Ethernet frames across the USB link. A USB Protocol Analyser or software
> tool usbmon could be used to confirm this scenario.
>
> Therefore please retest with a working high-speed USB hub or remove the
> full-speed USB hub from the test environment and directly connect the USB to
> Ethernet Adaptor to the root hub of the USB port. Then repeat the tests to
> see whether anything improved.
>
> In other words, you need to eliminate the dmesg messages saying "not running
> at top speed; connect to a high speed hub".

The aarch64 system has a quirk that at the moment limits it to the
slower full-speed mode, which also exacerbates the issue (basically
taking a fairly slow 1.1.Mb/s network connection without your patch,
to an almost unusable 30Kb/s with it).

But that isn't the case on the x86_64 system, which is seeing a very
similar problem (though the performance effect isn't nearly as bad, as
the error rate in time seems relatively similar on both, and I think
my scp transmissions are cpu bound on this atom board :).

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Fri, May 6, 2016 at 8:00 AM, Dean Jenkins  wrote:
> My conclusion is that your USB to Ethernet Adaptor is not running at high
> speed (480Mbps) mode which is causing a partial loss (corruption) of
> Ethernet frames across the USB link. A USB Protocol Analyser or software
> tool usbmon could be used to confirm this scenario.
>
> Therefore please retest with a working high-speed USB hub or remove the
> full-speed USB hub from the test environment and directly connect the USB to
> Ethernet Adaptor to the root hub of the USB port. Then repeat the tests to
> see whether anything improved.
>
> In other words, you need to eliminate the dmesg messages saying "not running
> at top speed; connect to a high speed hub".

The aarch64 system has a quirk that at the moment limits it to the
slower full-speed mode, which also exacerbates the issue (basically
taking a fairly slow 1.1.Mb/s network connection without your patch,
to an almost unusable 30Kb/s with it).

But that isn't the case on the x86_64 system, which is seeing a very
similar problem (though the performance effect isn't nearly as bad, as
the error rate in time seems relatively similar on both, and I think
my scp transmissions are cpu bound on this atom board :).

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Thu, May 5, 2016 at 1:11 AM, Dean Jenkins  wrote:
> On 05/05/16 00:45, John Stultz wrote:
>>
>> Just as a sample point, I have managed to reproduce exactly this issue
>> on an x86_64 system by simply scp'ing a large file.
>
> Please tell us the x86_64 kernel version number that you used and which
> Linux Distribution it was ? This allows other people a chance to reproduce
> your observations.

Sorry for being a little slow here, had some other issues I had to chase.

On my x86_64 system, its Ubuntu 14.04.4, with a 4.6.0-rc2 kernel.


>> [  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>
> It is interesting that the reported "remaining" value is 988. Is 988 always
> shown ? I mean that do you see any other "remaining" values for the "Data
> Header synchronisation was lost" error message ?

Yep. Its always the same 988 remaining, on either architecture.


>> [  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0xef830347, offset 4
>
> The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms
> which is a large gap in terms of USB 2.0 high speed communications. This gap
> is expected to be in the 100us range for consecutive URBs. So 4ms is
> strange.
>
> The expectation is that the "Data Header synchronisation was lost" error
> message resets the 32-bit header word synchronisation to the start of the
> next URB buffer. The "Bad Header Length, offset 4" is the expected outcome
> for the next URB because it is unlikely the 32-bit header word is at the
> start of URB buffer due to Ethernet frames spanning across URBs.
>>
>> [  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x31e2b348, offset 4
>
> Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed,
> are you sure high speed mode is being used ?
>>

Yep, on my x86_64 system, it seems to be.

[3.101115] usb 1-5: new high-speed USB device number 2 using ehci-pci
[3.232309] usb 1-5: New USB device found, idVendor=0b95, idProduct=772b
[3.232327] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[3.232339] usb 1-5: Product: AX88772B
[3.232350] usb 1-5: Manufacturer: ASIX Elec. Corp.
[3.232360] usb 1-5: SerialNumber: 188298
[4.032206] asix 1-5:1.0 eth1: register 'asix' at
usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98


> Please can you supply the output of ifconfig for the USB to Ethernet
> adaptor, your example above shows eth1 as the device.
>
> Please show the output of ifconfig eth1 before and after the issue is seen.
> This will show us whether the kernel logs any network errors and how many
> bytes have been transferred.

Before:
$ ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:50:b6:18:82:98
  inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global
  inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link
  inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:372 errors:0 dropped:0 overruns:0 frame:0
  TX packets:385 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:38523 (38.5 KB)  TX bytes:48801 (48.8 KB)


After:
$ ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:50:b6:18:82:98
  inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global
  inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link
  inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:151005 errors:169 dropped:0 overruns:0 frame:0
  TX packets:61351 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:225874384 (225.8 MB)  TX bytes:4431098 (4.4 MB)




> After the issue is seen, please can you show us the output of "dmesg | grep
> asix" so that we can see status messages from the ASIX driver that the USB
> to Ethernet adaptor is using. In particular we need to check that USB high
> speed operation (480Mbps) is being used and not full speed operation
> (12Mbps).


[2.766525] usbcore: registered new interface driver asix
[4.031443] asix 1-5:1.0 eth1: register 'asix' at
usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98
[   31.578983] asix 1-5:1.0 eth1: link down
[   33.244743] asix 1-5:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xCDE1
[  171.959244] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  171.959530] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x1651c2bf, offset 4
[  171.959768] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xfcf61092, offset 4
[  171.960001] asix 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Thu, May 5, 2016 at 1:11 AM, Dean Jenkins  wrote:
> On 05/05/16 00:45, John Stultz wrote:
>>
>> Just as a sample point, I have managed to reproduce exactly this issue
>> on an x86_64 system by simply scp'ing a large file.
>
> Please tell us the x86_64 kernel version number that you used and which
> Linux Distribution it was ? This allows other people a chance to reproduce
> your observations.

Sorry for being a little slow here, had some other issues I had to chase.

On my x86_64 system, its Ubuntu 14.04.4, with a 4.6.0-rc2 kernel.


>> [  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>
> It is interesting that the reported "remaining" value is 988. Is 988 always
> shown ? I mean that do you see any other "remaining" values for the "Data
> Header synchronisation was lost" error message ?

Yep. Its always the same 988 remaining, on either architecture.


>> [  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0xef830347, offset 4
>
> The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms
> which is a large gap in terms of USB 2.0 high speed communications. This gap
> is expected to be in the 100us range for consecutive URBs. So 4ms is
> strange.
>
> The expectation is that the "Data Header synchronisation was lost" error
> message resets the 32-bit header word synchronisation to the start of the
> next URB buffer. The "Bad Header Length, offset 4" is the expected outcome
> for the next URB because it is unlikely the 32-bit header word is at the
> start of URB buffer due to Ethernet frames spanning across URBs.
>>
>> [  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x31e2b348, offset 4
>
> Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed,
> are you sure high speed mode is being used ?
>>

Yep, on my x86_64 system, it seems to be.

[3.101115] usb 1-5: new high-speed USB device number 2 using ehci-pci
[3.232309] usb 1-5: New USB device found, idVendor=0b95, idProduct=772b
[3.232327] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[3.232339] usb 1-5: Product: AX88772B
[3.232350] usb 1-5: Manufacturer: ASIX Elec. Corp.
[3.232360] usb 1-5: SerialNumber: 188298
[4.032206] asix 1-5:1.0 eth1: register 'asix' at
usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98


> Please can you supply the output of ifconfig for the USB to Ethernet
> adaptor, your example above shows eth1 as the device.
>
> Please show the output of ifconfig eth1 before and after the issue is seen.
> This will show us whether the kernel logs any network errors and how many
> bytes have been transferred.

Before:
$ ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:50:b6:18:82:98
  inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global
  inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link
  inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:372 errors:0 dropped:0 overruns:0 frame:0
  TX packets:385 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:38523 (38.5 KB)  TX bytes:48801 (48.8 KB)


After:
$ ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:50:b6:18:82:98
  inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global
  inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link
  inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:151005 errors:169 dropped:0 overruns:0 frame:0
  TX packets:61351 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:225874384 (225.8 MB)  TX bytes:4431098 (4.4 MB)




> After the issue is seen, please can you show us the output of "dmesg | grep
> asix" so that we can see status messages from the ASIX driver that the USB
> to Ethernet adaptor is using. In particular we need to check that USB high
> speed operation (480Mbps) is being used and not full speed operation
> (12Mbps).


[2.766525] usbcore: registered new interface driver asix
[4.031443] asix 1-5:1.0 eth1: register 'asix' at
usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98
[   31.578983] asix 1-5:1.0 eth1: link down
[   33.244743] asix 1-5:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xCDE1
[  171.959244] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  171.959530] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x1651c2bf, offset 4
[  171.959768] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xfcf61092, offset 4
[  171.960001] asix 1-5:1.0 eth1: asix_rx_fixup() 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread Dean Jenkins

On 06/05/16 16:27, Andrew Lunn wrote:

In other words, the full-speed hub is restricting the USB to
Ethernet Adaptor to a 12Mbps (half-duplex) bandwidth to support
Ethernet 100Mbps (full-duplex) traffic. That is not going to work
very well because Ethernet frames (perhaps partial Ethernet frames)
need to be discarded within the USB link.

If that really is true, the design is broken. I would expect the
adaptor to reliably transfer whole frames over USB, and drop whole
frames from its receive queue when the USB is congested. TCP is also
going to see the USB bottleneck as just like any bottleneck in the
network and back off. So TCP streams should not cause major congestion
on the USB link.
The host's USB host controller polls the USB to Ethernet adaptor for 
more data. The USB to Ethernet adaptor cannot predict when the next poll 
request comes. The AX88772B can span Ethernet frames across multiple 
poll requests. This means it is possible get a partial Ethernet frame 
received in the USB host controller on one poll and it is assumed that 
the next poll (sometime in the near future) will get the remaining part 
of the Ethernet frame.


However, the USB to Ethernet adaptor does not contain an infinitely 
sized RX Ethernet buffer for the incoming Ethernet frames. I believe the 
USB to Ethernet adaptor is just a pipe and does not directly implement 
flow control for Ethernet frames so the RX buffer is going to overflow 
causing loss of whole Ethernet frames. I suspect the IP stack in the 
host computer implements flow control for Ethernet frames.


Because the AX88772B can span Ethernet frames across multiple poll 
requests there is a risk that the designers of the device could of 
implemented a solution to discard the remaining part of the Ethernet 
frame before the next poll arrives due to the RX buffer overflowing. I 
don't know the algorithm used in the AX88772B but there will be loss of 
data due to the mismatch in bandwidths. I agree that dropping whole 
Ethernet frames would be preferable to dropping partial Ethernet frames 
which would corrupt the data stream.


My suspicion is that the URB buffers are containing discontinues in the 
data stream because of lost data due to insufficient bandwidth on the 
USB link.



Going over a 12Mbps USB link should be no different
to hitting an old Ethernet hub which can only do 10/Half.
Not exactly, because USB is a transport link which is agnostic to the 
type of data that is flowing. It is up to the layers above USB to manage 
the data content.


In other words, the USB speed needs to be higher than the Ethernet speed 
to avoid mismatches in bandwidth.

Therefore please retest with a working high-speed USB hub or remove
the full-speed USB hub from the test environment and directly
connect the USB to Ethernet Adaptor to the root hub of the USB port.
Then repeat the tests to see whether anything improved.

In other words, you need to eliminate the dmesg messages saying "not
running at top speed; connect to a high speed hub".

I would also suggest testing with the Ethernet at 10/half. You should
be able to use Ethtool to set that up. Your USB and Ethernet bandwidth
become more equal. If you still see errors, it suggests a protocol
implementation error somewhere.
I agree with the suggestion but I hope USB high speed (480Mbps) 
operation was the intended environment rather than the useless USB full 
speed (12Mbps) operation.


Let's hope that not using the USB hub improves things.

Regards,
Dean


 Andrew


--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread Dean Jenkins

On 06/05/16 16:27, Andrew Lunn wrote:

In other words, the full-speed hub is restricting the USB to
Ethernet Adaptor to a 12Mbps (half-duplex) bandwidth to support
Ethernet 100Mbps (full-duplex) traffic. That is not going to work
very well because Ethernet frames (perhaps partial Ethernet frames)
need to be discarded within the USB link.

If that really is true, the design is broken. I would expect the
adaptor to reliably transfer whole frames over USB, and drop whole
frames from its receive queue when the USB is congested. TCP is also
going to see the USB bottleneck as just like any bottleneck in the
network and back off. So TCP streams should not cause major congestion
on the USB link.
The host's USB host controller polls the USB to Ethernet adaptor for 
more data. The USB to Ethernet adaptor cannot predict when the next poll 
request comes. The AX88772B can span Ethernet frames across multiple 
poll requests. This means it is possible get a partial Ethernet frame 
received in the USB host controller on one poll and it is assumed that 
the next poll (sometime in the near future) will get the remaining part 
of the Ethernet frame.


However, the USB to Ethernet adaptor does not contain an infinitely 
sized RX Ethernet buffer for the incoming Ethernet frames. I believe the 
USB to Ethernet adaptor is just a pipe and does not directly implement 
flow control for Ethernet frames so the RX buffer is going to overflow 
causing loss of whole Ethernet frames. I suspect the IP stack in the 
host computer implements flow control for Ethernet frames.


Because the AX88772B can span Ethernet frames across multiple poll 
requests there is a risk that the designers of the device could of 
implemented a solution to discard the remaining part of the Ethernet 
frame before the next poll arrives due to the RX buffer overflowing. I 
don't know the algorithm used in the AX88772B but there will be loss of 
data due to the mismatch in bandwidths. I agree that dropping whole 
Ethernet frames would be preferable to dropping partial Ethernet frames 
which would corrupt the data stream.


My suspicion is that the URB buffers are containing discontinues in the 
data stream because of lost data due to insufficient bandwidth on the 
USB link.



Going over a 12Mbps USB link should be no different
to hitting an old Ethernet hub which can only do 10/Half.
Not exactly, because USB is a transport link which is agnostic to the 
type of data that is flowing. It is up to the layers above USB to manage 
the data content.


In other words, the USB speed needs to be higher than the Ethernet speed 
to avoid mismatches in bandwidth.

Therefore please retest with a working high-speed USB hub or remove
the full-speed USB hub from the test environment and directly
connect the USB to Ethernet Adaptor to the root hub of the USB port.
Then repeat the tests to see whether anything improved.

In other words, you need to eliminate the dmesg messages saying "not
running at top speed; connect to a high speed hub".

I would also suggest testing with the Ethernet at 10/half. You should
be able to use Ethtool to set that up. Your USB and Ethernet bandwidth
become more equal. If you still see errors, it suggests a protocol
implementation error somewhere.
I agree with the suggestion but I hope USB high speed (480Mbps) 
operation was the intended environment rather than the useless USB full 
speed (12Mbps) operation.


Let's hope that not using the USB hub improves things.

Regards,
Dean


 Andrew


--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread Andrew Lunn
> In other words, the full-speed hub is restricting the USB to
> Ethernet Adaptor to a 12Mbps (half-duplex) bandwidth to support
> Ethernet 100Mbps (full-duplex) traffic. That is not going to work
> very well because Ethernet frames (perhaps partial Ethernet frames)
> need to be discarded within the USB link.

If that really is true, the design is broken. I would expect the
adaptor to reliably transfer whole frames over USB, and drop whole
frames from its receive queue when the USB is congested. TCP is also
going to see the USB bottleneck as just like any bottleneck in the
network and back off. So TCP streams should not cause major congestion
on the USB link. Going over a 12Mbps USB link should be no different
to hitting an old Ethernet hub which can only do 10/Half.

> Therefore please retest with a working high-speed USB hub or remove
> the full-speed USB hub from the test environment and directly
> connect the USB to Ethernet Adaptor to the root hub of the USB port.
> Then repeat the tests to see whether anything improved.
> 
> In other words, you need to eliminate the dmesg messages saying "not
> running at top speed; connect to a high speed hub".

I would also suggest testing with the Ethernet at 10/half. You should
be able to use Ethtool to set that up. Your USB and Ethernet bandwidth
become more equal. If you still see errors, it suggests a protocol
implementation error somewhere.

 Andrew


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread Andrew Lunn
> In other words, the full-speed hub is restricting the USB to
> Ethernet Adaptor to a 12Mbps (half-duplex) bandwidth to support
> Ethernet 100Mbps (full-duplex) traffic. That is not going to work
> very well because Ethernet frames (perhaps partial Ethernet frames)
> need to be discarded within the USB link.

If that really is true, the design is broken. I would expect the
adaptor to reliably transfer whole frames over USB, and drop whole
frames from its receive queue when the USB is congested. TCP is also
going to see the USB bottleneck as just like any bottleneck in the
network and back off. So TCP streams should not cause major congestion
on the USB link. Going over a 12Mbps USB link should be no different
to hitting an old Ethernet hub which can only do 10/Half.

> Therefore please retest with a working high-speed USB hub or remove
> the full-speed USB hub from the test environment and directly
> connect the USB to Ethernet Adaptor to the root hub of the USB port.
> Then repeat the tests to see whether anything improved.
> 
> In other words, you need to eliminate the dmesg messages saying "not
> running at top speed; connect to a high speed hub".

I would also suggest testing with the Ethernet at 10/half. You should
be able to use Ethtool to set that up. Your USB and Ethernet bandwidth
become more equal. If you still see errors, it suggests a protocol
implementation error somewhere.

 Andrew


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread Dean Jenkins

On 05/05/16 13:19, Guodong Xu wrote:

Hi, Dean

I am not sure why do you insist 'not full speed'. Actually, the tests
I run on ARM-64bit is at USB full speed mode. I pasted my log here:
http://paste.ubuntu.com/16236442/
, which includes the information you requested above, ifconfig, dmesg.
The interval between two consecutive errors varies from 10 to 40ms.



Your log from http://paste.ubuntu.com/16236442/ shows high speed for 
device 3 is not being used:


[3.586968] usb 1-1: new full-speed USB device number 2 using dwc2
[3.792091] usb 1-1: not running at top speed; connect to a high 
speed hub

[3.800477] hub 1-1:1.0: USB hub found
[3.803658] hub 1-1:1.0: 3 ports detected
[4.086636] usb 1-1.2: new full-speed USB device number 3 using dwc2
[4.202209] usb 1-1.2: not running at top speed; connect to a high 
speed hub
[8.851236] asix 1-1.2:1.0 eth0: register 'asix' at 
usb-f72c.usb-1.2, ASIX AX88772B USB 2.0 Ethernet, 00:0e:c6:fa:bf:fd


Hopefully, you know USB 2.0 high speed (480Mbps) is faster than full 
speed (12Mbps) mode.


Therefore, your USB to Ethernet Adaptor is not running in its optimal 
"normal" high speed operation and there is a USB hub in the way that is 
not running at USB high speed mode. This is an abnormal configuration 
and potentially explains some of your failure observations.


Running at full-speed (12Mbps) mode would explain why the timestamps has 
gaps of ms rather than us gaps (for 480Mbps).


In other words, the full-speed hub is restricting the USB to Ethernet 
Adaptor to a 12Mbps (half-duplex) bandwidth to support Ethernet 100Mbps 
(full-duplex) traffic. That is not going to work very well because 
Ethernet frames (perhaps partial Ethernet frames) need to be discarded 
within the USB link.


Your ifconfig output from http://paste.ubuntu.com/16236442/ shows 249 errors

eth0  Link encap:Ethernet  HWaddr 00:0e:c6:fa:bf:fd
  inet addr:192.168.1.11  Bcast:192.168.1.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:865 errors:249 dropped:0 overruns:0 frame:0
  TX packets:880 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:1228273 (1.1 MiB)  TX bytes:68955 (67.3 KiB)

Before the test
RX packets:28 errors:0 dropped:0 overruns:0 frame:0

After the test
RX packets:865 errors:249 dropped:0 overruns:0 frame:0

Good test packets = 865 - 28 = 837
Detected bad Ethernet frames = 249

Bad to good ratio is 249:837 = 1:3.36 so 1 detected bad Ethernet frame 
per 3.36 good Ethernet frames



Your ifconfig output from http://paste.ubuntu.com/16236764/ shows 1282 
errors


eth0  Link encap:Ethernet  HWaddr 00:0e:c6:fa:bf:fd
  inet addr:192.168.1.11  Bcast:192.168.1.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:55 errors:1282 dropped:0 overruns:0 frame:0
  TX packets:64 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:14287 (13.9 KiB)  TX bytes:7639 (7.4 KiB)

Before the test
RX packets:19 errors:0 dropped:0 overruns:0 frame:0

After the test
RX packets:55 errors:1282 dropped:0 overruns:0 frame:0

Good test packets = 55 - 19 = 36
Detected bad Ethernet frames = 1282

Bad to good ratio is 1282:36 = 1:0.28 so 1 detected bad Ethernet frame 
per 0.028 good Ethernet frames


This suggests a very high error rate.



It is interesting that the reported "remaining" value is 988. Is 988 always
shown ? I mean that do you see any other "remaining" values for the "Data
Header synchronisation was lost" error message ?

Yes and No. When doing iperf test in TCP mode, always 988. I have
never seen other "remaining" value.

But,
1. I tried "ping -f -s 1400 [my.arm.64bit.board.ip]", but this cannot
trigger the error.
2. Tried iperf in UDP mode, I saw "Data Header synchronisation was
lost" remaining value is 984 (again, seemingly always in several
tries). Log is pasted here. http://paste.ubuntu.com/16236764/


In http://paste.ubuntu.com/16236764/ you see very many
[   41.938370] asix 1-1.2:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x11400040, offset 4


but only a few
[   42.214607] asix 1-1.2:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 984


This suggests that the "Bad Header Length" and "Data Header 
synchronisation was lost" error messages are not related to consecutive 
URBs. The expectation is that a "Data Header synchronisation was lost" 
error message is immediately followed by a "Bad Header Length" message 
with a timestamp much less than 1ms (for high speed USB). This is 
because an Ethernet frame that spans URBs needs low latency so should be 
sent quickly in consecutive URBs.


The Bad Header Length error messages with offset 4 indicates that 32-bit 
header word was not found in the expected location at the start of the 
URB buffer.
[   41.938370] asix 1-1.2:1.0 eth0: asix_rx_fixup() Bad Header 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread Dean Jenkins

On 05/05/16 13:19, Guodong Xu wrote:

Hi, Dean

I am not sure why do you insist 'not full speed'. Actually, the tests
I run on ARM-64bit is at USB full speed mode. I pasted my log here:
http://paste.ubuntu.com/16236442/
, which includes the information you requested above, ifconfig, dmesg.
The interval between two consecutive errors varies from 10 to 40ms.



Your log from http://paste.ubuntu.com/16236442/ shows high speed for 
device 3 is not being used:


[3.586968] usb 1-1: new full-speed USB device number 2 using dwc2
[3.792091] usb 1-1: not running at top speed; connect to a high 
speed hub

[3.800477] hub 1-1:1.0: USB hub found
[3.803658] hub 1-1:1.0: 3 ports detected
[4.086636] usb 1-1.2: new full-speed USB device number 3 using dwc2
[4.202209] usb 1-1.2: not running at top speed; connect to a high 
speed hub
[8.851236] asix 1-1.2:1.0 eth0: register 'asix' at 
usb-f72c.usb-1.2, ASIX AX88772B USB 2.0 Ethernet, 00:0e:c6:fa:bf:fd


Hopefully, you know USB 2.0 high speed (480Mbps) is faster than full 
speed (12Mbps) mode.


Therefore, your USB to Ethernet Adaptor is not running in its optimal 
"normal" high speed operation and there is a USB hub in the way that is 
not running at USB high speed mode. This is an abnormal configuration 
and potentially explains some of your failure observations.


Running at full-speed (12Mbps) mode would explain why the timestamps has 
gaps of ms rather than us gaps (for 480Mbps).


In other words, the full-speed hub is restricting the USB to Ethernet 
Adaptor to a 12Mbps (half-duplex) bandwidth to support Ethernet 100Mbps 
(full-duplex) traffic. That is not going to work very well because 
Ethernet frames (perhaps partial Ethernet frames) need to be discarded 
within the USB link.


Your ifconfig output from http://paste.ubuntu.com/16236442/ shows 249 errors

eth0  Link encap:Ethernet  HWaddr 00:0e:c6:fa:bf:fd
  inet addr:192.168.1.11  Bcast:192.168.1.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:865 errors:249 dropped:0 overruns:0 frame:0
  TX packets:880 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:1228273 (1.1 MiB)  TX bytes:68955 (67.3 KiB)

Before the test
RX packets:28 errors:0 dropped:0 overruns:0 frame:0

After the test
RX packets:865 errors:249 dropped:0 overruns:0 frame:0

Good test packets = 865 - 28 = 837
Detected bad Ethernet frames = 249

Bad to good ratio is 249:837 = 1:3.36 so 1 detected bad Ethernet frame 
per 3.36 good Ethernet frames



Your ifconfig output from http://paste.ubuntu.com/16236764/ shows 1282 
errors


eth0  Link encap:Ethernet  HWaddr 00:0e:c6:fa:bf:fd
  inet addr:192.168.1.11  Bcast:192.168.1.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:55 errors:1282 dropped:0 overruns:0 frame:0
  TX packets:64 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:14287 (13.9 KiB)  TX bytes:7639 (7.4 KiB)

Before the test
RX packets:19 errors:0 dropped:0 overruns:0 frame:0

After the test
RX packets:55 errors:1282 dropped:0 overruns:0 frame:0

Good test packets = 55 - 19 = 36
Detected bad Ethernet frames = 1282

Bad to good ratio is 1282:36 = 1:0.28 so 1 detected bad Ethernet frame 
per 0.028 good Ethernet frames


This suggests a very high error rate.



It is interesting that the reported "remaining" value is 988. Is 988 always
shown ? I mean that do you see any other "remaining" values for the "Data
Header synchronisation was lost" error message ?

Yes and No. When doing iperf test in TCP mode, always 988. I have
never seen other "remaining" value.

But,
1. I tried "ping -f -s 1400 [my.arm.64bit.board.ip]", but this cannot
trigger the error.
2. Tried iperf in UDP mode, I saw "Data Header synchronisation was
lost" remaining value is 984 (again, seemingly always in several
tries). Log is pasted here. http://paste.ubuntu.com/16236764/


In http://paste.ubuntu.com/16236764/ you see very many
[   41.938370] asix 1-1.2:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x11400040, offset 4


but only a few
[   42.214607] asix 1-1.2:1.0 eth0: asix_rx_fixup() Data Header 
synchronisation was lost, remaining 984


This suggests that the "Bad Header Length" and "Data Header 
synchronisation was lost" error messages are not related to consecutive 
URBs. The expectation is that a "Data Header synchronisation was lost" 
error message is immediately followed by a "Bad Header Length" message 
with a timestamp much less than 1ms (for high speed USB). This is 
because an Ethernet frame that spans URBs needs low latency so should be 
sent quickly in consecutive URBs.


The Bad Header Length error messages with offset 4 indicates that 32-bit 
header word was not found in the expected location at the start of the 
URB buffer.
[   41.938370] asix 1-1.2:1.0 eth0: asix_rx_fixup() Bad Header 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-05 Thread Guodong Xu
On 5 May 2016 at 16:11, Dean Jenkins  wrote:
> On 05/05/16 00:45, John Stultz wrote:
>>
>> On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins 
>> wrote:
>>>
>>> On 03/05/16 11:04, Guodong Xu wrote:

 did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
 is an ARM 64-bit system. I suggest we should be careful on that. I saw
 similar issues when transferring to a 64-bit system in other net
 drivers.
>>>
>>> We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the
>>> commits need to be reviewed with 64-bit OS in mind.


 Do you have any suggestion on this regard?
>>>
>>> Try testing on a Linux PC x86 32-bit OS which has has a kernel containing
>>> my
>>> ASIX commits. This will help to confirm whether the failure is related to
>>> 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should
>>> fail
>>> otherwise it points to something specific in your ARM 64-bit platform.
>>
>> Just as a sample point, I have managed to reproduce exactly this issue
>> on an x86_64 system by simply scp'ing a large file.
>
> Please tell us the x86_64 kernel version number that you used and which
> Linux Distribution it was ? This allows other people a chance to reproduce
> your observations.
>
>>
>> [  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>
> It is interesting that the reported "remaining" value is 988. Is 988 always
> shown ? I mean that do you see any other "remaining" values for the "Data
> Header synchronisation was lost" error message ?
>
>> [  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0xef830347, offset 4
>
> The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms
> which is a large gap in terms of USB 2.0 high speed communications. This gap
> is expected to be in the 100us range for consecutive URBs. So 4ms is
> strange.
>
> The expectation is that the "Data Header synchronisation was lost" error
> message resets the 32-bit header word synchronisation to the start of the
> next URB buffer. The "Bad Header Length, offset 4" is the expected outcome
> for the next URB because it is unlikely the 32-bit header word is at the
> start of URB buffer due to Ethernet frames spanning across URBs.
>>
>> [  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x31e2b348, offset 4
>
> Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed,
> are you sure high speed mode is being used ?
>>
>> [  417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x8af91035, offset 4
>> [  417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x8521fa03, offset 4
>> [  418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x33cd9c7c, offset 4
>> [  418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x3d850896, offset 4
>> [  418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x86443357, offset 4
>> [  418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0xee6c81d1, offset 4
>>
>> I don't have any 32bit x86 installs around so I'm not sure I can easly
>> test there, but its clear its not arm64 specific.
>
> I agree the issue is not specific to your ARM 64 bit platform.
>
> Please can you supply the output of ifconfig for the USB to Ethernet
> adaptor, your example above shows eth1 as the device.
>
> Please show the output of ifconfig eth1 before and after the issue is seen.
> This will show us whether the kernel logs any network errors and how many
> bytes have been transferred.
>
> After the issue is seen, please can you show us the output of "dmesg | grep
> asix" so that we can see status messages from the ASIX driver that the USB
> to Ethernet adaptor is using. In particular we need to check that USB high
> speed operation (480Mbps) is being used and not full speed operation
> (12Mbps).

Hi, Dean

I am not sure why do you insist 'not full speed'. Actually, the tests
I run on ARM-64bit is at USB full speed mode. I pasted my log here:
http://paste.ubuntu.com/16236442/
, which includes the information you requested above, ifconfig, dmesg.
The interval between two consecutive errors varies from 10 to 40ms.

> It is interesting that the reported "remaining" value is 988. Is 988 always
> shown ? I mean that do you see any other "remaining" values for the "Data
> Header synchronisation was lost" error message ?

Yes and No. When doing iperf test in TCP mode, always 988. I have
never seen other "remaining" value.

But,
1. I tried "ping -f -s 1400 [my.arm.64bit.board.ip]", but this cannot
trigger the error.
2. Tried iperf in UDP mode, I saw "Data Header synchronisation was
lost" remaining value is 984 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-05 Thread Guodong Xu
On 5 May 2016 at 16:11, Dean Jenkins  wrote:
> On 05/05/16 00:45, John Stultz wrote:
>>
>> On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins 
>> wrote:
>>>
>>> On 03/05/16 11:04, Guodong Xu wrote:

 did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
 is an ARM 64-bit system. I suggest we should be careful on that. I saw
 similar issues when transferring to a 64-bit system in other net
 drivers.
>>>
>>> We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the
>>> commits need to be reviewed with 64-bit OS in mind.


 Do you have any suggestion on this regard?
>>>
>>> Try testing on a Linux PC x86 32-bit OS which has has a kernel containing
>>> my
>>> ASIX commits. This will help to confirm whether the failure is related to
>>> 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should
>>> fail
>>> otherwise it points to something specific in your ARM 64-bit platform.
>>
>> Just as a sample point, I have managed to reproduce exactly this issue
>> on an x86_64 system by simply scp'ing a large file.
>
> Please tell us the x86_64 kernel version number that you used and which
> Linux Distribution it was ? This allows other people a chance to reproduce
> your observations.
>
>>
>> [  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>
> It is interesting that the reported "remaining" value is 988. Is 988 always
> shown ? I mean that do you see any other "remaining" values for the "Data
> Header synchronisation was lost" error message ?
>
>> [  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0xef830347, offset 4
>
> The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms
> which is a large gap in terms of USB 2.0 high speed communications. This gap
> is expected to be in the 100us range for consecutive URBs. So 4ms is
> strange.
>
> The expectation is that the "Data Header synchronisation was lost" error
> message resets the 32-bit header word synchronisation to the start of the
> next URB buffer. The "Bad Header Length, offset 4" is the expected outcome
> for the next URB because it is unlikely the 32-bit header word is at the
> start of URB buffer due to Ethernet frames spanning across URBs.
>>
>> [  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x31e2b348, offset 4
>
> Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed,
> are you sure high speed mode is being used ?
>>
>> [  417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x8af91035, offset 4
>> [  417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x8521fa03, offset 4
>> [  418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x33cd9c7c, offset 4
>> [  418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x3d850896, offset 4
>> [  418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x86443357, offset 4
>> [  418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0xee6c81d1, offset 4
>>
>> I don't have any 32bit x86 installs around so I'm not sure I can easly
>> test there, but its clear its not arm64 specific.
>
> I agree the issue is not specific to your ARM 64 bit platform.
>
> Please can you supply the output of ifconfig for the USB to Ethernet
> adaptor, your example above shows eth1 as the device.
>
> Please show the output of ifconfig eth1 before and after the issue is seen.
> This will show us whether the kernel logs any network errors and how many
> bytes have been transferred.
>
> After the issue is seen, please can you show us the output of "dmesg | grep
> asix" so that we can see status messages from the ASIX driver that the USB
> to Ethernet adaptor is using. In particular we need to check that USB high
> speed operation (480Mbps) is being used and not full speed operation
> (12Mbps).

Hi, Dean

I am not sure why do you insist 'not full speed'. Actually, the tests
I run on ARM-64bit is at USB full speed mode. I pasted my log here:
http://paste.ubuntu.com/16236442/
, which includes the information you requested above, ifconfig, dmesg.
The interval between two consecutive errors varies from 10 to 40ms.

> It is interesting that the reported "remaining" value is 988. Is 988 always
> shown ? I mean that do you see any other "remaining" values for the "Data
> Header synchronisation was lost" error message ?

Yes and No. When doing iperf test in TCP mode, always 988. I have
never seen other "remaining" value.

But,
1. I tried "ping -f -s 1400 [my.arm.64bit.board.ip]", but this cannot
trigger the error.
2. Tried iperf in UDP mode, I saw "Data Header synchronisation was
lost" remaining value is 984 (again, seemingly always in several
tries). Log 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-05 Thread Dean Jenkins

On 05/05/16 00:45, John Stultz wrote:

On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins  wrote:

On 03/05/16 11:04, Guodong Xu wrote:

did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
is an ARM 64-bit system. I suggest we should be careful on that. I saw
similar issues when transferring to a 64-bit system in other net
drivers.

We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the
commits need to be reviewed with 64-bit OS in mind.


Do you have any suggestion on this regard?

Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my
ASIX commits. This will help to confirm whether the failure is related to
32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail
otherwise it points to something specific in your ARM 64-bit platform.

Just as a sample point, I have managed to reproduce exactly this issue
on an x86_64 system by simply scp'ing a large file.
Please tell us the x86_64 kernel version number that you used and which 
Linux Distribution it was ? This allows other people a chance to 
reproduce your observations.




[  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
It is interesting that the reported "remaining" value is 988. Is 988 
always shown ? I mean that do you see any other "remaining" values for 
the "Data Header synchronisation was lost" error message ?



[  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xef830347, offset 4
The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms 
which is a large gap in terms of USB 2.0 high speed communications. This 
gap is expected to be in the 100us range for consecutive URBs. So 4ms is 
strange.


The expectation is that the "Data Header synchronisation was lost" error 
message resets the 32-bit header word synchronisation to the start of 
the next URB buffer. The "Bad Header Length, offset 4" is the expected 
outcome for the next URB because it is unlikely the 32-bit header word 
is at the start of URB buffer due to Ethernet frames spanning across URBs.

[  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x31e2b348, offset 4
Timestamps show the gap to be 4ms which is strange for USB 2.0 high 
speed, are you sure high speed mode is being used ?

[  417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8af91035, offset 4
[  417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8521fa03, offset 4
[  418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x33cd9c7c, offset 4
[  418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3d850896, offset 4
[  418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x86443357, offset 4
[  418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xee6c81d1, offset 4

I don't have any 32bit x86 installs around so I'm not sure I can easly
test there, but its clear its not arm64 specific.

I agree the issue is not specific to your ARM 64 bit platform.

Please can you supply the output of ifconfig for the USB to Ethernet 
adaptor, your example above shows eth1 as the device.


Please show the output of ifconfig eth1 before and after the issue is 
seen. This will show us whether the kernel logs any network errors and 
how many bytes have been transferred.


After the issue is seen, please can you show us the output of "dmesg | 
grep asix" so that we can see status messages from the ASIX driver that 
the USB to Ethernet adaptor is using. In particular we need to check 
that USB high speed operation (480Mbps) is being used and not full speed 
operation (12Mbps).


Thanks,

Regards,
Dean



thanks
-john



--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-05 Thread Dean Jenkins

On 05/05/16 00:45, John Stultz wrote:

On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins  wrote:

On 03/05/16 11:04, Guodong Xu wrote:

did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
is an ARM 64-bit system. I suggest we should be careful on that. I saw
similar issues when transferring to a 64-bit system in other net
drivers.

We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the
commits need to be reviewed with 64-bit OS in mind.


Do you have any suggestion on this regard?

Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my
ASIX commits. This will help to confirm whether the failure is related to
32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail
otherwise it points to something specific in your ARM 64-bit platform.

Just as a sample point, I have managed to reproduce exactly this issue
on an x86_64 system by simply scp'ing a large file.
Please tell us the x86_64 kernel version number that you used and which 
Linux Distribution it was ? This allows other people a chance to 
reproduce your observations.




[  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
It is interesting that the reported "remaining" value is 988. Is 988 
always shown ? I mean that do you see any other "remaining" values for 
the "Data Header synchronisation was lost" error message ?



[  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xef830347, offset 4
The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms 
which is a large gap in terms of USB 2.0 high speed communications. This 
gap is expected to be in the 100us range for consecutive URBs. So 4ms is 
strange.


The expectation is that the "Data Header synchronisation was lost" error 
message resets the 32-bit header word synchronisation to the start of 
the next URB buffer. The "Bad Header Length, offset 4" is the expected 
outcome for the next URB because it is unlikely the 32-bit header word 
is at the start of URB buffer due to Ethernet frames spanning across URBs.

[  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x31e2b348, offset 4
Timestamps show the gap to be 4ms which is strange for USB 2.0 high 
speed, are you sure high speed mode is being used ?

[  417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8af91035, offset 4
[  417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8521fa03, offset 4
[  418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x33cd9c7c, offset 4
[  418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3d850896, offset 4
[  418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x86443357, offset 4
[  418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xee6c81d1, offset 4

I don't have any 32bit x86 installs around so I'm not sure I can easly
test there, but its clear its not arm64 specific.

I agree the issue is not specific to your ARM 64 bit platform.

Please can you supply the output of ifconfig for the USB to Ethernet 
adaptor, your example above shows eth1 as the device.


Please show the output of ifconfig eth1 before and after the issue is 
seen. This will show us whether the kernel logs any network errors and 
how many bytes have been transferred.


After the issue is seen, please can you show us the output of "dmesg | 
grep asix" so that we can see status messages from the ASIX driver that 
the USB to Ethernet adaptor is using. In particular we need to check 
that USB high speed operation (480Mbps) is being used and not full speed 
operation (12Mbps).


Thanks,

Regards,
Dean



thanks
-john



--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-04 Thread John Stultz
On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins  wrote:
> On 03/05/16 11:04, Guodong Xu wrote:
>>
>> did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
>> is an ARM 64-bit system. I suggest we should be careful on that. I saw
>> similar issues when transferring to a 64-bit system in other net
>> drivers.
>
> We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the
> commits need to be reviewed with 64-bit OS in mind.
>>
>>
>> Do you have any suggestion on this regard?
>
> Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my
> ASIX commits. This will help to confirm whether the failure is related to
> 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail
> otherwise it points to something specific in your ARM 64-bit platform.

Just as a sample point, I have managed to reproduce exactly this issue
on an x86_64 system by simply scp'ing a large file.

[  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xef830347, offset 4
[  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x31e2b348, offset 4
[  417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8af91035, offset 4
[  417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8521fa03, offset 4
[  418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x33cd9c7c, offset 4
[  418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3d850896, offset 4
[  418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x86443357, offset 4
[  418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xee6c81d1, offset 4

I don't have any 32bit x86 installs around so I'm not sure I can easly
test there, but its clear its not arm64 specific.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-04 Thread John Stultz
On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins  wrote:
> On 03/05/16 11:04, Guodong Xu wrote:
>>
>> did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
>> is an ARM 64-bit system. I suggest we should be careful on that. I saw
>> similar issues when transferring to a 64-bit system in other net
>> drivers.
>
> We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the
> commits need to be reviewed with 64-bit OS in mind.
>>
>>
>> Do you have any suggestion on this regard?
>
> Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my
> ASIX commits. This will help to confirm whether the failure is related to
> 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail
> otherwise it points to something specific in your ARM 64-bit platform.

Just as a sample point, I have managed to reproduce exactly this issue
on an x86_64 system by simply scp'ing a large file.

[  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xef830347, offset 4
[  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x31e2b348, offset 4
[  417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8af91035, offset 4
[  417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8521fa03, offset 4
[  418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x33cd9c7c, offset 4
[  418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3d850896, offset 4
[  418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x86443357, offset 4
[  418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xee6c81d1, offset 4

I don't have any 32bit x86 installs around so I'm not sure I can easly
test there, but its clear its not arm64 specific.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-04 Thread Dean Jenkins

On 04/05/16 01:28, David B. Robins wrote:




Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more 
detailed logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent 
now. I am curious to know how often the device you are using benefits 
from this block of code.


The issue is that the driver should be robust to cope with missing URBs. 
Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet adaptor 
in our ARM embedded system which runs in hostile environments, it was 
noticed that URBs could be lost (probably due to a bug elsewhere or low 
memory issue). Without this patch, a missing URB causes bad Ethernet 
frames to be passed up to the IP stack because rx->remaining spans 
multiple URBs.


In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is 
processed and copies the 1st part of the Ethernet frame into the netdev 
buffer, for the 2nd URB the remaining part of the Ethernet frame is 
copied into the same netdev buffer to complete the Ethernet frame. The 
netdev buffer is then sent up to the IP stack.


In the case of a missing URB, a bad Ethernet frame is created as follows:
The 1st URB is processed and copies the 1st part of the Ethernet frame 
into the netdev buffer, the 2nd URB is lost (somehow),  the 3rd URB is 
processed and blindly copies what it thinks is the remaining part of the 
Ethernet frame in the same netdev buffer which corrupts the Ethernet 
frame. The netdev buffer is then sent up to the IP stack. The 3rd URB 
and subsequent URBs are processed but synchronisation has been lost so 
can misread data as a 32-bit header word. It is likely that some good 
Ethernet frames get discarded whilst trying to resynchronise.


A recovery strategy for regaining lock with the 32-bit header word is 
necessary otherwise the driver will have difficulty in recovering from a 
lost URB.


In the "olden days", the 32-bit header word was always at the start of 
the URB buffer so previous URBs did not influence the current URB. So no 
recovery strategy was needed at that time. But now we have to remember 
what happened in the previous URB and a lost URB can cause a 
discontinuity in the data stream because the data is not always aligned 
to the start of the URB buffer.


I agree that your environment may never suffer from lost URBs so removal 
of the patch would work OK.


I will try to find some time to setup a test environment.

Regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-04 Thread Dean Jenkins

On 04/05/16 01:28, David B. Robins wrote:




Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more 
detailed logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent 
now. I am curious to know how often the device you are using benefits 
from this block of code.


The issue is that the driver should be robust to cope with missing URBs. 
Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet adaptor 
in our ARM embedded system which runs in hostile environments, it was 
noticed that URBs could be lost (probably due to a bug elsewhere or low 
memory issue). Without this patch, a missing URB causes bad Ethernet 
frames to be passed up to the IP stack because rx->remaining spans 
multiple URBs.


In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is 
processed and copies the 1st part of the Ethernet frame into the netdev 
buffer, for the 2nd URB the remaining part of the Ethernet frame is 
copied into the same netdev buffer to complete the Ethernet frame. The 
netdev buffer is then sent up to the IP stack.


In the case of a missing URB, a bad Ethernet frame is created as follows:
The 1st URB is processed and copies the 1st part of the Ethernet frame 
into the netdev buffer, the 2nd URB is lost (somehow),  the 3rd URB is 
processed and blindly copies what it thinks is the remaining part of the 
Ethernet frame in the same netdev buffer which corrupts the Ethernet 
frame. The netdev buffer is then sent up to the IP stack. The 3rd URB 
and subsequent URBs are processed but synchronisation has been lost so 
can misread data as a 32-bit header word. It is likely that some good 
Ethernet frames get discarded whilst trying to resynchronise.


A recovery strategy for regaining lock with the 32-bit header word is 
necessary otherwise the driver will have difficulty in recovering from a 
lost URB.


In the "olden days", the 32-bit header word was always at the start of 
the URB buffer so previous URBs did not influence the current URB. So no 
recovery strategy was needed at that time. But now we have to remember 
what happened in the previous URB and a lost URB can cause a 
discontinuity in the data stream because the data is not always aligned 
to the start of the URB buffer.


I agree that your environment may never suffer from lost URBs so removal 
of the patch would work OK.


I will try to find some time to setup a test environment.

Regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread David B. Robins

On 2016-05-03 17:16, Dean Jenkins wrote:

On 03/05/16 15:42, David B. Robins wrote:


I don't think the first one is giving you problems (except as 
triggered by the second) but I had concerns about the second myself 
(and emailed the author off-list, but received no reply), and we did 
not take that commit for our own product.



Sorry, I might have missed your original E-mail.

Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at 
following code that only reads it for remaining == 0).


David, I think that your interpretation is incorrect. Please see below.

Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more detailed 
logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent now. 
I am curious to know how often the device you are using benefits from 
this block of code.



Regards,
Dean


David


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread David B. Robins

On 2016-05-03 17:16, Dean Jenkins wrote:

On 03/05/16 15:42, David B. Robins wrote:


I don't think the first one is giving you problems (except as 
triggered by the second) but I had concerns about the second myself 
(and emailed the author off-list, but received no reply), and we did 
not take that commit for our own product.



Sorry, I might have missed your original E-mail.

Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at 
following code that only reads it for remaining == 0).


David, I think that your interpretation is incorrect. Please see below.

Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more detailed 
logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent now. 
I am curious to know how often the device you are using benefits from 
this block of code.



Regards,
Dean


David


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Dean Jenkins

On 03/05/16 15:42, David B. Robins wrote:


I don't think the first one is giving you problems (except as 
triggered by the second) but I had concerns about the second myself 
(and emailed the author off-list, but received no reply), and we did 
not take that commit for our own product.



Sorry, I might have missed your original E-mail.

Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at 
following code that only reads it for remaining == 0).


David, I think that your interpretation is incorrect. Please see below.

Here is the code snippet from the patch with my annotations between # #, 
I will try to explain my intentions. Feel free to point out any flaws:


if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet 
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an Ethernet 
frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header 
word will be in a later URB buffer #


# offset is an index to the expected next 32-bit header word 
after the end of the Ethernet frame #

offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);

# rx->header contains the expected 32-bit header value 
corrected for Endianness and alignment #

rx->header = get_unaligned_le32(skb->data + offset);
offset = 0;

# check the data integrity of the size value from the header word #
size = (u16)(rx->header & 0x7ff);
# if the size value fails the integrity check then we are not 
looking at a valid header word so #

# synchronisation has been lost #
if (size != ((~rx->header >> 16) & 0x7ff)) {
netdev_err(dev->net, "asix_rx_fixup() Data Header 
synchronisation was lost, remaining %d\n",

   rx->remaining);
if (rx->ax_skb) {
kfree_skb(rx->ax_skb);
rx->ax_skb = NULL;
/* Discard the incomplete netdev Ethernet frame
 * and assume the Data header is at the start of
 * the current URB socket buffer.
 */
}
rx->remaining = 0;
}
}




So that change made no sense to me, but I don't have significant 
kernel dev experience. Effectively it will drop/truncate every 
(2047/2048) split (longer than an URB) packet, and report an error for 
the second URB and then again for treating said second URB as a first 
URB for a packet. I would expect your problems will go away just 
removing the second change. You could also change the != to == in "if 
(size != ...)" but then you'd still have 1/2048 (depending on data 
patterns) false positives.
The code only runs when the Ethernet frame spans across URBs and is 
checking that the next 32-bit header word is present and valid.


Upon loss of synchronisation, the strategy is to assume that the 32-bit 
header is at the start of the URB buffer. Obviously, that might not be 
true every time but it is the most likely location especially when 
Ethernet frames are not spanning URBs at that point at time.


Looking at the error messages:


[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4 


The offset 4 means that the 32-bit header word was invalid at the start 
of the URB buffer. This could be a consequence of data synchronisation 
being lost however, we would expect the timestamps between the error 
messages of "synchronisation was lost" and "Bad Header Length" to very 
close as they would be consecutive URBs. The evidence is showing 10ms 
gaps which does not suggest consecutive URBs. In other words, an 
Ethernet frame should not be spanned over a time gap of 10ms as that 
would be very inefficient. If that were true then there would be USB 
communications problem with the USB to Ethernet adaptor which I hope is 
not true.


[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988

This error message consistently shows the remaining value to be 988, at 
least for the 3 examples provided by John. This does not suggest a 
random failure unless there are other examples of a non 988 remaining 
value error message. 988 is well within a Ethernet frame 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Dean Jenkins

On 03/05/16 15:42, David B. Robins wrote:


I don't think the first one is giving you problems (except as 
triggered by the second) but I had concerns about the second myself 
(and emailed the author off-list, but received no reply), and we did 
not take that commit for our own product.



Sorry, I might have missed your original E-mail.

Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at 
following code that only reads it for remaining == 0).


David, I think that your interpretation is incorrect. Please see below.

Here is the code snippet from the patch with my annotations between # #, 
I will try to explain my intentions. Feel free to point out any flaws:


if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet 
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an Ethernet 
frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header 
word will be in a later URB buffer #


# offset is an index to the expected next 32-bit header word 
after the end of the Ethernet frame #

offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);

# rx->header contains the expected 32-bit header value 
corrected for Endianness and alignment #

rx->header = get_unaligned_le32(skb->data + offset);
offset = 0;

# check the data integrity of the size value from the header word #
size = (u16)(rx->header & 0x7ff);
# if the size value fails the integrity check then we are not 
looking at a valid header word so #

# synchronisation has been lost #
if (size != ((~rx->header >> 16) & 0x7ff)) {
netdev_err(dev->net, "asix_rx_fixup() Data Header 
synchronisation was lost, remaining %d\n",

   rx->remaining);
if (rx->ax_skb) {
kfree_skb(rx->ax_skb);
rx->ax_skb = NULL;
/* Discard the incomplete netdev Ethernet frame
 * and assume the Data header is at the start of
 * the current URB socket buffer.
 */
}
rx->remaining = 0;
}
}




So that change made no sense to me, but I don't have significant 
kernel dev experience. Effectively it will drop/truncate every 
(2047/2048) split (longer than an URB) packet, and report an error for 
the second URB and then again for treating said second URB as a first 
URB for a packet. I would expect your problems will go away just 
removing the second change. You could also change the != to == in "if 
(size != ...)" but then you'd still have 1/2048 (depending on data 
patterns) false positives.
The code only runs when the Ethernet frame spans across URBs and is 
checking that the next 32-bit header word is present and valid.


Upon loss of synchronisation, the strategy is to assume that the 32-bit 
header is at the start of the URB buffer. Obviously, that might not be 
true every time but it is the most likely location especially when 
Ethernet frames are not spanning URBs at that point at time.


Looking at the error messages:


[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4 


The offset 4 means that the 32-bit header word was invalid at the start 
of the URB buffer. This could be a consequence of data synchronisation 
being lost however, we would expect the timestamps between the error 
messages of "synchronisation was lost" and "Bad Header Length" to very 
close as they would be consecutive URBs. The evidence is showing 10ms 
gaps which does not suggest consecutive URBs. In other words, an 
Ethernet frame should not be spanned over a time gap of 10ms as that 
would be very inefficient. If that were true then there would be USB 
communications problem with the USB to Ethernet adaptor which I hope is 
not true.


[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988

This error message consistently shows the remaining value to be 988, at 
least for the 3 examples provided by John. This does not suggest a 
random failure unless there are other examples of a non 988 remaining 
value error message. 988 is well within a Ethernet frame 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread John Stultz
On Tue, May 3, 2016 at 7:42 AM, David B. Robins  wrote:
> On 2016-05-03 00:55, John Stultz wrote:
>>
>> Looking through the commits since the v4.1 kernel where we didn't see
>> this, I narrowed the regression down, and reverting the following two
>> commits seems to avoid the problem:
>>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>>
>
> I don't think the first one is giving you problems (except as triggered by
> the second) but I had concerns about the second myself (and emailed the
> author off-list, but received no reply), and we did not take that commit for
> our own product.

Yes, the first/later commit is being reverted as it modifies code that
was also changed by the second/earlier one. So the 3f30 patch doesn't
revert cleanly by itself, but I have tested by just removing the (now
modified by 6a57) chunk of code it adds does seem to avoid the problem
as well. Though I wasn't able to review things closely enough to be
confident that didn't introduce any subtle bugs in the remaining logic
in the 6a57 patch.

> Specifically, the second change, 3f30... (original patch:
> https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1)
> appears to do the exact opposite of what it claims, i.e., instead of "resync
> if this looks like a header", it does "resync if this does NOT look like a
> (packet) header", where "looks like a header" means "bits 0-10 (size) are
> equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence
> for 1/2048 32-bit values starting a continuation URB (easy to hit dealing
> with large volumes of video data as we were). It appears to expect the
> header for every URB whereas the rest of the code at least expects it only
> once per network packet (look at following code that only reads it for
> remaining == 0).
>
> So that change made no sense to me, but I don't have significant kernel dev
> experience. Effectively it will drop/truncate every (2047/2048) split
> (longer than an URB) packet, and report an error for the second URB and then
> again for treating said second URB as a first URB for a packet. I would
> expect your problems will go away just removing the second change. You could
> also change the != to == in "if (size != ...)" but then you'd still have
> 1/2048 (depending on data patterns) false positives.

I'll have to look into this more. I'm not super familiar with usb or
networking, so I'm not sure I can judge the better approach.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread John Stultz
On Tue, May 3, 2016 at 7:42 AM, David B. Robins  wrote:
> On 2016-05-03 00:55, John Stultz wrote:
>>
>> Looking through the commits since the v4.1 kernel where we didn't see
>> this, I narrowed the regression down, and reverting the following two
>> commits seems to avoid the problem:
>>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>>
>
> I don't think the first one is giving you problems (except as triggered by
> the second) but I had concerns about the second myself (and emailed the
> author off-list, but received no reply), and we did not take that commit for
> our own product.

Yes, the first/later commit is being reverted as it modifies code that
was also changed by the second/earlier one. So the 3f30 patch doesn't
revert cleanly by itself, but I have tested by just removing the (now
modified by 6a57) chunk of code it adds does seem to avoid the problem
as well. Though I wasn't able to review things closely enough to be
confident that didn't introduce any subtle bugs in the remaining logic
in the 6a57 patch.

> Specifically, the second change, 3f30... (original patch:
> https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1)
> appears to do the exact opposite of what it claims, i.e., instead of "resync
> if this looks like a header", it does "resync if this does NOT look like a
> (packet) header", where "looks like a header" means "bits 0-10 (size) are
> equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence
> for 1/2048 32-bit values starting a continuation URB (easy to hit dealing
> with large volumes of video data as we were). It appears to expect the
> header for every URB whereas the rest of the code at least expects it only
> once per network packet (look at following code that only reads it for
> remaining == 0).
>
> So that change made no sense to me, but I don't have significant kernel dev
> experience. Effectively it will drop/truncate every (2047/2048) split
> (longer than an URB) packet, and report an error for the second URB and then
> again for treating said second URB as a first URB for a packet. I would
> expect your problems will go away just removing the second change. You could
> also change the != to == in "if (size != ...)" but then you'd still have
> 1/2048 (depending on data patterns) false positives.

I'll have to look into this more. I'm not super familiar with usb or
networking, so I'm not sure I can judge the better approach.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread David B. Robins

On 2016-05-03 00:55, John Stultz wrote:

In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4

And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).


I don't think the first one is giving you problems (except as triggered 
by the second) but I had concerns about the second myself (and emailed 
the author off-list, but received no reply), and we did not take that 
commit for our own product.


Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at following 
code that only reads it for remaining == 0).


So that change made no sense to me, but I don't have significant kernel 
dev experience. Effectively it will drop/truncate every (2047/2048) 
split (longer than an URB) packet, and report an error for the second 
URB and then again for treating said second URB as a first URB for a 
packet. I would expect your problems will go away just removing the 
second change. You could also change the != to == in "if (size != ...)" 
but then you'd still have 1/2048 (depending on data patterns) false 
positives.



This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john


David



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread David B. Robins

On 2016-05-03 00:55, John Stultz wrote:

In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4

And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).


I don't think the first one is giving you problems (except as triggered 
by the second) but I had concerns about the second myself (and emailed 
the author off-list, but received no reply), and we did not take that 
commit for our own product.


Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at following 
code that only reads it for remaining == 0).


So that change made no sense to me, but I don't have significant kernel 
dev experience. Effectively it will drop/truncate every (2047/2048) 
split (longer than an URB) packet, and report an error for the second 
URB and then again for treating said second URB as a first URB for a 
packet. I would expect your problems will go away just removing the 
second change. You could also change the != to == in "if (size != ...)" 
but then you'd still have 1/2048 (depending on data patterns) false 
positives.



This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john


David



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Dean Jenkins

On 03/05/16 11:04, Guodong Xu wrote:

On 3 May 2016 at 17:23, Dean Jenkins  wrote:

On 03/05/16 05:55, John Stultz wrote:

In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4


And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).

This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john

Hi John,

Some ASIX chipsets span the Ethernet frame over consecutive URBs which
requires successful transfer of 2 URBs.

This means states of a previous URB influences the processing of the next
URB including a dropped URB (causes a discontinuity in the data stream). In
other words synchronisation of the in-band 32-bit header word needs to be
tracked between URBs. Some ASIX chipsets allow the in-band 32-bit header
word to be no longer fixed to the start of the URB buffer so it moves to any
position within the URB buffer.

I understand your point of suggesting it is a "regression" for your device
but the driver was broken for DUB-E100 C1 (small black USB device). So you
cannot revert the commits as this would break DUB-E100 C1 (small black USB
device).


6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer

This commit is necessary because it avoids a crash when netdev buffer failed
to be allocated for the 1st URB and the 2nd URB containing a spanned
Ethernet frame is processed. The crash happens because the 2nd URB assumed
that the netdev buffer had been allocated.


3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

This commit is necessary to avoid sending bad Ethernet frames into the IP
stack during loss of synchronisation and to dropping good Ethernet frames.
This commit improves the synchronisation recovery mechanism of the in-band
32-bit header word.

The ASIX USB to Ethernet devices these commits were tested on where DUB-E100
C1 (small black USB device). Embedded ARM based systems were used where
memory resources can run out.

I don't have the chance to look into detail yet. But just a caution,
did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
is an ARM 64-bit system. I suggest we should be careful on that. I saw
similar issues when transferring to a 64-bit system in other net
drivers.
We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the 
commits need to be reviewed with 64-bit OS in mind.


Do you have any suggestion on this regard?
Try testing on a Linux PC x86 32-bit OS which has has a kernel 
containing my ASIX commits. This will help to confirm whether the 
failure is related to 32-bit or 64-bit OS. Then try with Linux PC x86 
64-bit OS, this should fail otherwise it points to something specific in 
your ARM 64-bit platform.





It could be that for your USB to Ethernet device that the wrong
configuration settings have been used. In other words the ASIX driver is
flexible to support various variants of the ASIX chipsets. For example, does
your device support Ethernet frames spanning multiple URBs (multiple USB
transfers) ?

Would you please suggest how to find out this information? How can I
change my 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Dean Jenkins

On 03/05/16 11:04, Guodong Xu wrote:

On 3 May 2016 at 17:23, Dean Jenkins  wrote:

On 03/05/16 05:55, John Stultz wrote:

In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4


And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).

This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john

Hi John,

Some ASIX chipsets span the Ethernet frame over consecutive URBs which
requires successful transfer of 2 URBs.

This means states of a previous URB influences the processing of the next
URB including a dropped URB (causes a discontinuity in the data stream). In
other words synchronisation of the in-band 32-bit header word needs to be
tracked between URBs. Some ASIX chipsets allow the in-band 32-bit header
word to be no longer fixed to the start of the URB buffer so it moves to any
position within the URB buffer.

I understand your point of suggesting it is a "regression" for your device
but the driver was broken for DUB-E100 C1 (small black USB device). So you
cannot revert the commits as this would break DUB-E100 C1 (small black USB
device).


6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer

This commit is necessary because it avoids a crash when netdev buffer failed
to be allocated for the 1st URB and the 2nd URB containing a spanned
Ethernet frame is processed. The crash happens because the 2nd URB assumed
that the netdev buffer had been allocated.


3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

This commit is necessary to avoid sending bad Ethernet frames into the IP
stack during loss of synchronisation and to dropping good Ethernet frames.
This commit improves the synchronisation recovery mechanism of the in-band
32-bit header word.

The ASIX USB to Ethernet devices these commits were tested on where DUB-E100
C1 (small black USB device). Embedded ARM based systems were used where
memory resources can run out.

I don't have the chance to look into detail yet. But just a caution,
did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
is an ARM 64-bit system. I suggest we should be careful on that. I saw
similar issues when transferring to a 64-bit system in other net
drivers.
We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the 
commits need to be reviewed with 64-bit OS in mind.


Do you have any suggestion on this regard?
Try testing on a Linux PC x86 32-bit OS which has has a kernel 
containing my ASIX commits. This will help to confirm whether the 
failure is related to 32-bit or 64-bit OS. Then try with Linux PC x86 
64-bit OS, this should fail otherwise it points to something specific in 
your ARM 64-bit platform.





It could be that for your USB to Ethernet device that the wrong
configuration settings have been used. In other words the ASIX driver is
flexible to support various variants of the ASIX chipsets. For example, does
your device support Ethernet frames spanning multiple URBs (multiple USB
transfers) ?

Would you please suggest how to find out this information? How can I
change my device's configuration settings to support spanning 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Guodong Xu
On 3 May 2016 at 17:23, Dean Jenkins  wrote:
> On 03/05/16 05:55, John Stultz wrote:
>>
>> In testing with HiKey, we found that since commit 3f30b158eba5c60
>> (asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
>> noise during network transfers:
>>
>> [  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x54ebb5ec, offset 4
>> [  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0xcdffe7a2, offset 4
>> [  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x1d36f59d, offset 4
>> [  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0xaef3c1e9, offset 4
>> [  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x2881912, offset 4
>> [  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x5638f7e2, offset 4
>>
>>
>> And network throughput ends up being pretty bursty and slow with a
>> overall throughput of at best ~30kB/s.
>>
>> Looking through the commits since the v4.1 kernel where we didn't see
>> this, I narrowed the regression down, and reverting the following two
>> commits seems to avoid the problem:
>>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>>
>> With these reverted, we don't see all the error messages, and we see
>> better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
>> the usb host is only running at "full-speed" mode here).
>>
>> This worries me some, as the patches seem to describe trying to fix
>> the issue they seem to cause, so I suspect a revert isn't the correct
>> solution, but am not sure why we're having such trouble and the patch
>> authors did not.  I'd be happy to do further testing of patches if
>> folks have any ideas.
>>
>> Originally Reported-by: Yongqin Liu 
>>
>> thanks
>> -john
>
> Hi John,
>
> Some ASIX chipsets span the Ethernet frame over consecutive URBs which
> requires successful transfer of 2 URBs.
>
> This means states of a previous URB influences the processing of the next
> URB including a dropped URB (causes a discontinuity in the data stream). In
> other words synchronisation of the in-band 32-bit header word needs to be
> tracked between URBs. Some ASIX chipsets allow the in-band 32-bit header
> word to be no longer fixed to the start of the URB buffer so it moves to any
> position within the URB buffer.
>
> I understand your point of suggesting it is a "regression" for your device
> but the driver was broken for DUB-E100 C1 (small black USB device). So you
> cannot revert the commits as this would break DUB-E100 C1 (small black USB
> device).
>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>
> This commit is necessary because it avoids a crash when netdev buffer failed
> to be allocated for the 1st URB and the 2nd URB containing a spanned
> Ethernet frame is processed. The crash happens because the 2nd URB assumed
> that the netdev buffer had been allocated.
>
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>
> This commit is necessary to avoid sending bad Ethernet frames into the IP
> stack during loss of synchronisation and to dropping good Ethernet frames.
> This commit improves the synchronisation recovery mechanism of the in-band
> 32-bit header word.
>
> The ASIX USB to Ethernet devices these commits were tested on where DUB-E100
> C1 (small black USB device). Embedded ARM based systems were used where
> memory resources can run out.

I don't have the chance to look into detail yet. But just a caution,
did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
is an ARM 64-bit system. I suggest we should be careful on that. I saw
similar issues when transferring to a 64-bit system in other net
drivers.

Do you have any suggestion on this regard?

>
> It could be that for your USB to Ethernet device that the wrong
> configuration settings have been used. In other words the ASIX driver is
> flexible to support various variants of the ASIX chipsets. For example, does
> your device support Ethernet frames spanning multiple URBs (multiple USB
> transfers) ?

Would you please suggest how to find out this information? How can I
change my device's configuration settings to support spanning multiple
URBs?

>
> So I doubt my commits are "broken" because we don't see your failures (not
> tested your device). It is more likely that your ASIX device needs to be
> properly identified and 

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Guodong Xu
On 3 May 2016 at 17:23, Dean Jenkins  wrote:
> On 03/05/16 05:55, John Stultz wrote:
>>
>> In testing with HiKey, we found that since commit 3f30b158eba5c60
>> (asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
>> noise during network transfers:
>>
>> [  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x54ebb5ec, offset 4
>> [  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0xcdffe7a2, offset 4
>> [  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x1d36f59d, offset 4
>> [  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0xaef3c1e9, offset 4
>> [  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>> [  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x2881912, offset 4
>> [  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
>> 0x5638f7e2, offset 4
>>
>>
>> And network throughput ends up being pretty bursty and slow with a
>> overall throughput of at best ~30kB/s.
>>
>> Looking through the commits since the v4.1 kernel where we didn't see
>> this, I narrowed the regression down, and reverting the following two
>> commits seems to avoid the problem:
>>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>>
>> With these reverted, we don't see all the error messages, and we see
>> better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
>> the usb host is only running at "full-speed" mode here).
>>
>> This worries me some, as the patches seem to describe trying to fix
>> the issue they seem to cause, so I suspect a revert isn't the correct
>> solution, but am not sure why we're having such trouble and the patch
>> authors did not.  I'd be happy to do further testing of patches if
>> folks have any ideas.
>>
>> Originally Reported-by: Yongqin Liu 
>>
>> thanks
>> -john
>
> Hi John,
>
> Some ASIX chipsets span the Ethernet frame over consecutive URBs which
> requires successful transfer of 2 URBs.
>
> This means states of a previous URB influences the processing of the next
> URB including a dropped URB (causes a discontinuity in the data stream). In
> other words synchronisation of the in-band 32-bit header word needs to be
> tracked between URBs. Some ASIX chipsets allow the in-band 32-bit header
> word to be no longer fixed to the start of the URB buffer so it moves to any
> position within the URB buffer.
>
> I understand your point of suggesting it is a "regression" for your device
> but the driver was broken for DUB-E100 C1 (small black USB device). So you
> cannot revert the commits as this would break DUB-E100 C1 (small black USB
> device).
>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>
> This commit is necessary because it avoids a crash when netdev buffer failed
> to be allocated for the 1st URB and the 2nd URB containing a spanned
> Ethernet frame is processed. The crash happens because the 2nd URB assumed
> that the netdev buffer had been allocated.
>
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>
> This commit is necessary to avoid sending bad Ethernet frames into the IP
> stack during loss of synchronisation and to dropping good Ethernet frames.
> This commit improves the synchronisation recovery mechanism of the in-band
> 32-bit header word.
>
> The ASIX USB to Ethernet devices these commits were tested on where DUB-E100
> C1 (small black USB device). Embedded ARM based systems were used where
> memory resources can run out.

I don't have the chance to look into detail yet. But just a caution,
did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
is an ARM 64-bit system. I suggest we should be careful on that. I saw
similar issues when transferring to a 64-bit system in other net
drivers.

Do you have any suggestion on this regard?

>
> It could be that for your USB to Ethernet device that the wrong
> configuration settings have been used. In other words the ASIX driver is
> flexible to support various variants of the ASIX chipsets. For example, does
> your device support Ethernet frames spanning multiple URBs (multiple USB
> transfers) ?

Would you please suggest how to find out this information? How can I
change my device's configuration settings to support spanning multiple
URBs?

>
> So I doubt my commits are "broken" because we don't see your failures (not
> tested your device). It is more likely that your ASIX device needs to be
> properly identified and configured to be compatible with the ASIX driver. At

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Dean Jenkins

On 03/05/16 05:55, John Stultz wrote:

In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4


And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).

This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john

Hi John,

Some ASIX chipsets span the Ethernet frame over consecutive URBs which 
requires successful transfer of 2 URBs.


This means states of a previous URB influences the processing of the 
next URB including a dropped URB (causes a discontinuity in the data 
stream). In other words synchronisation of the in-band 32-bit header 
word needs to be tracked between URBs. Some ASIX chipsets allow the 
in-band 32-bit header word to be no longer fixed to the start of the URB 
buffer so it moves to any position within the URB buffer.


I understand your point of suggesting it is a "regression" for your 
device but the driver was broken for DUB-E100 C1 (small black USB 
device). So you cannot revert the commits as this would break DUB-E100 
C1 (small black USB device).



6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
This commit is necessary because it avoids a crash when netdev buffer 
failed to be allocated for the 1st URB and the 2nd URB containing a 
spanned Ethernet frame is processed. The crash happens because the 2nd 
URB assumed that the netdev buffer had been allocated.



3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames
This commit is necessary to avoid sending bad Ethernet frames into the 
IP stack during loss of synchronisation and to dropping good Ethernet 
frames. This commit improves the synchronisation recovery mechanism of 
the in-band 32-bit header word.


The ASIX USB to Ethernet devices these commits were tested on where 
DUB-E100 C1 (small black USB device). Embedded ARM based systems were 
used where memory resources can run out.


It could be that for your USB to Ethernet device that the wrong 
configuration settings have been used. In other words the ASIX driver is 
flexible to support various variants of the ASIX chipsets. For example, 
does your device support Ethernet frames spanning multiple URBs 
(multiple USB transfers) ?


So I doubt my commits are "broken" because we don't see your failures 
(not tested your device). It is more likely that your ASIX device needs 
to be properly identified and configured to be compatible with the ASIX 
driver. At least, I suggest that is the best place to start your 
investigation.


Of course, your ASIX chipset might have a different behaviour for how 
the in-band 32-bit header word operates so perhaps special treatment is 
needed for your chipset ?


Please send to the mailing list the output of lsusb for your device so 
that people can know the USB product ID and vendor ID for your device. 
This is allows people to assist with the investigation. Do you have any 
links to websites that sell your device ?


Are you using UDP or TCP connections ?

Regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Dean Jenkins

On 03/05/16 05:55, John Stultz wrote:

In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4


And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).

This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john

Hi John,

Some ASIX chipsets span the Ethernet frame over consecutive URBs which 
requires successful transfer of 2 URBs.


This means states of a previous URB influences the processing of the 
next URB including a dropped URB (causes a discontinuity in the data 
stream). In other words synchronisation of the in-band 32-bit header 
word needs to be tracked between URBs. Some ASIX chipsets allow the 
in-band 32-bit header word to be no longer fixed to the start of the URB 
buffer so it moves to any position within the URB buffer.


I understand your point of suggesting it is a "regression" for your 
device but the driver was broken for DUB-E100 C1 (small black USB 
device). So you cannot revert the commits as this would break DUB-E100 
C1 (small black USB device).



6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
This commit is necessary because it avoids a crash when netdev buffer 
failed to be allocated for the 1st URB and the 2nd URB containing a 
spanned Ethernet frame is processed. The crash happens because the 2nd 
URB assumed that the netdev buffer had been allocated.



3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames
This commit is necessary to avoid sending bad Ethernet frames into the 
IP stack during loss of synchronisation and to dropping good Ethernet 
frames. This commit improves the synchronisation recovery mechanism of 
the in-band 32-bit header word.


The ASIX USB to Ethernet devices these commits were tested on where 
DUB-E100 C1 (small black USB device). Embedded ARM based systems were 
used where memory resources can run out.


It could be that for your USB to Ethernet device that the wrong 
configuration settings have been used. In other words the ASIX driver is 
flexible to support various variants of the ASIX chipsets. For example, 
does your device support Ethernet frames spanning multiple URBs 
(multiple USB transfers) ?


So I doubt my commits are "broken" because we don't see your failures 
(not tested your device). It is more likely that your ASIX device needs 
to be properly identified and configured to be compatible with the ASIX 
driver. At least, I suggest that is the best place to start your 
investigation.


Of course, your ASIX chipset might have a different behaviour for how 
the in-band 32-bit header word operates so perhaps special treatment is 
needed for your chipset ?


Please send to the mailing list the output of lsusb for your device so 
that people can know the USB product ID and vendor ID for your device. 
This is allows people to assist with the investigation. Do you have any 
links to websites that sell your device ?


Are you using UDP or TCP connections ?

Regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.



[REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-02 Thread John Stultz
In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4


And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).

This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john


[REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-02 Thread John Stultz
In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4


And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).

This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john