On Mon, Apr 04, 2022 at 09:58:09PM -0400, Ashton Fagg wrote:
> >Synopsis:    iwx(4) device timeouts on 802.11ac networks
> >Category:    Bug/driver issue
> >Environment:
>       System      : OpenBSD 7.1
>       Details     : OpenBSD 7.1 (GENERIC.MP) #458: Sun Apr  3 23:10:53 MDT 2022
>                        dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>       Architecture: OpenBSD.amd64
>       Machine     : amd64
> >Description:
>       I have noticed this only in this current snapshot, and the one I
>       was running previously (which was from Friday or Saturday). I
>       have been running other snapshots since 802.11ac support got
>       added with no issues. (Thank you to stsp, it's been working
>       great hitherto).
> 
>       My thinkpad t14s has intel wifi hardware that attaches to
>       iwx(4). I have experienced "stalls", where the wifi hardware
>       appears to just stop passing packets. Eventually connectivity
>       returns (sometimes 30 secs or so later).
> 
>       I flipped the debug bit for iwx(4) and managed to determine that
>       this is actually due to device timeout and reset. Prior to full
>       dmesg below, I have included the relevant output.
> 
>       There have been no changes to my system (aside from new
>       snapshots). Additionally, no changes to my network infra.
> 
> >How-To-Repeat:
>       As best I can tell, connecting to an 802.11ac access point and
>       keeping the wifi busy will surely do it. I've experienced this
>       frequently with having an ssh session open (with a build
>       running, so tonnes of output scrolling past), and having youtube
>       playing music going at the same time.
> >Fix:
>       Waiting a couple of seconds while the card resets itself is
>       enough to get things moving again.
> 
> 
> Please let me know if there's anything else I can provide to assist or
> if you'd like my help testing a potential fix. Thank you.

There is nothing actionable in your report, though it is good to know
that there seems to be an issue which the driver could handle better.

It is unclear to me why the device would suddenly stop generating
interrupts, which is what leads to a "device timeout". Generally, this
implies a problem that triggers at firmware, hardware, or RF level.
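
For readers unfamiliar with the term, here is a toy model of how a missing
interrupt can turn into a "device timeout". It assumes a simple countdown
watchdog that is armed when a frame is queued and disarmed by the Tx-done
interrupt. This is an illustration only and is not taken from the iwx(4)
source; the class name, method names, and tick count are hypothetical.

```python
class TxWatchdog:
    """Toy countdown watchdog; not the actual iwx(4) implementation."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = 0  # 0 means no transmission is pending

    def frame_queued(self):
        # Arm (or re-arm) the countdown whenever a frame is handed to the device.
        self.remaining = self.timeout_ticks

    def tx_done_interrupt(self):
        # The device reported completion, so disarm the countdown.
        self.remaining = 0

    def tick(self):
        """Advance one tick; return True if the device timed out."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return self.remaining == 0


if __name__ == "__main__":
    wd = TxWatchdog(timeout_ticks=3)
    wd.frame_queued()
    # No interrupt ever arrives, so the countdown expires.
    for _ in range(3):
        if wd.tick():
            print("device timeout")
```

In this model, anything that keeps the completion interrupt from arriving
(firmware, hardware, or RF trouble) looks the same from the driver's side:
the countdown simply runs out.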

It would be good to know if your AP did anything extraordinary at the time.
Hopefully that would provide more context and lead to clues.

Did your AP switch channels, perhaps?

Or did the AP switch its channel width?
You could record beacons with tcpdump and look for differences in vhtop
information where the channel width is encoded:
  tcpdump -n -i iwx0 -y IEEE802_11_RADIO -s 4096 -v -D in type mgt subtype beacon
Channel width info should show up like this:
  vhtop=<80MHz chan,center chan 122,
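
Comparing beacons by eye gets tedious over a long capture. A minimal sketch
of a filter that reports only the changes is given below. It assumes the
vhtop field always prints in the shape of the sample fragment above
("<width>MHz chan,center chan <number>"); other widths may be printed
differently, in which case the pattern needs adjusting. The function name
and the regular expression are illustrative and not part of tcpdump.

```python
import re
import sys

# Assumed pattern, derived only from the sample "vhtop=<80MHz chan,center chan 122,".
VHTOP_RE = re.compile(r"vhtop=<(\d+)MHz chan,center chan (\d+)")


def vhtop_changes(lines):
    """Yield (line_no, old, new) each time the (width, center) pair changes."""
    previous = None
    for line_no, line in enumerate(lines, start=1):
        match = VHTOP_RE.search(line)
        if match is None:
            continue  # no VHT operation info matched on this line
        current = (int(match.group(1)), int(match.group(2)))
        if previous is not None and current != previous:
            yield line_no, previous, current
        previous = current


if __name__ == "__main__":
    # Reads saved tcpdump text output from standard input.
    for line_no, old, new in vhtop_changes(sys.stdin):
        print(f"line {line_no}: {old[0]}MHz/center {old[1]} "
              f"-> {new[0]}MHz/center {new[1]}")
```

One way to use it is to save the tcpdump output to a file while the stall
is being reproduced and then feed that file to the script on standard input.
No output means no change in width or center channel was seen among the
lines that matched.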

Those are just shots in the dark, though; the driver is already expected
to cope when such changes occur.

When the AP simply disappears from the air as a result of switching channels,
a small stall followed by a reconnect as you describe is expected. We do not
yet honor channel switch announcements (they are not authenticated and
therefore could be abused; something to revisit once support for protected
management frames gets implemented). The exact error condition shown in debug
output will depend on what the driver was doing at the moment. If there are
frames on Tx queues, I believe a device timeout is possible, though not pretty.

Or maybe the cause is something else entirely, and nothing related is
happening on the AP side at all.

I have been using iwx on my desktop, always on, on 11ac, without issues,
for weeks. At least for me, it has been quite stable. Certainly no less stable
than this driver ever has been. It runs for weeks without firmware errors
happening, which was not the case a year or two ago.
