On 12/6/2024 8:27 PM, James Prestwood wrote:
> Hi Baochen,
>
> On 12/5/24 6:47 PM, Baochen Qiang wrote:
>>
>> On 9/5/2024 9:46 AM, Baochen Qiang wrote:
>>>
>>> On 9/5/2024 2:03 AM, Jeff Johnson wrote:
>>>> On 8/16/2024 5:04 AM, James Prestwood wrote:
>>>>> Hi Baochen,
>>>>>
>>>>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've seen this error mentioned in random forum posts, but it's always
>>>>>>> associated with a kernel crash/warning or some other obvious negative
>>>>>>> behavior. I've noticed it occasionally, and at one location very
>>>>>>> frequently, during FT roaming, specifically just after CMD_ASSOCIATE is
>>>>>>> issued. On our company-run networks I'm not seeing any negative behavior
>>>>>>> apart from a 3 second delay in sending the reassociation frame, since the
>>>>>>> kernel waits for this timeout. But we have some networks our clients run
>>>>>>> on that we do not own (different vendor), and we are seeing association
>>>>>>> timeouts after this error occurs. In some cases the AP sends a
>>>>>>> deauthentication with reason code 8 instead of replying with a
>>>>>>> reassociation response and an error status, which is quite odd.
>>>>>>>
>>>>>>> We are chasing this down with the vendor of these APs as well, but the
>>>>>>> behavior always happens after we see this key removal failure/timeout on
>>>>>>> the client side, so it would appear there is potentially a problem on
>>>>>>> both the client and the AP. My guess is that _something_ about the
>>>>>>> reassociation frame changes when this error is encountered, but I cannot
>>>>>>> see how that would be the case. We are working to get PCAPs now, but it's
>>>>>>> through a third party, so the timing is out of my control.
>>>>>>>
>>>>>>> From the kernel code this error would appear innocuous: the old key
>>>>>>> fails to be removed, but it is immediately replaced by the new key, and
>>>>>>> we don't see that addition failing. Am I understanding the logic
>>>>>>> correctly? I.e. this logic:
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>>>>>
>>>>>>> Below are a few kernel logs of the issue happening, some with the deauth
>>>>>>> being sent by the AP, some with just timeouts:
>>>>>>>
>>>>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>>>>
>>>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
>>>>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>>>>
>>>>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>>>>
>>>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
>>>>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>>>>
>>>>>> Hi James, this is QCA6174, right? Could you also share the firmware version?
>>>>> Yep, using:
>>>>>
>>>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>>>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp
>>>>> crc32 bf907c7c
>>>>>
>>>>> I did try the latest firmware, 309, in one instance and still saw the
>>>>> same behavior, but 288 is what all our devices are running.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> James
>>>> Baochen, are you looking into this further? I would prefer to fix the root
>>>> cause rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal
>>>> failure".
>>> I asked the CST team to try to reproduce this issue so that we can get a
>>> firmware dump for further debugging. I was told that the CST team is
>>> currently busy with other critical schedules and plans to debug this
>>> ath10k issue once those are finished.
>>>
>> Jeff, I have been notified that the CST team cannot reproduce this issue.
>
> Thanks for reaching out to them, at least. Maybe the firmware team can
> provide some info about how long it _should_ take to remove a key, and we
> can make the timeout reflect that?
Are you implying that the failure is due to an insufficiently long wait in the
host driver? Or do you want to know the maximum time the firmware needs to
remove a key, so that if it is less than 3 s we can reduce the current timeout
to work around (WAR) the issue you hit?
>
> Thanks,
>
> James
>
>