On 31 March 2016 at 18:44, Ben Greear wrote:
> On 03/30/2016 11:51 PM, Michal Kazior wrote:
>> On 29 March 2016 at 17:48, Ben Greear wrote:
>>> On 03/29/2016 01:05 AM, Michal Kazior wrote:
>>>> On 28 March 2016 at 21:01, Ben Greear wrote:
>>>>> I'm seeing the ring-full messages below when running 35 stations on
>>>>> modified 10.4.3 firmware. I also have serial console logging enabled,
>>>>> so things are running a bit slow...this seems to exacerbate the issue.
>>>>>
>>>>> [ 91.108923] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1 credit-flow-enabled: 1
>>>>> [ 91.108932] ath10k_pci 0000:05:00.0: could not request stats (type 128 ret -105)
>>>>> [ 91.108942] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3 n_items: 1 pipe-id: 3
>>>>> [ 91.108944] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1 credit-flow-enabled: 1
>>>>> [ 91.108952] ath10k_pci 0000:05:00.0: could not request stats (type 1 ret -105)
>>>>> [ 91.108953] ath10k_pci 0000:05:00.0: failed to get fw stats for ethtool: -105
>>>>> [ 91.109039] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3 n_items: 1 pipe-id: 3
>>>>> [ 91.109041] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1 credit-flow-enabled: 1
>>>>> [ 91.109050] ath10k_pci 0000:05:00.0: could not request stats (type 128 ret -105)
>>>>> [ 91.109060] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3 n_items: 1 pipe-id: 3
>>>>> [ 91.109062] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1 credit-flow-enabled: 1
>>>>> [ 91.109070] ath10k_pci 0000:05:00.0: could not request stats (type 1 ret -105)
>>>>> [ 91.109072] ath10k_pci 0000:05:00.0: failed to get fw stats for ethtool: -105
>>>>> [ 91.109157] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3 n_items: 1 pipe-id: 3
>>>>> [ 91.109160] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1 credit-flow-enabled: 1
>>>>>
>>>>> I am struggling to understand how the pipe can be full since we have
>>>>> tx-credits logic enabled for the WMI pipe.
>>>>>
>>>>> Any suggestions on what sort of bugs could cause this?
>>>>>
>>>>> And, should the ath10k_wmi_cmd_send retry when we get a -105 return
>>>>> code in hopes it will free up shortly instead of just failing and
>>>>> leaving the system in invalid state?
>>>>
>>>> It probably shouldn't. As you've pointed out HTC tx credits should
>>>> prevent this in the first place. If you see -105 it means something is
>>>> really broken and needs to be fixed properly.
>>>>
>>>> A thing that comes to mind is that CE -for whatever reason- would need
>>>> to stop completing CE ring items. Are you running with MSI? 1 or
>>>> multiple interrupts? Did you try forcing legacy interrupt mode to rule
>>>> out MSI problems?
>>>>
>>>> You could add a debug messages to see if the HTC-WMI CE ring gets tx
>>>> completions properly.
>>>
>>> I don't think I'm using MSI. Could it be that whatever logic that should
>>> be processing the tx-completions is just running slower than whatever is
>>> handling the WMI messages (and credits)?
>>
>> Your WMI command queue is limited to HTC Tx credits (2, right?). This
>> means you can enqueue, in practice, 2 CE items to WMI's CE Tx pipe.
>> Once you've done that you have to wait until next interrupt carrying
>> HTC Rx message with Tx Credit Update. If you get this it implies FW
>> received your WMI commands which implies WMI's CE Tx pipe was updated
>> (and at least the 2 CE's associated with your WMI commands have been
>> consumed/completed). Even if you assume CE processing ordering is
>> reversed (i.e. HTC Rx gets processed before HTC Tx completions are)
>> you still should be able to have enqueued no more than 4 CE items at a
>> time as far as WMI is concerned.
>>
>> Now, if you assume MSI-range (multiple MSI interrupts; a vector) is
>> enabled, you can service each CE pipe in a separate interrupt and
>> tasklet. This could, in theory, result in some weird race as HTC Tx
>> credits and CE Tx pipe completions are not guaranteed to be
>> serialized.
>>
>> Or maybe you're using some forced WMI commands in your fork and
>> disregard Tx credits in some cases? This could explain the problem
>> even when running with a single interrupt.
>
> So, I am using MSI-X, I guess?
>
> # dmesg|grep -i msi
> [65284.853372] ath10k_pci 0000:05:00.0: pci irq msi-x interrupts 13 irq_mode 0 reset_mode 0

Yep. This at least makes it possible for this weird problem to come
into existence. However, I still find it a little hard to believe that
tasklets would be scheduled this badly. Maybe the device doesn't assert
interrupts properly, as Adrian suggested? Or maybe they are not mapped
properly?

I think you're actually the first one to exercise MSI-range support on
qca99x0.

Michał
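
To make the credit argument from earlier in the thread concrete, here is a rough Python model, not ath10k code. It assumes the common power-of-two ring convention that keeps one slot empty, which is at least consistent with the log reporting "full" at nentries_mask 0x1f, write_idx 2, sw-idx 3. The names (free_slots, in_flight, Ring, sends_until_full) are made up for illustration, and the credit count of 2 is the guess from the quoted text.

```python
def free_slots(nentries_mask, write_idx, sw_idx):
    """Slots still postable in a ring that always keeps one slot empty (assumed convention)."""
    return (sw_idx - 1 - write_idx) & nentries_mask


def in_flight(nentries_mask, write_idx, sw_idx):
    """Items posted by the host whose completions have not been reaped yet."""
    return (write_idx - sw_idx) & nentries_mask


class Ring:
    """Hypothetical stand-in for the WMI CE Tx ring; only the two indices are modelled."""

    def __init__(self, nentries_mask):
        self.mask = nentries_mask
        self.write_idx = 0
        self.sw_idx = 0

    def post(self):
        # Refuse to post when the ring is full, like the hif-tx-sg "full" case in the log.
        if free_slots(self.mask, self.write_idx, self.sw_idx) == 0:
            return False
        self.write_idx = (self.write_idx + 1) & self.mask
        return True

    def reap(self):
        # Process one Tx completion, if any is outstanding.
        if in_flight(self.mask, self.write_idx, self.sw_idx) == 0:
            return False
        self.sw_idx = (self.sw_idx + 1) & self.mask
        return True


def sends_until_full(credits, reap_completions, max_sends=1000):
    """Count sends before the ring fills; None if it never fills within max_sends."""
    ring = Ring(0x1F)
    available = credits
    sent = 0
    while sent < max_sends:
        if available == 0:
            # Model a Tx Credit Update arriving over HTC Rx: credits come back...
            available = credits
            # ...but Tx completions are only reaped if that path is working.
            if reap_completions:
                while ring.reap():
                    pass
        if not ring.post():
            return sent
        available -= 1
        sent += 1
    return None
```

With the values from the log, free_slots(0x1f, 2, 3) is 0 and in_flight(0x1f, 2, 3) is 31. Under these assumptions that is far more than the 2 to 4 items the credit scheme should allow. In the model the ring only fills (after 31 sends) when credits keep coming back while completions are never reaped, which is the kind of desynchronisation between HTC Rx and CE Tx completion handling discussed above.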
