Rajkumar Manoharan <rmano...@codeaurora.org> writes:

> On 2018-11-07 06:53, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan <rmano...@codeaurora.org> writes:
>> 
>>> Meanwhile we did some more experiments with both modes. The experiment
>>> was done in open environment and fixed rate and UDP traffic ran for 60
>>> seconds.
>>> 
>>> Seems like push mode not honoring the configured weight. Always the
>>> airtime was almost same whereas in pull-mode airtime is changing based
>>> on configured weight. Hence I would like to know your results.
>> 
>> Right, so I verified that the current version of the patch set still
>> works with ath9k. However, the ath10k card I have doesn't seem to
>> support peer stats, so I can't test ath10k.
>> 
>> $ lspci | grep Qualcomm
>> 03:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac
>> Wireless Network Adapter
>> 
>> $ cat /sys/kernel/debug/ieee80211/phy1/ath10k/chip_id
>> 0x043202ff
>> 
>> $ cat /sys/kernel/debug/ieee80211/phy1/ath10k/wmi_services  | grep PEER
>> WMI_SERVICE_PEER_CACHING                 -
>> WMI_SERVICE_PEER_STATS                   -
>> 
>
> Oops... Yeah 988x firmware (10.2.4) does not have peer stats support.
>
>> 
>> Is there a way to force-enable airtime support, is this a hardware 
>> issue?
>> 
> Unfortunately not. There is one more pending change that handles
> airtime report from HTT tx-compl. Again it depends firmware support.
> These experiments are taken with this f/w interface. Will post the
> change.

Right, thought so. Too bad. Guess you are doing all the ath10k testing,
then ;)

>>>             sta1            sta2            sta3            sta4
>>> pull-mode   8s(205us)       18s(3.2ms)      8s(205us)       14s(410us)
>>>             12s(256us)      12s(256us)      13s(256us)      12s(256us)
>>>             14s(4ms)        13s(4ms)        14s(4ms)        13s(4ms)
>>> 
>>> push-mode   15s(205us)      12s(3.2ms)      16s(205us)      12s(410us)
>>>             15s(256us)      12s(256us)      16s(256us)      12s(256us)
>>>             14s(4ms)        13s(4ms)        16s(4ms)        12s(4ms)
>> 
>> Right, so the pull-mode results are encouraging! *Something* is
>> happening, at least, even though the aggregate airtime doesn't quite
>> match the ratios of the configured weights.
>> 
>> Are you running the UDP generator on the AP itself, or on a separate
>> device, BTW? If it's on the AP, the userspace socket can get throttled,
>> which will interfere with results, so it's better to have it on a
>> separate device (connected via ethernet).
>> 
> Traffic b/w wired client (via ethernet) and wireless clients.

Cool.

>> As for push-mode, could this be because of bad buffer management? I.e.,
>> because there's a lag between the time airtime is registered, and the
>> time that airtime usage is reported, the driver just pushes a whole
>> bunch of packets to the firmware when it gets the chance, which 
>> prevents
>> the scheduler from throttling properly. This could also explain the
>> better, but not quite perfect, results in pull mode, assuming that pull
>> mode results in better firmware buffer management which reduces, but
>> doesn't quite remove, the lag.
>> 
> Hmm... I agree that lag in reporting airtime may cause more data push
> to hw. Right now both tx and tx-compl are serialized by
> active_txq_lock. So once lock acquired by tx path, it will download
> all frames. i.e it is even true for ath9k driver. Hence I am wondering
> how it is working only with ath9k.

If enough packets are dequeued at once that the TXQ empties and is not
put back on the scheduling loop, the next_txq() loop is just going to
loop through the remaining TXQs and immediately increase their quantum.
In ath9k, there's a maximum of two outstanding aggregates below the TXQ
level, but I think you mentioned that ath10k can queue thousands in
firmware?

Your patch removes the 'max 16 packets at a time' check before the call
to ath10k_mac_tx_push_txq(); try adding that back and see if it helps?

>> If this is indeed the reason, the queue limit patches should hopefully
>> be a solution. So guess we need to get those working as well :)
>> 
> I would prefer to baseline the basic infra into upstream first and do
> enhancement on top of that.

Sure, I'm fine with merging this first and building on top.

> I request you to revisit maintaining per driver default. Otherwise
> there would be performance impact with 256us. :(

Hmm, how about we make it a driver-specified multiplier instead (which
could be 4, 8 or 16 for ath10k)? That way it would still work even if
userspace changes the weights. It would affect the minimum possible
granularity, but that is probably acceptable; and we would be free to
change the mechanism later, without affecting the userspace API.

-Toke

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

Reply via email to