Hello Jakub, On Thu, Jul 10, 2025 at 06:45:35PM -0700, Jakub Kicinski wrote: > > +MAX_WRITES: int = 40 > > FWIW the test takes 25sec on our debug-heavy VMs right now. > I think we can crank the writes quite a bit.. ?
Sure. I will increase it to 40. On my VMs I get more than 30 hits every single time: 2025-07-11 08:30:08,904 - DEBUG - BPFtrace output: {'hits': 30} 2025-07-11 08:30:08,904 - DEBUG - MAPS coming from bpftrace = {'hits': 30} > > +def ethtool_read_rx_tx_queue(interface_name: str) -> tuple[int, int]: > > + """ > > + Read the number of RX and TX queues using ethtool. This will be used > > + to restore it after the test > > + """ > > + rx_queue = 0 > > + tx_queue = 0 > > + > > + try: > > + ethtool_result = ethtool(f"-g {interface_name}").stdout > > json=True please and you'll get a dict, on CLI you can try: > > ethtool --json -g eth0 Sure. I was parsing manually because some options do not have --json format. ethtool --json -l eth0 ethtool: bad command line argument(s) I haven't checked upstream, but, if this feature is upstream, is it worth implementing it? > > + ethtool(f"-G {interface_name} rx {rx_val} tx {tx_val}") > > This is setting _ring size_ not queue count. > I suppose we want both, this and queue count to 1 (with ethtool -l / -L) > The ring size of 1 is unlikely to work on real devices. > I'd try setting it to 128 and 256 and if neither sticks just carry on > with whatever was there. Thanks. I creating a function to do it. Something as: def configure_network(ifname: str) -> None: """Configure ring size and queue numbers""" # Set defined queues to 1 to force congestion prev_queues = ethtool_get_queues_cnt(ifname) logging.debug("RX/TX/combined queues: %s", prev_queues) # Only set the queues to 1 if they exists in the device. I.e, they are > 0 ethtool_set_queues_cnt(ifname, tuple(1 if x > 0 else x for x in prev_queues)) defer(ethtool_set_queues_cnt, ifname, prev_queues) # Try to set the ring size to some low value. # Do not fail if the hardware do not accepted desired values prev_ring_size = ethtool_get_ringsize(ifname) for size in [(1, 1), (128, 128), (256, 256)]: if ethtool_set_ringsize(ifname, size): # hardware accepted the desired ringsize logging.debug("Set RX/TX ringsize to: %s from %s", size, prev_ring_size) break defer(ethtool_set_ringsize, ifname, prev_ring_size) > > + "remote_ip": ( > > + cfg.remote_addr_v["4"] if cfg.addr_ipver == "4" else > > cfg.remote_addr_v["6"] > > + ), > > this is already done for you > cfg.addr is either v4 or v6 depending on what was provided in the env Ack! > > +# toggle the interface up and down, to cause some congestion > > Let's not do this, you're missing disruptive annotation and for many > drivers NAPI is stopped before queues > https://github.com/linux-netdev/nipa/wiki/Guidance-for-test-authors#ksft_disruptive Sure. This is not needed to reproduce the issue. I just put it in there in order to create more entropy. Anyway, removing it. > > +def main() -> None: > > + """Main function to run the test""" > > + netcons_load_module() > > + test_check_dependencies() > > + with NetDrvEpEnv(__file__, nsim_test=True) as cfg: > > I think nsim_test=True will make the test run _only_ on netdevsim. > But there's nothing netdevsim specific here right? > You can remove the argument and let's have this run against real > drivers, too? Sure. that is our goal, by the end of the day. Being able to run it on real hardware. Thanks of the review. I will send an updated version soon, and we can continue the discusion over there. --breno