On Sun, Apr 27, 2025 at 09:00:31AM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Saturday, 26 April 2025 17.24
> > 
> > On Fri, 25 Apr 2025 13:52:55 +0200
> > Morten Brørup <m...@smartsharesystems.com> wrote:
> > 
> > > Bruce,
> > >
> > > rte_eth_stats_get() on Intel NICs seems slow to me.
> > >
> > > E.g. getting the stats on a single port takes ~132 us (~451,000 CPU
> > > cycles) using the igb driver, and ~50 us using the i40e driver.
> > >
> > > Referring to the igb driver source code [1], it's 44 calls to
> > > E1000_READ_REG(), so the math says that each one takes 3 us (~10,000
> > > CPU cycles).
> > >
> > > Is this expected behavior?
> > >
> > > It adds up: e.g. it takes a full millisecond to fetch the stats from
> > > eight ports using the igb driver.
> > >
> > > [1]: https://elixir.bootlin.com/dpdk/v24.11.1/source/drivers/net/e1000/igb_ethdev.c#L1724
> > >
> > >
> > > Med venlig hilsen / Kind regards,
> > > -Morten Brørup
> > >
> > 
> > Well, reading each stat requires a PCI access, and PCI accesses are
> > non-cached.
> 
> You're right, thank you for reminding me. I was caught by surprise that 
> getting 7 counters took so long.
> Perhaps reading all 44 NIC registers over the PCI bus is required to
> calculate those summary counters. Or nobody cared to optimize this function
> to read only the necessary registers.
> 
> We periodically poll the ethdev stats in a management thread, and I noticed
> the duration because of the jitter it caused in that thread.
> It's not a real problem. If it were, we could easily move it to a separate
> thread or poll the counters iteratively, port by port, instead of all ports in
> one go.
> A long-winded way of saying: Probably not worth optimizing. ;-)
> 

I actually think it is something that we should consider optimizing, but I
also think it needs to be done at the API level. Even if the user is only
interested in, e.g., the Rx bytes counter, the only way to get it is to
retrieve the full stats set and pick that counter out of it. Therefore,
instead of reading (possibly) just one register, you end up reading 44, as
in the case you describe. Maybe we need to add a stats mask to the get call,
to allow a user to indicate that they only want a subset of the stats, in
order to improve performance.

/Bruce
