Hi,

> -----Original Message-----
> From: Stephen Hemminger <step...@networkplumber.org>
> Sent: Tuesday, October 29, 2024 6:07 PM
> To: Minggang(Gavin) Li <gav...@nvidia.com>
> Cc: Slava Ovsiienko <viachesl...@nvidia.com>; Matan Azrad
> <ma...@nvidia.com>; Ori Kam <or...@nvidia.com>; NBU-Contact-Thomas
> Monjalon (EXTERNAL) <tho...@monjalon.net>; Dariusz Sosnowski
> <dsosnow...@nvidia.com>; Bing Zhao <bi...@nvidia.com>; Suanming Mou
> <suanmi...@nvidia.com>; dev@dpdk.org; Raslan Darawsheh
> <rasl...@nvidia.com>; rongwei liu <rongw...@nvidia.com>
> Subject: Re: [PATCH V2 3/7] net/mlx5: add new devargs to control probe
> optimization
> 
> On Tue, 29 Oct 2024 16:27:25 +0800
> "Minggang(Gavin) Li" <gav...@nvidia.com> wrote:
> 
> > On 10/28/2024 11:47 PM, Stephen Hemminger wrote:
> > > On Mon, 28 Oct 2024 11:18:18 +0200
> > > "Minggang Li(Gavin)" <gav...@nvidia.com> wrote:
> > >
> > >> +- ``probe_opt_en`` parameter [int]
> > >> +
> > >> +  A non-zero value optimizes the probe process, especially for large scale.
> > >> +  PMD will hold the IB device information internally and reuse it.
> > >> +
> > >> +  By default, the PMD will set this value to 0.
> > >> +
> > > Is there ever a case where this should not be used?
> > >
> > > It would be better to just detect and use it if available.
> > > This driver does not need more options...
> > The new mechanism, which is required by few users, so we would not
> > break production and with the option we encourage to use new way only
> > those who actually needs. Once we see the new way is reliable - we
> > will change the default value.
> 
> I understand that philosophy but it leads to a maze of technical debt.

This specific case is not about philosophy in general.

We have users with a huge number of SFs/VFs configured who are experiencing
extremely long probe times (literally tens of minutes). This has been going on
for a long time: we tried different approaches, then accepted that we had to
update the kernel, and eventually got it done, which resulted in this series.

The new approach is event-driven and based on handling new kernel-generated
events. So it relies on the system-wide environment and might be problematic
on some hosts (we do not expect many such cases, though).

At the same time, the existing probe approach provides acceptable performance
and satisfies the vast majority of users. So our main objective is not to
break anything in production (most users); the second objective is to resolve
the issues of some users with specific configurations (few users). That is why
we would prefer to have the devarg (with all its pros and cons) and set its
default value to false. Later, once the new kernel API spreads and we have good
production statistics, we can consider changing the default value to true or
retiring the devarg altogether. Does this approach look reasonable?

> Has a full suite of tests been done with both settings of the option?
> Has both values been tested on all combinations of platforms and OS
> releases?

We cannot keep only the new approach: we have to maintain compatibility with
legacy kernels. So there will always be two branches of tests until the legacy
kernels are retired. And having the devarg might even simplify testing: a
single host can be used for both runs, with different devarg values.
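
As a rough illustration only (the PCI address and the choice of dpdk-testpmd
are assumptions, not taken from the series), the two runs on one host might
differ only in the devarg value. The sketch prints the two command lines
rather than executing them:

```shell
#!/bin/sh
# Hypothetical PCI address of an mlx5 port; replace with a real one.
DEV="0000:08:00.0"

# Build a testpmd command line with the given probe_opt_en value.
build_cmd() {
    printf 'dpdk-testpmd -a %s,probe_opt_en=%s -- -i\n' "$DEV" "$1"
}

# Print the command for the legacy probe (0) and the optimized probe (1).
for opt in 0 1; do
    build_cmd "$opt"
done
```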

> My point is every option adds to the necessary test matrix geometrically!

Once we added the new probing mechanism, the test matrix was ALREADY extended,
regardless of the devarg implementation. The devarg just makes life in the
field easier for our users.

With best regards,
Slava
