Hi Matan,

On Wed, Jan 31, 2018 at 05:07:56PM +0000, Matan Azrad wrote:
> Hi Adrien
> 
> From: Adrien Mazarguil , Sent: Wednesday, January 31, 2018 4:32 PM
> > Hi Matan,
> > 
> > On Wed, Jan 31, 2018 at 01:44:41PM +0000, Matan Azrad wrote:
> > > Hi Adrien
<snip>
> > > I don't know what any application does but for me it is a mistake to
> > > stop all event processes in dev_stop(), Maybe for other application
> > maintainers too.
> > 
> > Just like you, I don't know either what all the applications ever written 
> > for
> > DPDK expect out of dev_stop(). What's for sure is that currently, LSC/RMV
> > don't occcur afterward, the same way these events do not occur before
> > dev_start().
> 
> Why not? RMV event can occur any time after probe.

LSC as well (keep in mind this patch modifies the behavior for both
events). RMV events may also occur before application has a chance to
register a handler for it, in which case this approach fails to solve the
problem it's supposed to solve. Mitigate all you want, the application still
can't rely on that event only.

> > Any application possibly relying on this fact will break. In such a
> > situation, a conservative approach is better.
> 
> If an application should fail to get event in stopped state it may fail in 
> the previous code too:
> The interrupt run from host thread so the next race may occur:
> dev_start() : master thread.
> Context switch.
> RMV interrupt started to run callbacks: host thread.
> Context switch.
> dev_stop(): master thread.
> Start reconfiguration: master thread. 
> Context switch.
> Callback running.
> 
> So, the only thing which can disable callback running after dev_stop() is 
> callback unregistration before it.

After dev_stop() returns, new events cannot be triggered by the PMD which is
what matters. Obviously a callback that already started to run before that
will eventually have to complete. What's your point?

There's a race only if an application performs multiple simultaneous control
operations on the underlying device, but this has always been unsafe (not
only during RMV) because there are no locks, it's documented as such.

> > > > Setting up RMV/LSC callbacks is not the only configuration an
> > > > application usually performs before calling dev_start(). Think about
> > > > setting up flow rules, MAC addresses, VLANs, and so on, this on
> > > > multiple ports before starting them up all at once. Previously it
> > > > could be done in an unspecified order, now they have to take special 
> > > > care
> > for RMV/LSC.
> > >
> > > Or maybe there callbacks code are already safe for it.
> > > Or they manages the unregister\register calls in the right places.
> > 
> > That's my point, these "maybes" don't argue in favor of changing things.
> 
> What I'm saying is that callbacks should be safe or not registered in the 
> right time.

I understand that, though it's not a valid counter argument :)

> > > > Many devops are only safe when called while a device is stopped.
> > > > It's even documented in rte_ethdev.h.
> > > >
> > >
> > > And?
> > 
> > ...And applications therefore often do all their configuration in an 
> > unspecified
> > order while a port is stopped as a measure of safety. No extra care is taken
> > for RMV/LSC. This uncertainty can be addressed by not modifying the current
> > behavior.
> 
> Or they expect to get interrupt and the corner case will come later if we 
> will not change it.

Look, we're throwing opposite use cases at each other in order to make a
point, and I don't see an end to this since we're both stubborn. Let's thus
assume applications use a bit of both.

Now we're left with a problem, before this patch neither use cases were
broken. Now it's applied, mine is broken so let's agree something needs to
be done. Either all affected applications need to be updated, or we can
simply revert this and properly fix fail-safe instead.

<snip>
> > > So, at least for RMV event, we need the notification also in stopped 
> > > state.
> > 
> > You sent the rte_eth_dev_is_removed() series. You're aware that PMDs
> > implementing this call benefit from an automatic is_removed() check on all
> > remaining devops whenever some error occur.
> > In short, an application will get notified simply by getting dev_start() 
> > (or any
> > other callback) return -EIO and not being able to use the device.
>  
> Yes, but between dev_stop to dev_start may not be any ethdev API calling.

So what, if an application is not using the device, why does it need to know
it's been removed? If it's that important, why can't it run its own periodic
rte_eth_dev_is_removed() probe?

> > PMDs that do not implement is_removed() (or in addition to it) could also
> > artificially trigger a RMV event after dev_start() is called. As long as 
> > the PMD
> > remains quiet while the device is stopped, it's fine.
> 
> How can the PMD do it after dev_start()? Initiate alarm in dev start function 
> to do it later And entering into race again?

What race? All the PMD needs to to is trigger an event by registering one
with immediate effect, it won't make any difference to the application. If
it performs racy control operations from the handler, it's never been a
PMD's problem.

Anyway I just provided this idea as a counter argument, it's not really
needed; relying on rte_eth_dev_is_removed() is the safest approach in my
opinion.

> I think it is not worth the effort and this patch is doing the right 
> thing(also some other PMDs) .

Well, it's probably too late to revert this patch for 18.02 since one would
also have to come up with the proper fix for fail-safe, however that doesn't
mean breaking things and expecting applications to deal with it because it's
never been properly documented is OK either.

I'm post-NACKing this patch, it will have to be reverted and a fix submitted
for the upcoming 18.02 stable branch. In the meantime, it's not a fix for
mlx4 and as such must not be backported.

-- 
Adrien Mazarguil
6WIND

Reply via email to