Hi Amit,

On Thu, Nov 24, 2016 at 12:14:07PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:[email protected]]
> > Sent: Monday, November 21, 2016 11:06 PM
> > To: Amitkumar Karwar
> > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > [email protected]; [email protected]; Xinming Hu
> > Subject: Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during
> > card remove process
> > 
> > On Wed, Nov 16, 2016 at 06:39:08PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <[email protected]>
> > >
> > > Wait for firmware dump complete in card remove function.
> > > For sdio interface, there are two diffenrent cases, card reset
> > trigger
> > > sdio_work and firmware dump trigger sdio_work.
> > > Do code rearrangement for distinguish between these two cases.
> > 
> > On second review of the SDIO card reset code (which I'll repeat is
> > quite ugly), you seem to be making a bad distinction here. What if
> > there is a firmware dump happening concurrently with your card-reset
> > handling?
> > You *do* want to synchronize with the firmware dump before completing the
> > card reset, or else you might be freeing up internal card resources
> > that are still in use. See below.
> 
> I ran some tests and observed that if same work function is scheduled
> by two threads, it won't have re-entrant calls. They will be executed
> one after another. In SDIO work function, we have SDIO card reset call
> after completing firmware dump. So firmware dump won't run
> concurrently with card-reset as per my understanding.

Ah, you're correct. It's somewhat obscure and potentially fragile, but
correct AFAICT. As you noted though, you do still have a use-after-free
bug, even if the concurrency isn't quite as high as I thought.

> > > Signed-off-by: Xinming Hu <[email protected]>
> > > Signed-off-by: Amitkumar Karwar <[email protected]>
> > > ---

...

> > > --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> > > @@ -46,6 +46,9 @@
> > >   */
> > >  static u8 user_rmmod;
> > >
> > > +static void mwifiex_sdio_work(struct work_struct *work); static
> > > +DECLARE_WORK(sdio_work, mwifiex_sdio_work);
> > > +
> > >  static struct mwifiex_if_ops sdio_ops;  static unsigned long
> > > iface_work_flags;
> > >
> > > @@ -220,7 +223,7 @@ static int mwifiex_sdio_resume(struct device
> > *dev)
> > >   * This function removes the interface and frees up the card
> > structure.
> > >   */
> > >  static void
> > > -mwifiex_sdio_remove(struct sdio_func *func)
> > > +__mwifiex_sdio_remove(struct sdio_func *func)
> > >  {
> > >   struct sdio_mmc_card *card;
> > >   struct mwifiex_adapter *adapter;
> > > @@ -249,6 +252,13 @@ static int mwifiex_sdio_resume(struct device
> > *dev)
> > >   mwifiex_remove_card(adapter);
> > >  }
> > >
> > > +static void
> > > +mwifiex_sdio_remove(struct sdio_func *func) {
> > > + cancel_work_sync(&sdio_work);
> > > + __mwifiex_sdio_remove(func);
> > > +}
> > > +
> > >  /*
> > >   * SDIO suspend.
> > >   *
> > > @@ -2227,7 +2237,7 @@ static void mwifiex_recreate_adapter(struct
> > sdio_mmc_card *card)
> > >    * discovered and initializes them from scratch.
> > >    */
> > >
> > > - mwifiex_sdio_remove(func);
> > > + __mwifiex_sdio_remove(func);
> > 
> > ^^ So here, you're trying to avoid syncing with the card-reset work
> > event, except that function will free up all your resources (including
> > the static save_adapter). Thus, you're explicitly allowing a use-after-
> > free error here. That seems unwise.
> 
> Even if firmware dump is triggered after card reset is started, it
> will execute after card reset is completed as discussed above. Only
> problem I can see is with "save_adapter". We can put new_adapter
> pointer in "save_adapter" at the end of mwifiex_recreate_adapter() to
> solve the issue.

Ugh, yet another band-aid? You might do better to still cancel any
pending work, just don't do it synchronously. i.e., do cancel_work()
after you've removed the card. It doesn't make sense to do a FW dump on
the "new" adapter when it was requested for the old one.

> > Instead, you should actually retain the invariant that you're doing a
> > full remove/reinitialize here, which includes doing the *same*
> > cancel_work_sync() here in mwifiex_recreate_adapter() as you would in
> > any other remove().
> 
> We are executing mwifiex_recreate_adapter() as a part of sdio_work(). Calling
> cancel_work_sync() here would cause deadlock. The API is supposed to waits
> until sdio_work() is finished.

You are correct. So using the _sync() version would be wrong.

> > 
> > IOW, kill the __mwifiex_sdio_remove() and just call
> > mwifiex_sdio_remove() as you were.
> > 
> > That also means that you can do the same per-adapter cleanup in the
> > following patch as you do for PCIe.
> 
> Currently as sdio_work recreates "card", the work structure can't be moved 
> inside card structure.
> Let me know your suggestions.

If you address the TODO in mwifiex_create_adapter() instead, you can get
past this problem:

        /* TODO mmc_hw_reset does not require destroying and re-probing the
         * whole adapter. Hence there was no need to for this rube-goldberg
         * design to reload the fw from an external workqueue. If we don't
         * destroy the adapter we could reload the fw from
         * mwifiex_main_work_queue directly.

The "save_adapter" is an abomination that should be terminated swiftly,
but it is perpetuated in part by the hacks noted in the TODO.

So I'd recommend addressing the TODO ASAP, but in the meantime, a hack
like my suggestion (cancel the FW dump work w/o synchronizing) or --
less preferably -- yours (manually set 'save_adapter' again) might be
OK.

I think I've asked elsewhere but didn't receive an answer: why is
SDIO's mwifiex_recreate_adapter() so much different from PCIe's
mwifiex_do_flr()? It seems like the latter should be refactored to
remove some of the PCIe-specific cruft from main.c and then reused for
SDIO.

Brian

Reply via email to