Re: [networking-discuss] GLDv3 locking concerns...

Garrett D'Amore Thu, 03 May 2007 13:07:09 -0700

Thirumalai Srinivasan wrote:

Garrett,
To prevent deadlocks and recursive locks, we need to have a strictlock orderingacross layers. Currently the vast majority seem to be use top ->bottom orderingwhile holding locks across layers, since most of the control paths gotop down.
A simple rule set would be
* Each layer is free to decide on the locking hierarchy used withrespect to locks
  it owns, i.e. the locks in its own layer
* There is a single global order for holding locks across layers, i.e.top to bottom.
  So layer A can hold a lock and call into layer B, only if A is above B.
Trying to fine grain the above, and defining the order on a per entrypoint basis makes thepermutations unmanageable. Note that the problem you see is not driverspecific. Every
layer has to handle the problem in the upcall direction.
With the above rules, no layer (including the driver) may hold anylock in the upcall direction.

But if the calling context is broken (by use of a taskq for instance),then the risks you describe go away. Since that calling context isalready broken, I just would like a guarantee that it will *remain* so.Then I can get away with this kind of approach. Otherwise I add a lotof complexity or an additional context switch (taskq) into my drivers.(Multiple drivers suffer this problem.)

And yes, my suggestion does add a small amount of complexity to the*rules* for drivers (an exception for a few mac_xxx_update entrypoints), but the gain is the removal of significant complexity oradditional context switches in the device drivers.

I'll trade complexity in the framework for elimination of thatcomplexity in drivers (repeated across multiple drivers.)


   -- Garrett

Thirumalai


Garrett D'Amore wrote:
Eric Cheng wrote:
Garrett D'Amore wrote:
So, given this, it is it safe for me to just callmac_{link,tx_notify() while holding driver locks?
I would recommend that you not do this even though the code appearsto be safe for now. the reason is that you do not know what thenotify callback is going to do. it may somehow loopback down to thedriver to acquire the same lock you're holding.
I realize that is a concern, which is why I'm asking here. If allthe notify does (and ever will do) is fire off a taskq, then itsgoing to be safe. But if it loops back (or ever will), then its not.
Or must I go to contortions (such as adding my own driver taskq, ora soft interrupt) to ensure that I never call this while holding adriver specific lock?
what's the problem with releasing the lock, then callmac_link_notify()?
You cant just "release" the lock in the middle of your code, always.There are often considerations that have to be made about whether itis actually _safe_ to drop the lock. Assumptions that you made andwere protected by the lock, are no longer true after the lock isdropped. (Of course, if you're doing the notify at nearly the sameplace as you drop the lock anyway, its easy. But if the notify isdeep inside nested code, its much harder to do safely.)
For example, on the tx hot path, I might find that I'm out ofdescriptors. If that happens, I need to attempt to reclaim them. IfI do reclaim some, and if the mac layer was previously suspended dueto flow control, then I need to notify the mac layer.
All this happens in the hot path, and dropping and reacquiring thelock here would be a significant pain in the butt.
I can try to track whether I've reclaimed resources or not, and dothe notify later after the lock is reclaimed, but that significantlycomplicates the code, separates the action of notification far fromthe point when I've discovered that I should do the notification, andis thus likely to be a source of errors, particularly when the codeis maintained by someone else who may not see the bigger picture.
Its much simpler if I have just one call to mac_tx_update, at thepoint at which I've discovered that I need to do it, rather thandeferring it to some other point after I've dropped the lock.
Will this guarantee hold true going forward? Obviously, since itdoes a lot to simplify driver design, and eliminate extra contextswitches, I'd really prefer to be able to rely on this behavior.
we're redesigning nemo's locking as part of the crossbow project.you will hear more about this in several weeks time. for now, I'dsuggest that you try not to hold locks across any mac_* interface.if there are issues with doing this, let us know and we'll take yourneeds into account in our new design.
See the above. I'd really, really like to have mac_xxx_update()continue to use the asynchronous taskq notification and therefore besafe to call with locks held, if at all possible. Obviously theother mac functions (e.g. mac_rx, etc.) really do need to be calledwithout any locks held.
   -- Garrett

_______________________________________________
networking-discuss mailing list
[email protected]


_______________________________________________
networking-discuss mailing list
[email protected]

Re: [networking-discuss] GLDv3 locking concerns...

Reply via email to