Hello Dave,
I was happy to receive your mail saying that you implemented a more general
solution for our problem. I was curious how you did it, so I downloaded
the new version to look around in the code. Some things I did not really
understand, and I have noted them below. I also think there are some
constraints which you did not mention and which we should know about. So I
made some notes and ask you to answer my questions.
best regards
Klaus Golbach
This is what I believe I have understood:
In the previous LiS version there is already a function pair
lis_lockq/lis_unlockq, which is mapped to a lis_spin_lock on a
queue-specific lock (?). In the new LiS version lis_lockq/lis_unlockq is
mapped to a nested semaphore. Depending on the qlock setting in the
config file, the semaphore seems to be global for a module, common to a
queue pair, and so on. The places where lis_lockq/lis_unlockq are called
have not changed.
This lis_lockq/lis_unlockq synchronizes (among other things) the calls to
the read/write service and read/write put routines. There is a
lis_lockq/lis_unlockq pair around the q_qinfo->qi_putp call (see
lis_safe_putmsg), and also a lis_lockq/lis_unlockq pair around the
q->q_qinfo->qi_srvp call in queuerun.
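To make sure I have the pattern right, here is a minimal user-space sketch of it (this is NOT LiS source; queue_t, qi_putp, lockq/unlockq and safe_putmsg here are toy stand-ins, and a pthread mutex stands in for the nested qlock semaphore). The point is that the framework, not the driver, takes the queue lock before invoking the driver's put routine:

```c
/* User-space sketch (not LiS code) of the lockq-around-entry-point
 * pattern: serialize the qi_putp call the way lis_safe_putmsg does. */
#include <pthread.h>
#include <assert.h>

typedef struct queue {
    pthread_mutex_t q_lock;               /* stands in for the qlock semaphore */
    void (*qi_putp)(struct queue *, int); /* driver put routine */
    int last_msg;                         /* toy state touched by the driver */
} queue_t;

/* analogous to lis_lockq()/lis_unlockq() */
static void lockq(queue_t *q)   { pthread_mutex_lock(&q->q_lock); }
static void unlockq(queue_t *q) { pthread_mutex_unlock(&q->q_lock); }

/* analogous to lis_safe_putmsg(): driver code runs under the queue lock */
void safe_putmsg(queue_t *q, int msg)
{
    lockq(q);
    q->qi_putp(q, msg);
    unlockq(q);
}

static void toy_putp(queue_t *q, int msg) { q->last_msg = msg; }

queue_t *make_queue(void)
{
    static queue_t q;
    pthread_mutex_init(&q.q_lock, NULL);
    q.qi_putp = toy_putp;
    q.last_msg = -1;
    return &q;
}
```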
These are my remarks and questions:
- open/close:
I did not understand how it works for open/close. As I understood your
remark, close and the other STREAMS entry points do not run in parallel,
because all queues are flushed before closing, so no new calls to the
service and put routines are provoked. But what about a putq or putnext
called from interrupt context, or from another STREAMS driver, in
parallel with the close?
Your driver can process messages up until the time its close() routine is called by LiS. Once your close routine is called you should take steps to ensure that no activity is attempted on the given stream. For example, a multiplexing driver could get messages from below while closing an upper stream. This is an internal matter and needs synchronization within the driver itself. You need to ensure that messages are not routed to the upper stream that is closing (driver close routine running, or has already run).
Does your construction really guarantee (with the global lock set) that a
close never runs in parallel with any other STREAMS entry point of this
module?
The protection logic is in lis_qdetach(). The queues are locked and QPROCSOFF flags are set. This flag prevents put and service procedure entry from that point onward. Shortly thereafter the driver close routine is called.
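If I may paraphrase that mechanism in code: the sketch below (again a user-space toy, not lis_qdetach itself; the names qdetach, safe_put and QPROCSOFF only mirror the discussion) shows the gate you describe, where the flag is set under the queue lock and the put entry point refuses service from then on:

```c
/* Sketch (not LiS code) of the QPROCSOFF-style gate: detach sets a flag
 * under the queue lock; put/service entry checks it and backs off. */
#include <pthread.h>
#include <assert.h>

#define QPROCSOFF 0x1

typedef struct {
    pthread_mutex_t q_lock;
    int q_flag;
    int msgs_accepted;
} queue_t;

/* analogous to lis_qdetach(): close the gate before the driver close runs */
void qdetach(queue_t *q)
{
    pthread_mutex_lock(&q->q_lock);
    q->q_flag |= QPROCSOFF;          /* no put/service entry after this */
    pthread_mutex_unlock(&q->q_lock);
    /* ...driver close routine would be called shortly afterward... */
}

/* put entry point: returns 1 if the message was accepted, 0 if refused */
int safe_put(queue_t *q)
{
    int ok;
    pthread_mutex_lock(&q->q_lock);
    ok = !(q->q_flag & QPROCSOFF);
    if (ok)
        q->msgs_accepted++;
    pthread_mutex_unlock(&q->q_lock);
    return ok;
}

queue_t *make_queue(void)
{
    static queue_t q = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };
    return &q;
}
```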
To illustrate, let me describe a common problem. Most of our STREAMS
drivers are clonable, and they keep a (module-specific) linked list of
control blocks, one associated with each queue pair. At every open/close
a control block is added to or removed from this linked list. On other
events the drivers walk the linked list searching for particular
attributes. So if one thread closes a stream (removing a control block)
while another thread is walking the list of all control blocks in
parallel, this may lead to unexpected results.
A solution could be a lis_lockq/lis_unlockq pair around the
q_qinfo->qi_srvp call. Constraint: a sleep in open/close would then stop
all activity of this module for a while!
LiS does not hold the q_lock when the driver close routine is called. Similarly for driver open. So these routines can sleep without blocking other streams activity.
I have drivers that do similar things and use a spin lock to protect the linked list structure while it is being traversed. This would be a good application for read/write type spin locks.
- Not checking the return value of lis_lockq:
In lis_qdetach you do not check the return value of lis_lockq. I think
this is a bug. Assume one thread (for instance queuerun) holds the
semaphore and is executing some service routine. Another thread executing
lis_qdetach is waiting on the semaphore and is then killed. The following
code in lis_qdetach now runs unprotected, and the subsequent lis_unlockq
in lis_qdetach will increment the counter and cause a lis_up. As a
consequence the protection in queuerun is lost, and furthermore the
nesting counter is corrupted forever. At least the corruption of the
nesting counter should be prevented.
This routine is called when the queue is about to be deallocated. Our reference to it should be the last reference. With local queue locks it would be OK. But in the case of a global lock you are correct. I changed it to prevent the call to unlockq if the lockq returned an error.
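The shape of that fix can be sketched as follows (a toy model, not the actual lis_qdetach; the interrupted parameter simulates the killed waiter, and a plain counter stands in for the semaphore state): the unlock is only issued when the lock was really acquired, so the count stays balanced:

```c
/* Sketch of the fix: if lockq fails (e.g. the waiter was interrupted),
 * the matching unlockq must be skipped or the semaphore count is spoiled. */
#include <assert.h>
#include <errno.h>

static int sem_count = 1;     /* toy binary-semaphore count */

/* 'interrupted' simulates being killed while waiting on the semaphore */
int lockq(int interrupted)
{
    if (interrupted)
        return -EINTR;        /* down() aborted: lock NOT acquired */
    sem_count--;
    return 0;
}

void unlockq(void) { sem_count++; }

/* corrected qdetach pattern: only unlock if the lock was really taken */
int qdetach(int interrupted)
{
    int err = lockq(interrupted);
    /* ...detach work would go here (unprotected if err != 0)... */
    if (err == 0)
        unlockq();            /* skipped on error, so the count balances */
    return err;
}
```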
- Is it certain that lis_lockq and lis_unlockq are always called from
thread context?
If they are called from interrupt context, the semaphore operation will
not work! If someone does a putnext from interrupt context, it will be
harmful! I think it should be forbidden to call putnext from interrupt
context. Hopefully the other calls such as putq or qenable do not use
lis_lockq/lis_unlockq. So far I have not found such problems.
The constraint, basically, is never to call putnext() from interrupt context. Putq() from interrupt context is safe. All other calls are from thread context.
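The usual way to live with that constraint, as I understand it, is: from interrupt context only enqueue with putq(), and let the service routine (which runs in thread context) do the putnext(). A toy user-space sketch of the split (irq_putq, service and the single-slot queue are invented for the example):

```c
/* Sketch: interrupt context enqueues only; putnext happens later in
 * thread context from the service routine, where sleeping is allowed. */
#include <assert.h>

static int q_slot;                 /* toy one-message queue */
static int q_full;
static int downstream_last = -1;

/* safe from interrupt context: just enqueue, no sleeping lock is taken */
void irq_putq(int msg)
{
    q_slot = msg;
    q_full = 1;
    /* real code would also qenable() the queue here */
}

/* putnext() may block on the qlock semaphore, so only thread context */
static void putnext_downstream(int msg) { downstream_last = msg; }

/* runs later in thread context, e.g. as the queue's service routine */
void service(void)
{
    if (q_full) {
        q_full = 0;
        putnext_downstream(q_slot);
    }
}
```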
- "no locking" drivers are slowed down when interacting with "locking"
drivers:
What happens if a "no locking" driver calls a "global lock" driver via
putnext? I guess the "no locking" driver also waits on the semaphore. I
think one would accept the fact that a "globally locked" driver is a
bottleneck in a STREAMS stack.
Correct.
- Holding a lock over putnext:
Inside putnext a semaphore down/up is done by lis_lockq/lis_unlockq, so
holding a lock across putnext will result in unpredictable behaviour. It
is in any case not advisable to hold a lock across putnext. This is not a
very hard constraint.
You should be especially careful not to hold spin locks over calls to putnext(). The 2.6 kernel is quite fussy about holding spin locks while calling routines that might sleep.
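The standard pattern for that is to snapshot what you need under the lock, drop the lock, and only then call putnext(). A small user-space sketch (a pthread mutex standing in for the spin lock; deliver, post and putnext_may_sleep are invented names for the example):

```c
/* Sketch: never hold a spin lock across a call that may sleep.
 * Copy state out under the lock, release it, then make the call. */
#include <pthread.h>
#include <assert.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int pending_msg = 0;
static int delivered = 0;

/* stands in for putnext(), which may block on the qlock semaphore */
static void putnext_may_sleep(int msg) { delivered = msg; }

/* producer side: update driver state under the lock */
void post(int msg)
{
    pthread_mutex_lock(&state_lock);
    pending_msg = msg;
    pthread_mutex_unlock(&state_lock);
}

/* consumer side: snapshot under the lock, drop it BEFORE putnext */
void deliver(void)
{
    pthread_mutex_lock(&state_lock);
    int msg = pending_msg;             /* snapshot while protected */
    pending_msg = 0;
    pthread_mutex_unlock(&state_lock); /* released before the sleepy call */
    putnext_may_sleep(msg);
}
```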
-- Dave
