The more recent "lettered" versions of LiS support a Config file directive that declares the degree of locking (semaphore) protection applied on entry to put/srv routines:
qlock driver <driver-name> <value>
The <value> is as follows:
0 = no locking
1 = lock each q half individually (default)
2 = lock both q halves together
3 = use a global lock
Type 1 is the old LiS behavior. Drivers that are written defensively for SMP environments can probably run with option 0. With no locking at all, a putnext() from interrupt level can no longer deadlock on the queue semaphore.
Personally, I don't like calling putnext() from interrupt context.
So I would have coded this construct to fabricate the indication message and do a putq() into the read queue so as to let the service procedure send it upstream from guaranteed non-interrupt context.
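A minimal sketch of that pattern follows. It is not LiS code: the queue and message types and the bodies of putq(), getq() and putnext() are simplified userland stand-ins (assumptions made so the example compiles outside the kernel), and dan_timeout()/dan_rsrv() are hypothetical names. The stand-in putnext() asserts that it is never reached while the simulated interrupt flag is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- Userland stand-ins (assumptions, NOT the LiS headers) ---- */
typedef struct msgb { struct msgb *b_next; int payload; } mblk_t;
typedef struct queue { mblk_t *q_first, *q_last; int q_enabled; } queue_t;

static int in_interrupt;  /* simulated "interrupt/timeout context" flag */
static int delivered;     /* messages that reached the "stream head"   */

/* Stand-in for putq(): append the message and mark the queue as
 * needing its service procedure to run. */
static void putq(queue_t *q, mblk_t *mp)
{
    mp->b_next = NULL;
    if (q->q_last)
        q->q_last->b_next = mp;
    else
        q->q_first = mp;
    q->q_last = mp;
    q->q_enabled = 1;
}

/* Stand-in for getq(): remove and return the first message, or NULL. */
static mblk_t *getq(queue_t *q)
{
    mblk_t *mp = q->q_first;
    if (mp) {
        q->q_first = mp->b_next;
        if (!q->q_first)
            q->q_last = NULL;
    }
    return mp;
}

/* Stand-in for putnext(): counts the delivery and frees the message.
 * The assert models the rule discussed here. */
static void putnext(queue_t *q, mblk_t *mp)
{
    (void)q;
    assert(!in_interrupt);
    delivered++;
    free(mp);
}

/* Timeout/interrupt side: fabricate the indication and queue it only. */
static void dan_timeout(queue_t *rdq, int value)
{
    in_interrupt = 1;
    mblk_t *mp = malloc(sizeof *mp);  /* allocb() in a real driver */
    if (mp) {
        mp->payload = value;
        putq(rdq, mp);
    }
    in_interrupt = 0;
}

/* Read-side service procedure: runs outside interrupt context and
 * forwards everything that was queued. */
static int dan_rsrv(queue_t *q)
{
    mblk_t *mp;

    q->q_enabled = 0;
    while ((mp = getq(q)) != NULL)
        putnext(q, mp);
    return 0;
}
```

In a real driver the service procedure would also check canputnext() before each putnext() and use putbq() to put the message back if the upstream queue is flow controlled.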
-- Dave
At 04:11 PM 6/20/2004, dan_gora wrote:
Dave...
Does this rule also apply to functions called from a timeout() routine (which
is _kind_ of an interrupt)?
I am having a problem now with one of my drivers which needs to send an
indication to the user every n*10ms (where n can be from 1 to 255). I have a
timeout routine which runs every 10ms which goes and reads a value from my
board. Every n times the timeout runs, an indication is allocb'd then
putnext()'d to the stream head from the timeout routine.
Pseudocode:
dantimeout() {
    **** read data from board *****
    if (++tocnt == indication_interval)
    {
        send_indication();
        tocnt = 0;
    }
}
send_indication() {
    pm = allocb(sizeof(indication), BPRI_MED);
    if (pm == NULL)
        return;
    pm->b_datap->db_type = M_PROTO;
    **** fill in message ****
    if (pstrm->rdq && canputnext(pstrm->rdq))
        putnext(pstrm->rdq, pm);
    else
        freemsg(pm);    /* do not leak the message if it cannot be sent */
}
Everything works fine if just this routine runs, but when I try to send and
receive data the machine panics and I get a message on the console:
Scheduling in interrupt
kgdb assertion failed: BUG.
I have kgdb set up (I'm running 2.4.23 with the kgdb patches) and I get a
stack trace which looks like:
<- "(gdb) "
-> "bt\n"
<- "#0 breakpoint () at kgdbstub.c:1005\n"
<- "#1 0xc0117f28 in do_schedule () at sched.c:564\n"
<- "#2 0xc0117f98 in kern_do_schedule (regs={ebx = -678482736, ecx =
-678482724, edx = 0, esi = -766648320, edi = -766648320, ebp = -766641116,
eax = 1, xds = -671154152, xes = -766705640, orig_eax = 1, eip = -1072668199,
xcs = 16, eflags = 514, esp = -766641160, xss = -766705640}) at
sched.c:717\n"
<- "#3 0xc010795f in kern_schedule () at proc_fs.h:150\n"
<- "#4 0xc01061d9 in __down_interruptible (sem=0xd78f2cd0) at sched.h:964\n"
<- "#5 0xc01062a7 in __down_failed_interruptible () at proc_fs.h:150\n"
<- "#6 0xd889cfa7 in .text.lock.KBUILD_BASENAME () from
/root/LiS/LiS-2.17.2.mod/LiS-2.17.2/streams.o\n"
<- "#7 0xd78f2c60 in ?? ()\n"
<- "#8 0xd2a8b260 in ?? ()\n"
<- "(gdb) "
All of this really smells like the deadly embrace that you are describing
here, but the machine doesn't lock up it just panics.
The strange thing is that not everything makes a lot of sense. When I try to
look at some of the lock debugging information in LiS I get a lot of
nonsensical information. For example, looking at the lis_spin_lock_count, I
always get 0 and the lis_spl_track array just has what looks like a lot of
trash in it....
I thought that I had LiS compiled with CONFIG_DEV enabled, so I don't quite
understand why I don't get more useful looking information from these things.
(config.in is attached...)
I cannot really just put the message on the read queue and have the service
routine deliver it because it needs to be delivered in a timely manner, and
there is no guarantee that the service routine will get back around in a
reasonable time. (The indications contain timestamp information that has to
be delivered before the contents are a total lie...)
Do you have any suggestions as to what I can try? I am really stuck trying
to see the panic on kgdb because of all of the inline assembly that's
involved and the fact that it appears to be a race between the interrupt and
the putnext() from the timeout routine.
Can I hold an irq lock that would block the interrupt across the putnext() in
the timeout routine? I've read that it's not exactly good form to hold locks
across putnext because you could get reentered if the upstream put routine
turns the message around and sends it back to you, but in this case the
upstream module is just the stream head, so it would seem pretty safe.
thanks-
dan
--- Dave Grothe <[EMAIL PROTECTED]> wrote:
>
> At 03:28 PM 5/28/2004, Eugene LiS User wrote:
>
> >Dave Grothe <[EMAIL PROTECTED]> wrote:
> > >At 02:40 PM 5/28/2004, Eugene LiS User wrote:
> > >
> > >>Is it OK to call lis_safe_putmsg() from interrupt context?
> > >
> > >No.
> >
> >The put() is directly mapped to lis_safe_putmsg().
> >
> >I was under impression that it is OK to call put()
> >from interrupt context.
> >
> >Why not?
>
> It is extremely bad practice and results in extending interrupt processing
> by performing protocol related functions that should be deferred to
> background. Use putq() at interrupt level and have the service routine
> process the message.
>
> If the STREAMS executive is to provide protection from re-entry of put/srv
> procedures then it would have to surround each putnext() call with
> spin_lock_irqsave() calls in order to protect against this case. This makes
> matters even worse, locking out interrupts while STREAMS service procedures
> and put procedures execute performing tasks totally unrelated to interrupt
> handling, or shared variables therewith.
>
> In Linux your put/srv routines become further restricted as to the kernel
> calls that they can make if they are entered while holding a spin
> lock. That is why LiS now uses a semaphore for this exclusion rather than
> a spin lock.
>
> An operating system such as Solaris which runs interrupts as threads, on
> their own stacks, can use adaptive mutexes for this exclusion because
> interrupt routines actually can sleep in Solaris. It is essentially the
> ability to use a semaphore at interrupt time!
>
> Linux does not do this, so if you call putnext() from interrupt level you
> will be asking for a deadly embrace hang if the interrupted thread happens
> to be holding the semaphore associated with the queue, i.e., executing in
> the put/srv routine.
>
> In the latest LiS Alphas there is a qlock option for drivers. If you set
> this option to 0, meaning no locking, then you can call putnext() from
> interrupt level without fear of deadlock, but still with the "bad practice"
> caveats stated above. Also, your driver put/srv routines can be entered
> simultaneously from multiple CPUs and you have to sort that out inside your
> driver. If you want to see an example of a driver that handles that
> nicely, look at Brian's inet.c driver and observe the locking mechanism
> that he uses. Just find the "put" procedure and start tracing down the
> code from there.
>
> -- Dave
