Phil,
Well, no it was not just an SMP problem. The problem exists on UP under
LiS-2.18.0 as well. There I put two lines in LiS-2.18.1 that made the
problem go away on UP at least. Just
if (q == 0)
return (0);
at the top of lis_await_qsched() in LiS/head/head.c This correction was
made to LiS-2.18.1 in July 2005.
The problem is a lack of proper plumbing locks in LiS. The situation is
aggravated if there are different processes at either end of the pipe (the
normal case) closing their ends at about the same time. It seems on 2.18
that queue service procedure run after the queue pairs have been severed.
For LiS-2.16 it is also a race.
For LiS-2.16 and before this fix to 2.18, I had to put this in all module
and driver put procedure and service procedures that used pipes: (I use
pipes extensively for testing modules and drivers, so that meant about all
of them.)
int
xxx_put(queue_t *q, mblk_t *mp)
{
#if defined LIS
if (q->q_next == NULL || OTHERQ(q)->q_next == NULL) {
cmn_err(CE_WARN, "%s: %s: LiS pipe bug: called with NULL
q->q_next pointer.",
MOD_NAME, __FUNCTION__);
freemsg(mp);
return (0);
}
#endif
...
}
int
xxx_srv(queue_t *q)
{
#if defined LIS
if (q->q_next == NULL || OTHERQ(q)->q_next == NULL) {
cmn_err(CE_WARN, "%s: %s: LiS pipe bug: called with NULL
q->q_next pointer.",
MOD_NAME, __FUNCTION__);
return (0);
}
#endif
...
}
And in close procedures that try to pass messages.
LiS is full of races. Be amazed if it runs on SMP at all.
Linux Fast-STREAMS does not have this problem because it has proper
plumbing locks and no put or service procedure can run while or after
it has severed the queue pair. (Thus the #ifdef's above.)
Any particular reason why you're still using 2.16?
I'm just wrapping documentation for release:
Would you like to try using Linux Fast-STREAMS?
--brian
On Wed, 23 Nov 2005, phil wrote:
>
> Hello Gentlemen,
> I analyzed a oops that occured on an "old" version of LiS : 2-16 in a
> process which is closing a streams pipe. I've attached the oops and
> the corresponding section of code hereafter.
> The process is multithreaded and we've perheaps reached a race
> condition where two threads are dealing with the closing of the same
> pipe ...
> I found similar errors reported on Linux archive and I understand that
> such race conditions have been fixed with upper release of LiS. (i.e
> 2-18).
> In particular, in the below mail there's a reference to a patch for
> problem 3) "synchronization problem between pipes." And I think that
> it could be the root cause of my problem.
> [1]http://www.mail-archive.com/[email protected]/msg018
> 55.html
> My question is : beyond the fact that LiS 2-18 will certainly fixed
> this issue, I was wondering if someone could point me to the specific
> modification that has fixed this specific issue (perheaps the patch
> mentioned in the email from Jeff)
> Best regards,
> -Philippe
> ----------------------------------------------------------------------
> -------------------------------------------------------
> == Section of code where the oops occured
> ----------------------------------------------------------------------
> -------------------------------------------------------
> int lis_safe_SAMESTR(queue_t *q, char *f, int l)
> {
> if ( lis_check_q_magic(q,f,l)
> && q->q_next != NULL
> && lis_check_q_magic(q->q_next,f,l)
> )
> return ((q->q_flag&QREADR) == (q->q_next->q_flag&QREADR));
> ^^^^^^^^^
> ksymoops
> showed that q->q_next is null
> in spite
> of the condition test in if statement.
> return 0;
> }
> ----------------------------------------------------------------------
> -------------------------------------------------------
> == The OOPS :
> ----------------------------------------------------------------------
> -------------------------------------------------------
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000001c
> printing eip:
> f8b332b9
> *pde = 32ba1001
> *pte = 58e29067
> Oops: 0000
> netconsole nfs lockd sunrpc streams swrmm parport_pc lp parport
> autofs4 audit e1000 tg3 ipv6 floppy sg microcode keybdev mousedev hid
> input usb-ohci usbcore e
> CPU: 3
> EIP: 0060:[<f8b332b9>] Tainted: P
> EFLAGS: 00010002
> EIP is at lis_safe_SAMESTR [streams] 0x41 (2.4.21-20.ELsmp/i686)
> eax: 00000000 ebx: cb2852ac ecx: f8bd4ad4 edx: 00000020
> esi: 0000068a edi: f8bd4ad4 ebp: d6f61ed8 esp: d6f61ebc
> ds: 0068 es: 0068 ss: 0068
> Process oracle (pid: 1131, stackpage=d6f61000)
> Stack: cb2852ac 00000000 cb2852ac f8b32c80 cb2852ac f8bd4ad4 0000068a
> 00000202
> d6f61f00 f4f272b4 00003a98 f4f2753c f8b291d1 cb2852ac f4f272b4
> f8c0c960
> 00000286 00000282 f4f2753c d6f61f30 00000000 f4f272b4 f8b29639
> f4f272b4
> Call Trace: [<f8b32c80>] lis_qcountstrm [streams] 0x50 (0xd6f61ec8)
> [<f8bd4ad4>] .rodata.str1.4 [streams] 0x4584 (0xd6f61ed0)
> [<f8b291d1>] close_action [streams] 0x1c9 (0xd6f61eec)
> [<f8c0c960>] lis_stdata_sem [streams] 0x0 (0xd6f61ef8)
> [<f8b29639>] lis_doclose [streams] 0x2c9 (0xd6f61f14)
> [<f8b29850>] lis_strclose [streams] 0x178 (0xd6f61f44)
> [<c016472a>] __fput [kernel] 0xea (0xd6f61f78)
> [<c01628ae>] filp_close [kernel] 0x8e (0xd6f61f94)
> [<c0162956>] sys_close [kernel] 0x66 (0xd6f61fb0)
> Code: 8b 40 1c 83 e0 10 83 e2 10 39 c2 0f 94 c0 0f b6 c0 eb d3 55
> ----------------------------------------------------------------------
> -------------------------------------------------------
>
> References
>
> 1.
> http://www.mail-archive.com/[email protected]/msg01855.html
--
Brian F. G. Bidulock ¦ The reasonable man adapts himself to the ¦
[EMAIL PROTECTED] ¦ world; the unreasonable one persists in ¦
http://www.openss7.org/ ¦ trying to adapt the world to himself. ¦
¦ Therefore all progress depends on the ¦
¦ unreasonable man. -- George Bernard Shaw ¦
_______________________________________________
Linux-streams mailing list
[email protected]
http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams