Hello Brian,

Thanks a lot for your explanations. 

Concerning your questions :
We used LiS 2-16 for historical reason, in fact for an old project based on this version of LiS.
I don't know Linux Fast-STREAMS,and  I'm afraid that, within the scope of this project,  we can't move from  LiS 2-16, however for future project, I would be very interested if you have a link to the documentation for Linux Fast Streams. (I couldn't find it on openss7.org).

Best regards,
-Philippe

Brian F. G. Bidulock wrote:
Phil,

Well, no it was not just an SMP problem.  The problem exists on UP under
LiS-2.18.0 as well.  There I put two lines in LiS-2.18.1 that made the
problem go away on UP at least.  Just

  if (q == 0)
	  return (0);

at the top of lis_await_qsched() in LiS/head/head.c  This correction was
made to LiS-2.18.1 in July 2005.

The problem is a lack of proper plumbing locks in LiS.  The situation is
aggravated if there are different processes at either end of the pipe (the
normal case) closing their ends at about the same time.  It seems on 2.18
that queue service procedure run after the queue pairs have been severed.
For LiS-2.16 it is also a race.

For LiS-2.16 and before this fix to 2.18, I had to put this in all module
and driver put procedure and service procedures that used pipes:  (I use
pipes extensively for testing modules and drivers, so that meant about all
of them.)

int
xxx_put(queue_t *q, mblk_t *mp)
{
#if defined LIS
	if (q->q_next == NULL || OTHERQ(q)->q_next == NULL) {
		cmn_err(CE_WARN, "%s: %s: LiS pipe bug: called with NULL q->q_next pointer.",
			MOD_NAME, __FUNCTION__);
		freemsg(mp);
		return (0);
	}
#endif
	...
}

int
xxx_srv(queue_t *q)
{
#if defined LIS
	if (q->q_next == NULL || OTHERQ(q)->q_next == NULL) {
		cmn_err(CE_WARN, "%s: %s: LiS pipe bug: called with NULL q->q_next pointer.",
			MOD_NAME, __FUNCTION__);
		return (0);
	}
#endif
	...
}

And in close procedures that try to pass messages.

LiS is full of races.  Be amazed if it runs on SMP at all.

Linux Fast-STREAMS does not have this problem because it has proper
plumbing locks and no put or service procedure can run while or after
it has severed the queue pair.  (Thus the #ifdef's above.)

Any particular reason why you're still using 2.16?

I'm just wrapping documentation for release:
Would you like to try using Linux Fast-STREAMS?

--brian


On Wed, 23 Nov 2005, phil wrote:

  
   Hello Gentlemen,
   I analyzed a oops that occured on an "old" version of LiS : 2-16  in a
   process  which  is  closing a streams pipe. I've attached the oops and
   the corresponding section of code hereafter.
   The  process  is  multithreaded  and   we've  perheaps  reached a race
   condition  where  two threads are dealing with the closing of the same
   pipe ...
   I found similar errors reported on Linux archive and I understand that
   such  race conditions  have been fixed with upper release of LiS. (i.e
   2-18).
   In  particular,  in  the below mail there's a reference to a patch for
   problem  3) "synchronization problem between pipes."  And I think that
   it could be the root cause of my problem.
   [1]http://www.mail-archive.com/[email protected]/msg018
   55.html
   My  question  is  : beyond the fact that LiS 2-18 will certainly fixed
   this issue,  I was wondering if someone could point me to the specific
   modification  that  has  fixed this specific issue (perheaps the patch
   mentioned in the email from Jeff)
   Best regards,
   -Philippe
   ----------------------------------------------------------------------
   -------------------------------------------------------
   == Section of code where the oops occured
   ----------------------------------------------------------------------
   -------------------------------------------------------
   int lis_safe_SAMESTR(queue_t *q, char *f, int l)
   {
       if (   lis_check_q_magic(q,f,l)
       && q->q_next != NULL
       && lis_check_q_magic(q->q_next,f,l)
          )
       return ((q->q_flag&QREADR) == (q->q_next->q_flag&QREADR));
                                                               ^^^^^^^^^
                                                                 ksymoops
   showed that q->q_next is null
                                                                 in spite
   of the condition test in if statement. 
       return 0;
   }
   ----------------------------------------------------------------------
   -------------------------------------------------------
   == The OOPS :
   ----------------------------------------------------------------------
   -------------------------------------------------------
   Unable  to  handle  kernel NULL pointer dereference at virtual address
   0000001c
    printing eip:
   f8b332b9
   *pde = 32ba1001
   *pte = 58e29067
   Oops: 0000
   netconsole  nfs  lockd  sunrpc  streams  swrmm  parport_pc  lp parport
   autofs4  audit e1000 tg3 ipv6 floppy sg microcode keybdev mousedev hid
   input usb-ohci usbcore e
   CPU:    3
   EIP:    0060:[<f8b332b9>]    Tainted: P
   EFLAGS: 00010002
   EIP is at lis_safe_SAMESTR [streams] 0x41 (2.4.21-20.ELsmp/i686)
   eax: 00000000   ebx: cb2852ac   ecx: f8bd4ad4   edx: 00000020
   esi: 0000068a   edi: f8bd4ad4   ebp: d6f61ed8   esp: d6f61ebc
   ds: 0068   es: 0068   ss: 0068
   Process oracle (pid: 1131, stackpage=d6f61000)
   Stack:  cb2852ac 00000000 cb2852ac f8b32c80 cb2852ac f8bd4ad4 0000068a
   00000202
           d6f61f00 f4f272b4 00003a98 f4f2753c f8b291d1 cb2852ac f4f272b4
   f8c0c960
           00000286 00000282 f4f2753c d6f61f30 00000000 f4f272b4 f8b29639
   f4f272b4
   Call Trace:   [<f8b32c80>] lis_qcountstrm [streams] 0x50 (0xd6f61ec8)
   [<f8bd4ad4>] .rodata.str1.4 [streams] 0x4584 (0xd6f61ed0)
   [<f8b291d1>] close_action [streams] 0x1c9 (0xd6f61eec)
   [<f8c0c960>] lis_stdata_sem [streams] 0x0 (0xd6f61ef8)
   [<f8b29639>] lis_doclose [streams] 0x2c9 (0xd6f61f14)
   [<f8b29850>] lis_strclose [streams] 0x178 (0xd6f61f44)
   [<c016472a>] __fput [kernel] 0xea (0xd6f61f78)
   [<c01628ae>] filp_close [kernel] 0x8e (0xd6f61f94)
   [<c0162956>] sys_close [kernel] 0x66 (0xd6f61fb0)
   Code: 8b 40 1c 83 e0 10 83 e2 10 39 c2 0f 94 c0 0f b6 c0 eb d3 55
   ----------------------------------------------------------------------
   -------------------------------------------------------

References

   1. http://www.mail-archive.com/[email protected]/msg01855.html
    

  

Reply via email to