Matthew,

Yes, single-threading LiS would solve many problems.  Don't expect
much performance, however, running on a single thread on a single CPU.
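
To make "single-threading" concrete: with the #if removed, the
lis_queues_running test in the quoted lis_run_queues() applies on SMP as
well, so a CPU returns whenever another CPU is already running the queues.
Note that the test and the lis_atomic_inc() are two separate atomic
operations, so two CPUs could in principle both read zero and both proceed.
What follows is only a minimal userspace sketch of the same guard folded
into one compare-and-swap, using C11 <stdatomic.h>. Every name in it
(runq_req_cnt, queues_running, queuerun_stub, try_enter, leave,
run_queues_single) is a hypothetical stand-in, not LiS code, and it has not
been tried against LiS.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for the LiS counters (not LiS source). */
static atomic_int runq_req_cnt;    /* pending requests to run the queues */
static atomic_int queues_running;  /* 1 while one caller is the runner */
static int passes;                 /* how many times the stub has run */

/* Stand-in for queuerun(): consume one pending request. */
static void queuerun_stub(void)
{
    atomic_fetch_sub(&runq_req_cnt, 1);
    passes++;
}

/* Claim the runner role in a single atomic step; returns 1 on success. */
static int try_enter(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&queues_running, &expected, 1);
}

/* Give up the runner role. */
static void leave(void)
{
    int prev = atomic_exchange(&queues_running, 0);
    assert(prev == 1);             /* only the current runner may leave */
    (void)prev;
}

/* Returns 1 if this caller drained the requests, 0 if it backed off
 * because another caller already holds the runner role. */
static int run_queues_single(void)
{
    while (atomic_load(&runq_req_cnt) > 0) {
        if (!try_enter())
            return 0;
        queuerun_stub();
        leave();
    }
    return 1;
}
```

With three pending requests and no competing caller, run_queues_single()
drains all three. While another caller holds the flag it returns 0 at once
instead of spinning, which is the single-threading trade-off: correctness
at the cost of only one CPU doing queue work at a time.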

--brian

On Mon, 01 Mar 2004, Matthew Gierlach wrote:

> Hi Dave:
> 
>       We received the following suggested change to lis_run_queues() in
>       LiS/head/stream.c from a SUN field support engineer in Japan. I've
>       asked for the explanation behind this change and do not have a
>       response or answer just yet.
> 
>       We have made this change and it has run without incident for 60 hours
>       on the dual Xeon system with hyperthreading enabled. All 4 CPUs show
>       usage under top.
> 
>       I am fascinated that a line of code previously exclusively reserved for
>       non-SMP environments has such dramatic effect and results in the SMP
>       environment.
> 
>       Is this perhaps just a SUN anomaly? Since we're getting a lot of pressure
>       for support of this system, we've not attempted to run this change on
>       other UP and SMP systems. What's your advice about this change?
> 
>       Thanks, Matt
> 
> ---------- Forwarded message ----------
> Date: Fri, 27 Feb 2004 17:05:28 +0900
> From: Takuya Watanabe <[EMAIL PROTECTED]>
> To: Mark Ma <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
>      [EMAIL PROTECTED], Richard Barry-Smith <[EMAIL PROTECTED]>,
>      Takayuki Nakajima <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: SunFire V60x help (fwd)
> 
> Dear Mark,
> Yesterday, I looked into LiS-2.16.18-1 src code,
> and I changed /usr/net/Adax/LiS-2.16.18-1/LiS-2.16.18-1/head/stream.c as
> follows.
> ( "-" means deleted)
> void
> lis_run_queues(int cpu)
> {
>     extern int  lis_runq_sched ;        /* linux-mdep.c BH code */
> 
>     while (lis_atomic_read(&lis_runq_req_cnt) > 0)
>     {
> - #if !defined(__SMP__)                   /* only for single-threaded */
>         if (lis_atomic_read(&lis_queues_running)) /* recursion protection */
>             return;
> - #endif
>         lis_atomic_inc(&lis_queues_running);
>         lis_atomic_inc(&lis_runq_active_flags[cpu]) ;
>         queuerun(cpu);                  /* really run the queues */
>         lis_atomic_dec(&lis_runq_active_flags[cpu]) ;
>         lis_atomic_dec(&lis_queues_running);
>     }
> 
>     lis_runq_sched = 0 ;                /* OK to V semaphore now */
> 
> }/*lis_run_queues*/
> 
> 
> I am not sure this is the right solution for AS3.0 on V60X; however,
> atmiitest is now working with the AS 3.0 SMP kernel with hyperthreading
> enabled on V60X......
> 
> Thank you and best regards./takuya
> 
> 
> 
> Mark Ma wrote:
> 
> >Phillip and Dennis,
> >
> >Per email with John, Adax is working with Sun Japan for a Fujitsu RNC-Sim
> >design win. We are running into the following system issue, which requires
> >Sun's valuable input. Your advice is highly appreciated.
> >
> >Gratefully,
> >
> >Mark Ma
> >Regional Manager, APAC
> >Adax Inc.
> >[EMAIL PROTECTED]
> >408 829 8202
> >
> >
> >Date: Wed, 25 Feb 2004 01:21:12 -0800
> >From: John Atchison <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED], [EMAIL PROTECTED]
> >Subject: [Fwd: FW: FW: Red Hat Linux on SunFire v60x (fwd)]
> >
> >
> >Mark,
> >
> >Please let me know if your questions have been answered. If not,
> >please contact one of the gentlemen below:
> >
> >Phillip Pham, [EMAIL PROTECTED],  408-907-9546 :  BIOS Engineer
> >Dennis Tiu, [EMAIL PROTECTED] , 650 352 5081:  REV Engineering, can
> >provide Linux support
> >
> >Regards,
> >
> >John
> >
> >
> >
> >---------- Forwarded message ----------
> >Date: Tue, 24 Feb 2004 11:52:54 -0800 (PST)
> >From: Richard Barry-Smith <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Subject: SunFire V60x help
> >
> >
> >Hello Ms. Yuen,
> >
> >I work with Michael Khoury for Adax, Inc. in their Integration Department.
> >We have been troubleshooting the SunFire v60x, and have been having difficulties
> >getting RedHat Enterprise 3.0 AS to run when the Hyperthread Option in the
> >BIOS has been set to Enable.
> >
> >Thus far, the system runs for approximately 20-30 minutes, then it panics.
> >With Hyperthreading Disabled in the BIOS, the Adax card
> >works. In Uni-Processor mode, the Adax card runs fine too.
> >
> >Unfortunately, we have not been able to get a Linux Oops
> >from our testing, so we do not know the actual nature of the panic.
> >This is what we have been able to access from the console:
> >
> >  [<f8a221b8>]lis_runq_sems [streams] 0x38 (0xf707bf30)
> >  [<f8a221b4>]lis_runq_sems [streams] 0x34 (0xf707bf3C)
> >  [<f8a221ac>]lis_runq_sems [streams] 0x2c (0xf707bf40)
> >  [<f8a221b4>]lis_runq_sems [streams] 0x34 (0xf707bf4c)
> >  [<c010adb2>]__down_interruptible [kernel] 0xd2 (0xf707bf50)
> >  [<f8a03640>]lis_runq_active_flags [streams] 0x0 (0xf707bf70)
> >  [<f8a22180>]lis_runq_sems [streams] 0x0 (0xf707bf7c)
> >  [<f89e58d5>]lis_runq_queues [streams] 0x41 (0xf707bf80)
> >  [<c010ae37>]__down_failed_interruptible [kernel] 0x7 (0xf707bf88)
> >  [<f8a221ac>]lis_runq_sems [streams] 0x2c (0xf707bf8c)
> >  [<f8a22180>]lis_runq_sems [streams] 0x0 (0xf707bf90)
> >  [<f89e168d>]lis_thread_runqueues [streams] 0x85 (0xf707bfa0)
> >  [<f89efb00>].rodata.str1.32[streams] 0x4300 (0xf707bfa8)
> >  [<c013466f>]free_uid [kernel] 0x1f (0xf707bfb4)
> >  [<f89e1608>]lis_thread_runqueues [streams] 0x0 (0xf707bfc8)
> >  [<f89e1558>]lis_thread_func [streams] 0x58 (0xf707bfd0)
> >  [<f89efb00>].rodata.str1.32[streams] 0x4300 (0xf707bfd8)
> >  [<f89e1500>]lis_thread_func [streams] 0x0 (0xf707bfe4)
> >  [<c010958d>]kernel_thread_helper [kernel] 0x5 (0xf707bff0)
> >
> >In order to run our Adax card, the ATMii-PCI, LiS or Linux Streams is
> >required. Linux Streams is the interface that links the RedHat OS to
> >the Adax ATMii card. We have been working diligently with the LiS
> >developers on this issue regarding the v60x all last week.
> >
> >As of this morning, the LiS Developers believe that these messages have
> >something to do with the SunFire v60x hardware.
> >
> >"Your case, on the surface, looks like spin locks are not working on your
> >system.  The message from LiS is an assertion failure that should never
> >print out in the absence of contention for a queue head which is otherwise
> >protected by a spin lock.  I have never seen the messages that you are
> >seeing.
> >
> >Is there something about your machine (caching? hardware locking? memory
> >access sequencing) that would make the Linux implementation of spin locks
> >fail?  My gut feel is that you are looking for something very near the
> >hardware here. Take a careful walk through your machine's setup menus
> >to see if there is some BIOS option that might affect multiple requestors
> >to memory."
> >
> >I have reviewed the BIOS on the SunFire, and this BIOS does not allow the
> >user to access a caching or hardware locking option. This is why we need
> >your assistance. Can you assist us in the low level troubleshooting of the
> >SunFire v60x? Are there steps we can take in the BIOS to access hardware
> >locking or memory access sequencing? Our concern is that on this same
> >system Solaris X86 version 2.9 runs fine with the Hyperthreading ENABLED.
> >What could be the difference between RedHat Enterprise 3.0 and X86 when it
> >comes to handling multiple CPUs?
> >
> >Thank you,
> >
> >Richard Barry-Smith
> >Network Support Engineer
> >Adax, Inc.
> >TEL: (510) 548-7047 x161
> >FAX: (510) 548-5526
> >
> >
> >
> >
> >
> >
> >
> 
> 
> 
> On Fri, 13 Feb 2004, Dave Grothe wrote:
> 
> > I've been running my development version of LiS-2.17 on a 2 CPU XEON system
> > which also appears as 4 CPUs to Linux.  I have been making improvements to
> > queue scheduling for performance enhancements and have only experienced one
> > generic type of problem (previously reported and soon to be fixed).
> >
> > The problem that I see is related to a STREAMS driver calling a kernel
> > function that, in turn, calls schedule().  The LiS queue scheduler is
> > holding a spin lock on the queue when it calls the service procedure, and
> > the call to schedule() can switch to some other process that also wants to
> > use that queue, and which then proceeds to spin on the lock.  If there are
> > more such contenders than CPUs you can end up with all CPUs spinning on the
> > lock and the thread that would release the lock sitting in the schedule queue.
> >
> > I am in the process of fixing that one by using a semaphore rather than a
> > spin lock to single thread entries to service procedures.
> >
> > In your case it sounds like a little bit of KGDB would go a long way
> > towards figuring it out.
> >
> > -- Dave
> >
> > At 08:19 PM 2/12/2004, Matthew Gierlach wrote:
> >
> > >Hello Dave:
> > >
> > >         This suggested SMP patch does not appear to provide help to
> > >         our SMP problem. After we got through the new major/minor
> > >         clone driver issues we started the driver running and we
> > >         see the: Qhead / Qtail assertion messages. Shortly thereafter
> > >         the system (RH EL 3.0) PANICs. It does not write the PANIC info
> > >         into /var/log/messages and the screen can not be back scrolled
> > >         to see the chain of events that precipitate the PANIC.
> > >
> > >         We're running on a SUN branded IA P4 architecture (SunFire v60x) with
> > >         hyperthreading enabled. This means Linux will see 4 CPUs, and top
> > >         displays 4 CPUs.
> > >
> > >         When we disable hyperthreading and Linux only sees 2 CPUs, the driver
> > >         and LiS run without incident.
> > >
> > >         Is there any tracing or debugging I can provide to help further
> > >         diagnose the root cause?
> > >
> > >         Thanks, Matt
> > >
> > >On Tue, 10 Feb 2004, Dave Grothe wrote:
> > >
> > > > I have been testing on a 4 CPU IBM x335 running Red Hat 9.  I don't have a
> > > > copy of RH EL to test with.
> > > >
> > > > I found a problem having to do with assignment of queue runners to
> > > > CPUs.  The following patch takes care of that problem.  You might try it to
> > > > see if it helps.
> > > >
> > > > The message that you saw was essentially an assertion failure.  I am not
> > > > sure that there is any way for that condition to occur unless RH EL has
> > > > busted spin lock code.  LiS might recover from the assertion failure better
> > > > by returning from the function instead of proceeding.
> > > >
> > > > -- Dave
> > > >
> > > > Version diff for linux-mdep.c, version 2.123
> > > > --- /tmp/sccsdiff.25102/linux-mdep.c    2004-02-10 09:41:25.000000000 -0600
> > > > > +++ /rsys/linux/LiS-2.17/head/linux-mdep.c      2004-02-09 16:24:55.000000000 -0600
> > > > @@ -402,11 +402,11 @@
> > > >
> > > >
> > > >   #if defined(CONFIG_DEV)
> > > > -#define        FLF     , const char *file, int line, const char *fn
> > > > -#define FLFV   FLF
> > > > +#define FLFV   const char *file, int line, const char *fn
> > > > +#define        FLF     , FLFV
> > > >   #else
> > > > -#define FLF    /* nothing */
> > > >   #define FLFV   void
> > > > +#define FLF    /* nothing */
> > > >   #endif
> > > >
> > > >   static
> > > > @@ -3847,15 +3847,7 @@
> > > >       current->policy = SCHED_FIFO ;     /* real-time: run when ready */
> > > > >       current->rt_priority = 50 ;                /* middle value real-time priority */
> > > >       sigaddset(&MY_BLKS, SIGTERM) ;     /* inhibit SIGTERM */
> > > > -#if defined(KERNEL_2_5)
> > > > -# if defined(__SMP__)
> > > >       set_cpus_allowed(current, 1 << cpu_id) ;
> > > > -# else
> > > > > -    /* of course this symbol is not defined unless the kernel was built w/SMP */
> > > > -# endif
> > > > -#elif !defined(_PPC_LIS_)
> > > > -    current->cpus_allowed = (1 << cpu_id) ;    /* bind to a CPU */
> > > > -#endif
> > > >
> > > >   #if defined(KERNEL_2_5)
> > > >       yield() ;                          /* reschedule our thread */
> > > > @@ -3900,8 +3892,9 @@
> > > >              static int  msg_cnt ;
> > > >
> > > >              if (++msg_cnt < 5)
> > > > -               printk("%s woke up running on CPU %d\n",
> > > > -                       current->comm, smp_processor_id()) ;
> > > > > +               printk("%s woke up running on CPU %d -- cpu_id=%d mask=0x%x\n",
> > > > +                       current->comm, smp_processor_id(), cpu_id,
> > > > +                       current->cpus_allowed) ;
> > > >          }
> > > >          /*
> > > > >           * If there are characters queued up in need of printing, print them if
> > > >
> > > > At 07:41 PM 2/9/2004, Matthew Gierlach wrote:
> > > >
> > > > >Hi Dave:
> > > > >
> > > > >         After the
> > > > >
> > > > >           LiS:qenable before Qhead error:lis_qhead=c941bb80 lis_qtail=0.
> > > > >
> > > > > >         there is a kernel panic (yep, panic. RH EL PANICs instead
> > > > > >         of Oopsing).
> > > > >
> > > > >         Matt
> > > > >
> > > > >On Mon, 9 Feb 2004, Matthew Gierlach wrote:
> > > > >
> > > > > > Hi Dave:
> > > > > >
> > > > > >       We're performing some testing of RedHat Enterprise Linux AS 3.0
> > > > > > >       and LiS is failing. We're testing on SUN repackaged Intel Hardware
> > > > > >       (SunFire v60x) that appears to Linux as four CPUs: two chips with
> > > > > >       two Xeon processors inside each chip.
> > > > > >
> > > > > >       The LiS symptom is:
> > > > > >
> > > > > >          LiS:qenable before Qhead error:lis_qhead=c941bb80 lis_qtail=0.
> > > > > >
> > > > > > >       This occurs when all 4 CPUs are enabled and does not occur when
> > > > > > >       only two CPUs are enabled. When hyperthreading in the BIOS is
> > > > > > >       disabled, this message is not issued by LiS. Also "noapic" has
> > > > > > >       been set in the vmlinuz image.
> > > > > >
> > > > > > >       We see the same messages with both RH EL WS 3.0 and RH EL AS 3.0
> > > > > > >       with 4 CPUs enabled. We thought that compiling LiS on WS did not
> > > > > > >       work because WS only supports 2 CPUs and was not providing LiS
> > > > > > >       support to handle the 3rd and 4th CPUs gracefully. So we compiled
> > > > > > >       LiS on AS (which supports up to 16 CPUs) expecting LiS to inherit
> > > > > > >       the >2 SMP support from the AS, and that does not appear to have
> > > > > > >       occurred.
> > > > > >
> > > > > >       Should LiS be compatible with > 2 CPU SMP environments?
> > > > > >
> > > > > >       Thanks, Matt Gierlach
> > > > > >
> > > > > >
> > > > > > WS Enterprise 3.0 SMP Kernel with Hyperthreading Enabled in BIOS;
> > > > > >
> > > > > > the system (SunFire v60x) will panic with a
> > > > > >
> > > > > > LiS:qenable before Qhead error:lis_qhead=c941bb80 lis_qtail=0.
> > > > > >
> > > > > > This happens on both WS and AS versions of RedHat Linux Enterprise 3.0.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >---
> > > > >Incoming mail is certified Virus Free.
> > > > >Checked by AVG anti-virus system (http://www.grisoft.com).
> > > > >Version: 6.0.577 / Virus Database: 366 - Release Date: 2/3/2004
> > > >
> > > >
> > >
> > >
> > >---
> > >Incoming mail is certified Virus Free.
> > >Checked by AVG anti-virus system (http://www.grisoft.com).
> > >Version: 6.0.587 / Virus Database: 371 - Release Date: 2/12/2004
> >
> >
> 
> _______________________________________________
> Linux-streams mailing list
> [EMAIL PROTECTED]
> http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams

-- 
Brian F. G. Bidulock    | The reasonable man adapts himself to the |
[EMAIL PROTECTED]    | world; the unreasonable one persists in  |
http://www.openss7.org/ | trying  to adapt the  world  to himself. |
                        | Therefore  all  progress  depends on the |
                        | unreasonable man. -- George Bernard Shaw |