Hi Dave:
We received the following suggested change to lis_run_queues() in
LiS/head/stream.c from a SUN field support engineer in Japan. I've
asked for the explanation behind this change and do not have a
response or answer just yet.
We have made this change and it has run without incident for 60 hours
on the dual Xeon system with hyperthreading enabled. All 4 CPUs show
usage under top.
I am fascinated that a line of code previously reserved exclusively for
non-SMP environments has such a dramatic effect in the SMP
environment.
Is this perhaps just a SUN anomaly? Since we're getting a lot of pressure
for support of this system, we've not attempted to run this change on
other UP and SMP systems. What's your advice about this change?
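To illustrate what the forwarded change appears to do, here is a minimal user-space sketch, not LiS code. It assumes the guard's purpose is what its comment says ("recursion protection"). Every name in it (run_queues, queuerun_stub, simulate, the counters) is a hypothetical stand-in, and C11 atomics stand in for lis_atomic_read/inc/dec. With the guard compiled in, a nested call made while the queues are already running returns immediately; without it, the call nests.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-ins for the LiS counters; illustrative names only. */
static atomic_int runq_req_cnt;    /* pending requests to run the queues */
static atomic_int queues_running;  /* nonzero while a caller is inside   */
static int depth;                  /* current nesting of queuerun_stub() */
static int max_depth;              /* deepest nesting observed           */
static int passes;                 /* number of queuerun_stub() calls    */

static void run_queues(int cpu, int guard);

/* Stand-in for queuerun(): consumes one request, then tries to re-enter
 * run_queues(), as a service procedure might do indirectly. */
static void queuerun_stub(int cpu, int guard)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    passes++;
    atomic_fetch_sub(&runq_req_cnt, 1);
    run_queues(cpu, guard);        /* attempted re-entry */
    depth--;
}

/* Same shape as the patched loop; guard != 0 models removing the #if. */
static void run_queues(int cpu, int guard)
{
    while (atomic_load(&runq_req_cnt) > 0) {
        if (guard && atomic_load(&queues_running))
            return;                /* someone is already running queues */
        atomic_fetch_add(&queues_running, 1);
        queuerun_stub(cpu, guard);
        atomic_fetch_sub(&queues_running, 1);
    }
}

/* Runs the loop with `requests` pending; returns the deepest nesting. */
int simulate(int requests, int guard)
{
    atomic_store(&runq_req_cnt, requests);
    atomic_store(&queues_running, 0);
    depth = 0;
    max_depth = 0;
    passes = 0;
    run_queues(0, guard);
    return max_depth;
}

/* Number of queuerun_stub() calls made by the last simulate(). */
int last_passes(void)
{
    return passes;
}
```

Calling simulate(3, 1) models the patched code and simulate(3, 0) models the original SMP build. Note that the check and the increment are two separate steps, so on real SMP hardware two CPUs could still both pass the check; this sketch only shows the single-threaded re-entry case and says nothing about why the change helps on the V60x.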
Thanks, Matt
---------- Forwarded message ----------
Date: Fri, 27 Feb 2004 17:05:28 +0900
From: Takuya Watanabe <[EMAIL PROTECTED]>
To: Mark Ma <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
[EMAIL PROTECTED], Richard Barry-Smith <[EMAIL PROTECTED]>,
Takayuki Nakajima <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: SunFire V60x help (fwd)
Dear Mark,
Yesterday, I looked into LiS-2.16.18-1 src code,
and I changed /usr/net/Adax/LiS-2.16.18-1/LiS-2.16.18-1/head/stream.c as
follows.
( "-" means deleted)
void
lis_run_queues(int cpu)
{
    extern int lis_runq_sched ;    /* linux-mdep.c BH code */
    while (lis_atomic_read(&lis_runq_req_cnt) > 0)
    {
- #if !defined(__SMP__)            /* only for single-threaded */
        if (lis_atomic_read(&lis_queues_running)) /* recursion protection */
            return;
- #endif
        lis_atomic_inc(&lis_queues_running);
        lis_atomic_inc(&lis_runq_active_flags[cpu]) ;
        queuerun(cpu);             /* really run the queues */
        lis_atomic_dec(&lis_runq_active_flags[cpu]) ;
        lis_atomic_dec(&lis_queues_running);
    }
    lis_runq_sched = 0 ;           /* OK to V semaphore now */
}/*lis_run_queues*/
I am not sure this is the right solution for AS3.0 on V60X; however,
atmiitest is now working with the AS 3.0 smp kernel with hyper threading
enabled on V60X......
Thank you and best regards./takuya
Mark Ma wrote:
>Phillip and Dennis,
>
>Per email with John, Adax is working with Sun Japan on a Fujitsu RNC-Sim
>design win. We are running into the following system issue and would like
>Sun's input. Your advice is highly appreciated.
>
>Gratefully,
>
>Mark Ma
>Regional Manager, APAC
>Adax Inc.
>[EMAIL PROTECTED]
>408 829 8202
>
>
>Date: Wed, 25 Feb 2004 01:21:12 -0800
>From: John Atchison <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED], [EMAIL PROTECTED]
>Subject: [Fwd: FW: FW: Red Hat Linux on SunFire v60x (fwd)]
>
>
>Mark,
>
>Please let me know if your questions have been answered. If not,
>please contact one of the gentlemen below:
>
>Phillip Pham, [EMAIL PROTECTED], 408-907-9546 : BIOS Engineer
>Dennis Tiu, [EMAIL PROTECTED] , 650 352 5081: REV Engineering, can
>provide Linux support
>
>Regards,
>
>John
>
>
>
>---------- Forwarded message ----------
>Date: Tue, 24 Feb 2004 11:52:54 -0800 (PST)
>From: Richard Barry-Smith <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Subject: SunFire V60x help
>
>
>Hello Ms. Yuen,
>
>I work with Michael Khoury for Adax, Inc. in their Integration Department.
>We have been troubleshooting the SunFire v60x and have been having difficulties
>getting RedHat Enterprise 3.0 AS to run when the Hyperthread Option in the
>BIOS has been set to Enable.
>
>Thus far, the system runs for approximately 20-30 minutes, then it panics.
>With Hyperthreading Disabled in the BIOS, the Adax card
>works. In Uni-Processor mode, the Adax card runs fine too.
>
>Unfortunately, we have not been able to get a Linux Oops
>from our testing, so we do not know the actual nature of the panic.
>This is what we have been able to access from the console:
>
> [<f8a221b8>]lis_runq_sems [streams] 0x38 (0xf707bf30)
> [<f8a221b4>]lis_runq_sems [streams] 0x34 (0xf707bf3C)
> [<f8a221ac>]lis_runq_sems [streams] 0x2c (0xf707bf40)
> [<f8a221b4>]lis_runq_sems [streams] 0x34 (0xf707bf4c)
> [<c010adb2>]__down_interruptible [kernel] 0xd2 (0xf707bf50)
> [<f8a03640>]lis_runq_active_flags [streams] 0x0 (0xf707bf70)
> [<f8a22180>]lis_runq_sems [streams] 0x0 (0xf707bf7c)
> [<f89e58d5>]lis_runq_queues [streams] 0x41 (0xf707bf80)
> [<c010ae37>]__down_failed_interruptible [kernel] 0x7 (0xf707bf88)
> [<f8a221ac>]lis_runq_sems [streams] 0x2c (0xf707bf8c)
> [<f8a22180>]lis_runq_sems [streams] 0x0 (0xf707bf90)
> [<f89e168d>]lis_thread_runqueues [streams] 0x85 (0xf707bfa0)
> [<f89efb00>].rodata.str1.32[streams] 0x4300 (0xf707bfa8)
> [<c013466f>]free_uid [kernel] 0x1f (0xf707bfb4)
> [<f89e1608>]lis_thread_runqueues [streams] 0x0 (0xf707bfc8)
> [<f89e1558>]lis_thread_func [streams] 0x58 (0xf707bfd0)
> [<f89efb00>].rodata.str1.32[streams] 0x4300 (0xf707bfd8)
> [<f89e1500>]lis_thread_func [streams] 0x0 (0xf707bfe4)
> [<c010958d>]kernel_thread_helper [kernel] 0x5 (0xf707bff0)
>
>In order to run our Adax card, the ATMii-PCI, LiS or Linux Streams is
>required. Linux Streams is the interface that links the RedHat OS to
>the Adax ATMii card. We have been working diligently with the LiS
>developers on this issue regarding the v60x all last week.
>
>As of this morning, the LiS Developers believe that these messages have
>something to do with the SunFire v60x hardware.
>
>"Your case, on the surface, looks like spin locks are not working on your
>system. The message from LiS is an assertion failure that should never
>print out in the absence of contention for a queue head which is otherwise
>protected by a spin lock. I have never seen the messages that you are
>seeing.
>
>Is there something about your machine (caching? hardware locking? memory
>access sequencing) that would make the Linux implementation of spin locks
>fail? My gut feel is that you are looking for something very near the
>hardware here. Take a careful walk through your machine's setup menus
>to see if there is some BIOS option that might affect multiple requestors
>to memory."
>
>I have reviewed the BIOS on the SunFire, and this BIOS does not allow the
>user to access a caching or hardware locking option. This is why we need
>your assistance. Can you assist us in the low level troubleshooting of the
>SunFire v60x? Are there steps we can take in the BIOS to access hardware
>locking or memory access sequencing? Our concern is that on this same
>system Solaris X86 version 2.9 runs fine with the Hyperthreading ENABLED.
>What could be the difference between RedHat Enterprise 3.0 and X86 when it
>comes to handling multiple CPUs?
>
>Thank you,
>
>Richard Barry-Smith
>Network Support Engineer
>Adax, Inc.
>TEL: (510) 548-7047 x161
>FAX: (510) 548-5526
>
>
>
>
>
>
>
On Fri, 13 Feb 2004, Dave Grothe wrote:
> I've been running my development version of LiS-2.17 on a 2 CPU XEON system
> which also appears as 4 CPUs to Linux. I have been making improvements to
> queue scheduling for performance enhancements and have only experienced one
> generic type of problem (previously reported and soon to be fixed).
>
> The problem that I see is related to a STREAMS driver calling a kernel
> function that, in turn, calls schedule(). The LiS queue scheduler is
> holding a spin lock on the queue when it calls the service procedure, and
> the call to schedule() can switch to some other process that also wants to
> use that queue, and which then proceeds to spin on the lock. If there are
> more such contenders than CPUs you can end up with all CPUs spinning on the
> lock and the thread that would release the lock sitting in the schedule queue.
>
> I am in the process of fixing that one by using a semaphore rather than a
> spin lock to single thread entries to service procedures.
>
> In your case it sounds like a little bit of KGDB would go a long way
> towards figuring it out.
>
> -- Dave
>
> At 08:19 PM 2/12/2004, Matthew Gierlach wrote:
>
> >Hello Dave:
> >
> > This suggested SMP patch does not appear to help with
> > our SMP problem. After we got through the new major/minor
> > clone driver issues we started the driver running and we
> > see the Qhead / Qtail assertion messages. Shortly thereafter
> > the system (RH EL 3.0) PANICs. It does not write the PANIC info
> > into /var/log/messages and the screen cannot be scrolled back
> > to see the chain of events that precipitates the PANIC.
> >
> > We're running on a SUN branded IA P4 architecture (SunFire v60x) with
> > hyperthreading enabled. This means Linux will see 4 CPUs, and top
> > displays 4 CPUs.
> >
> > When we disable hyperthreading and Linux only sees 2 CPUs, the driver
> > and LiS run without incident.
> >
> > Is there any tracing or debugging I can provide to help further
> > diagnose the root cause?
> >
> > Thanks, Matt
> >
> >On Tue, 10 Feb 2004, Dave Grothe wrote:
> >
> > > I have been testing on a 4 CPU IBM x335 running Red Hat 9. I don't have a
> > > copy of RH EL to test with.
> > >
> > > I found a problem having to do with assignment of queue runners to
> > > CPUs. The following patch takes care of that problem. You might try it to
> > > see if it helps.
> > >
> > > The message that you saw was essentially an assertion failure. I am not
> > > sure that there is any way for that condition to occur unless RH EL has
> > > busted spin lock code. LiS might recover from the assertion failure better
> > > by returning from the function instead of proceeding.
> > >
> > > -- Dave
> > >
> > > Version diff for linux-mdep.c, version 2.123
> > > --- /tmp/sccsdiff.25102/linux-mdep.c 2004-02-10 09:41:25.000000000 -0600
> > > +++ /rsys/linux/LiS-2.17/head/linux-mdep.c 2004-02-09
> > > 16:24:55.000000000 -0600
> > > @@ -402,11 +402,11 @@
> > >
> > >
> > > #if defined(CONFIG_DEV)
> > > -#define FLF , const char *file, int line, const char *fn
> > > -#define FLFV FLF
> > > +#define FLFV const char *file, int line, const char *fn
> > > +#define FLF , FLFV
> > > #else
> > > -#define FLF /* nothing */
> > > #define FLFV void
> > > +#define FLF /* nothing */
> > > #endif
> > >
> > > static
> > > @@ -3847,15 +3847,7 @@
> > > current->policy = SCHED_FIFO ; /* real-time: run when ready */
> > > current->rt_priority = 50 ; /* middle value real-time
> > > priority */
> > > sigaddset(&MY_BLKS, SIGTERM) ; /* inhibit SIGTERM */
> > > -#if defined(KERNEL_2_5)
> > > -# if defined(__SMP__)
> > > set_cpus_allowed(current, 1 << cpu_id) ;
> > > -# else
> > > - /* of course this symbol is not defined unless the kernel was built
> > > w/SMP */
> > > -# endif
> > > -#elif !defined(_PPC_LIS_)
> > > - current->cpus_allowed = (1 << cpu_id) ; /* bind to a CPU */
> > > -#endif
> > >
> > > #if defined(KERNEL_2_5)
> > > yield() ; /* reschedule our thread */
> > > @@ -3900,8 +3892,9 @@
> > > static int msg_cnt ;
> > >
> > > if (++msg_cnt < 5)
> > > - printk("%s woke up running on CPU %d\n",
> > > - current->comm, smp_processor_id()) ;
> > > + printk("%s woke up running on CPU %d -- cpu_id=%d
> > mask=0x%x\n",
> > > + current->comm, smp_processor_id(), cpu_id,
> > > + current->cpus_allowed) ;
> > > }
> > > /*
> > > * If there are characters queued up in need of printing, print
> > > them if
> > >
> > > At 07:41 PM 2/9/2004, Matthew Gierlach wrote:
> > >
> > > >Hi Dave:
> > > >
> > > > After the
> > > >
> > > > LiS:qenable before Qhead error:lis_qhead=c941bb80 lis_qtail=0.
> > > >
> > > > there is a kernel panic (yep, panic. RH EL PANICs instead of
> > > > Oopsing).
> > > >
> > > > Matt
> > > >
> > > >On Mon, 9 Feb 2004, Matthew Gierlach wrote:
> > > >
> > > > > Hi Dave:
> > > > >
> > > > > We're performing some testing of RedHat Enterprise Linux AS 3.0
> > > > > and LiS is failing. We're testing on SUN repackaged Intel
> > Hardware
> > > > > (SunFire v60x) that appears to Linux as four CPUs: two chips with
> > > > > two Xeon processors inside each chip.
> > > > >
> > > > > The LiS symptom is:
> > > > >
> > > > > LiS:qenable before Qhead error:lis_qhead=c941bb80 lis_qtail=0.
> > > > >
> > > > > This occurs when all 4 CPUs are enabled and does not occur when
> > > > > only two CPUs are enabled. When hyperthreading in the BIOS is
> > > > disabled,
> > > > > this message is not issued by LiS. Also "noapic" has been set
> > in the
> > > > > vmlinuz image.
> > > > >
> > > > > We see the same messages with both RH EL WS 3.0 and RH EL AS
> > 3.0 with
> > > > > 4 CPUs enabled. We thought that compiling LiS on WS did not work
> > > > because
> > > > > WS only supports 2 CPUs and was not providing LiS support to
> > handle
> > > > > the 3rd and 4th CPUs gracefully. So we compiled LiS on AS (which
> > > > > supports up to 16 CPUs) expecting LiS to inherit the >2 SMP
> > support
> > > > > from the AS, and that does not appear to have occurred.
> > > > >
> > > > > Should LiS be compatible with > 2 CPU SMP environments?
> > > > >
> > > > > Thanks, Matt Gierlach
> > > > >
> > > > >
> > > > > WS Enterprise 3.0 SMP Kernel with Hyperthreading Enabled in BIOS;
> > > > >
> > > > > the system (SunFire v60x) will panic with a
> > > > >
> > > > > LiS:qenable before Qhead error:lis_qhead=c941bb80 lis_qtail=0.
> > > > >
> > > > > This happens on both WS and AS versions of RedHat Linux Enterprise 3.0.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>
_______________________________________________
Linux-streams mailing list
[EMAIL PROTECTED]
http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams