Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-29 Thread thomas schorpp

thomas schorpp wrote:

thomas schorpp wrote:

thomas schorpp wrote:

James Bottomley wrote:

On Sat, 2007-03-24 at 01:51 +0100, thomas schorpp wrote:

no. so the pci layer reports wrong start:

nonsense. it succeeds, confused function return with the error flag:

//  u_long  start;
//  u_long  start = 0xFFEFF000;
u_long  start = 0x3000;
int error;

struct resource* ret1;
error = 0;
//  start = pci_resource_start(ahc->dev_softc, 1);
if (start != 0) {
*bus_addr = start;
if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0)


You can't do this.  The pci_resource_start is getting the address of
something called a Base Address Register (BAR); it says where in physical
address space the card is responding ... you can't simply set that
to a random value.

The problem you seem to have is that your system is reporting a BAR
beyond 32 bits (4GB) which the card physically can't use.  This could be
because of a BIOS misconfiguration or because there's a bug in the PCI
subsystem somewhere.

James
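
To make the point about a BAR beyond 32 bits concrete, here is a minimal user-space sketch (not kernel code) of how a 64-bit memory BAR is assembled from two 32-bit config registers, and how one might check whether the resulting region still lies below 4GB. The helper names bar64_address and bar_fits_32bit are hypothetical; the only assumption taken from the PCI convention is that the low four bits of a memory BAR carry type flags rather than address bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low four bits of a memory BAR hold flags, not address bits. */
#define BAR_MEM_FLAG_MASK 0xFu

/* Combine the low and high config dwords of a 64-bit memory BAR. */
static uint64_t bar64_address(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (lo & ~BAR_MEM_FLAG_MASK);
}

/* True if the whole region [addr, addr + len) lies below 4GB. */
static bool bar_fits_32bit(uint64_t addr, uint64_t len)
{
    assert(len > 0); /* a BAR region is never empty */
    return addr <= UINT32_MAX && len - 1 <= UINT32_MAX - addr;
}
```

With illustrative values, a high dword of 0 keeps the region reachable for a 32-bit-only device, while any nonzero high dword moves it out of reach.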


understood. waiting for LKML answers... meanwhile i found harder 
reason for a possible bounds problem with the driver code on x86_64:


if i do:

static int
ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc,
u_long *bus_addr,
uint8_t __iomem **maddr)
{
//  u_long  start;
   uint32_t start;

i get no free warning of *nonexistent* resource (it can't be 
nonexistent, because something was definitely mapped):


tom1:/usr/src/linux# dmesg |grep -i free
Freeing unused kernel memory: 208k freed

with u_long type start i get it:
Mar 24 03:41:47 localhost kernel: Trying to free nonexistent resource 
f000-


investigating further...
-


hmm well i don't get the free warning because

   release_mem_region(ahc->platform_data->mem_busaddr,
                      0x1000);

isn't called, the hack fails:

   error = ahc_linux_pci_reserve_mem_region(ahc, &base, &maddr);
   if (error == 0) {

ok, so no bounds issue in the driver.



LKML people are ignoring my report, i take this as agreement that it is a 
mainboard bios issue.
will test the card with the latest debian x86_64 kernel netinstall cd on 
some other amd64 machine, but i need to find one within my reach here.

i need more confirmation before working in the linux pci hal.



no other amd64 machines in reach.

here's my fix. it seems to be a h/w bug of the adaptec 19160 hba card: 
it just fakes a 64bit BAR on the register read. this doesn't matter on the i386 arch 
(due to incomplete error handling ;) ), but it does on the x86_64 arch. since there is 
no public interest in a real fix here or on LKML, i'll do no further investigation. 

Users, *DON'T try this at home, it may break real 64bit BAR cards* (if there are any for PCI32)! 


drivers/pci/probe.c
static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
{
[...]

   if ((l & (PCI_BASE_ADDRESS_SPACE | PCI_BASE_ADDRESS_MEM_TYPE_MASK))
       == (PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64)) {
   u32 szhi, lhi;
   pci_read_config_dword(dev, reg+4, &lhi);
lhi = 0; //schorpp
   pci_write_config_dword(dev, reg+4, ~0);
   pci_read_config_dword(dev, reg+4, &szhi);
   pci_write_config_dword(dev, reg+4, lhi); //kill the wrong read 0x0F
   szhi = pci_size(lhi, szhi, 0xffffffff);
   next++;
printk(KERN_ERR "PCI: 64-bit check REG for device %s l %lx%lx sz %lx%lx start %llx end %llx flags $
   pci_name(dev), lhi, l, szhi, sz, res->start, res->end, res->flags);

#if BITS_PER_LONG == 64 //the cause, more checks for buggy h/w needed or platform dep. bug somewhere deeper
   res->start |= ((unsigned long) lhi) << 32;
   res->end = res->start + sz;
printk(KERN_ERR "PCI: 64-bit BAR check 1 for device %s l %lx%lx sz %lx%lx start %llx end %llx flag$
   pci_name(dev), lhi, l, szhi, sz, res->start, res->end, res->flags);
[...]
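
For readers following the sizing step in the hack above, here is a simplified user-space sketch of the arithmetic a BAR size probe relies on. It is written in the spirit of the pci_size() call shown, but it is not the kernel's implementation (among other things it omits any special-casing of a BAR that already contained all ones). The function name bar_extent is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * After all ones are written to a BAR, the address bits that read back
 * as zero are the ones the device does not decode. The lowest set bit
 * among the address bits gives the region size; this returns size - 1
 * (the extent), or 0 if no address bits are implemented.
 */
static uint32_t bar_extent(uint32_t readback, uint32_t addr_mask)
{
    uint32_t bits;

    assert(addr_mask != 0); /* a zero mask would select no address bits */
    bits = readback & addr_mask;
    if (bits == 0)
        return 0;
    /* Isolate the lowest set bit, then subtract one. */
    return (bits & ~(bits - 1)) - 1;
}
```

For example, a readback of 0xfffff004 with the memory mask 0xfffffff0 yields an extent of 0xfff, which corresponds to a 4K region.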

hba fine again:

tom1:/usr/src/linux# lspci -vvv -s 00:06.0
00:06.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02)
   Subsystem: Adaptec 19160 Ultra160 SCSI Controller
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
   Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
   Latency: 32 (1ns min, 6250ns max), Cache Line Size: 64 bytes
   Interrupt: pin A routed to IRQ 17
   BIST result: 00
   Region 0: I/O ports at d800 [disabled] [size=256]
   Region 1: Memory at 3000 (64-bit, non-prefetchable) [size=4K]
   Expansion ROM at fbee [disabled] [size=128K]
   Capabilities: [dc] Power Management version 2
   Flags: PMEClk

Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-23 Thread thomas schorpp

James Bottomley wrote:

On Fri, 2007-03-23 at 02:26 +0100, thomas schorpp wrote:
ok, overriding the first while(ahc_is_paused) that blocked before 
(i see no sense for doing this in a pci mmap test function, cause 
proper resource setup is required *before* using such I/O functions, 
otherwise the adapter had entered SEQ paused status)

i got the kernel to boot at least at pio mode.

this is surely not the correct resource and looks like a datatype 
boundary overflow, the upper 0x0f is missing:

[   49.278810] Trying to free nonexistent resource
f000-fff
f


That's because ahc->platform_data->mem_busaddr is u32
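
A minimal user-space sketch of the truncation described here: a bus address wider than 32 bits loses its upper bits when stored in a 32-bit field. The struct and function names below are hypothetical stand-ins, not the driver's actual definitions, and the address used in any trial run is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-adapter record with a 32-bit address field. */
struct platform_data_sketch {
    uint32_t mem_busaddr;
};

/* Store a bus address; the conversion to uint32_t keeps only the low 32 bits. */
static void store_busaddr(struct platform_data_sketch *pd, uint64_t bus_addr)
{
    assert(pd != NULL);
    pd->mem_busaddr = (uint32_t)bus_addr;
}

/* Report whether the stored value no longer matches the original address. */
static int busaddr_was_truncated(const struct platform_data_sketch *pd,
                                 uint64_t bus_addr)
{
    return (uint64_t)pd->mem_busaddr != bus_addr;
}
```

Any later call that frees the region using the truncated field would name a range that was never reserved, which is consistent with a "nonexistent resource" warning.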



[   54.513224] scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
[   54.513226] Adaptec 19160B Ultra160 SCSI adapter
[   54.513227] aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs


The driver code suggests that the 7892 can't do the AHC_LARGE_SCBS
features ... which means the card itself cannot address more than 32
bits of memory, so it would be unable to decode a BAR that's beyond the
32 bit range.  So this looks like some type of error in the PCI config
system (or possibly in the BIOS).  I think this card needs its BARs to
be in the lower 32 bits to function.

James




i agree for this to be a 32bit dma busmaster chip,
since pci_resource_flags and lspci say 64bit mem resource type

aic7xxx: pci_resource_start f000 *maddr 2 mem64 4

we've a bug in the x86_64 linux pci config, BIOS is ok, 
the hardware worked fine in a winxp_x64 test setup a few months ago.


will ask LKML.

y
tom
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-23 Thread thomas schorpp

thomas schorpp wrote:

James Bottomley wrote:

On Fri, 2007-03-23 at 02:26 +0100, thomas schorpp wrote:
ok, overriding the first while(ahc_is_paused) that blocked before (i 
see no sense for doing this in a pci mmap test function, cause proper 
resource setup is required *before* using such I/O functions, 
otherwise the adapter had entered SEQ paused status)

i got the kernel to boot at least at pio mode.

this is surely not the correct resource and looks like a datatype 
boundary overflow, the upper 0x0f is missing:

[   49.278810] Trying to free nonexistent resource
f000-fff
f


That's because ahc->platform_data->mem_busaddr is u32


[   54.513224] scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
[   54.513226] Adaptec 19160B Ultra160 SCSI adapter
[   54.513227] aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs


The driver code suggests that the 7892 can't do the AHC_LARGE_SCBS
features ... which means the card itself cannot address more than 32
bits of memory, so it would be unable to decode a BAR that's beyond the
32 bit range.  So this looks like some type of error in the PCI config
system (or possibly in the BIOS).  I think this card needs its BARs to
be in the lower 32 bits to function.

James




i agree for this to be a 32bit dma busmaster chip,
since pci_resource_flags and lspci say 64bit mem resource type

aic7xxx: pci_resource_start f000 *maddr 2 mem64 4

we've a bug in the x86_64 linux pci config, BIOS is ok, the hardware 
worked fine in a winxp_x64 test setup a few months ago.


will ask LKML.

y
tom


sorry, wrong according to http://download.adaptec.com/pdfs/aic7892.pdf.

66 MHz, 64-bit, PCI interface that
supports zero wait-state memory;
also operates on 33 MHz, 32-bit
PCI busses

this chip is capable of 64bit addressing, as pci_resource_ (checking this) on x86_64 platform 
and lspci on x86_64 *and* AMDK7 configured kernels report, even on PCI/32, right?

or is it impossible to do multiplexed 64bit mem addressing on PCI/32?

Why are the driver structure address members 32bit wide types if there are PCI/64 card 
models with this chip, as listed in the aic7xxx.txt kernel doc and stated in aic7892.pdf?

I'll adapt the respective driver structures and function args now to 64bit and 
see what happens...

can adaptec.inc pls comment? since the aha19160 card is still in production, 
i assume they want to have a linux x86_64 dma capable driver. so far it is not. 
or can other users having this card pls confirm that my pci system is broken?


y
tom



[PATCH][DOC]aic7xxx: Correct wrong Kernel Documentation for Adaptec AHA19160(B) HBA

2007-03-23 Thread thomas schorpp

--- Documentation/scsi/aic7xxx.txt 2007-03-23 16:44:05.0 +0100
+++ Documentation/scsi/aic7xxx.txt 2007-03-23 17:01:19.0 +0100
@@ -28,7 +28,7 @@
    aic7880  10   PCI/32    20MHz     16Bit     16
    aic7890  20   PCI/32    40MHz     16Bit     16        3 4 5 6 7 8
    aic7891  20   PCI/64    40MHz     16Bit     16        3 4 5 6 7 8
-   aic7892  20   PCI/64-66 80MHz     16Bit     16        3 4 5 6 7 8
+   aic7892  20 PCI/32/64-66 80MHz    16Bit     16        3 4 5 6 7 8
    aic7895  15   PCI/32    20MHz     16Bit     16      2 3 4 5
    aic7895C 15   PCI/32    20MHz     16Bit     16      2 3 4 5     8
    aic7896  20   PCI/32    40MHz     16Bit     16      2 3 4 5 6 7 8
@@ -114,7 +114,7 @@
    AHA-29160N   aic7892   PCI/32      LVD-HD68F      SE-HD50F
                                                      SE-50M
    AHA-29160LP  aic7892   PCI/64-66
-   AHA-19160    aic7892   PCI/64-66
+   AHA-19160    aic7892   PCI/32
    AHA-29150LP  aic7892   PCI/64-66
    AHA-29130LP  aic7892   PCI/64-66
    AHA-3960D    aic7899   PCI/64-66   2 X LVD-HD68F  2 X LVD-VHD68F

Correct documentation according to the aic7892 chip, adaptec aha19160 hba specs.

Signed-off-by: tom schorpp [EMAIL PROTECTED]



Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-23 Thread thomas schorpp

James Bottomley wrote:

On Fri, 2007-03-23 at 17:28 +0100, thomas schorpp wrote:

i agree for this to be a 32bit dma busmaster chip,
since pci_resource_flags and lspci say 64bit mem resource type

aic7xxx: pci_resource_start f000 *maddr 2 mem64 4



static int
ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc,
u_long *bus_addr,
uint8_t __iomem **maddr)
{
//  u_long  start;
   u_long  len;
   int error;
   uint64_t start;
...
   printk(KERN_WARNING "aic7xxx: pci_resource_start 0x%llx mem64 0x%lx\n", start,
          pci_resource_flags(ahc->dev_softc, 1) & PCI_BASE_ADDRESS_MEM_TYPE_64); //schorpp
   return (error);

aic7xxx: pci_resource_start 0xff000 mem64 0x4
---^

just to doublecheck the situation, posted lspci already.

will check next, if 


   len = pci_resource_len(ahc->dev_softc, 1);
   if (start != 0) {
   *bus_addr = start;
//  if (request_mem_region(start, 0x1000, "aic7xxx") == 0)
   if (request_mem_region(start, len, "aic7xxx") == 0)

succeeds.

we've a bug in the x86_64 linux pci config, BIOS is ok, the hardware 
worked fine in a winxp_x64 test setup a few months ago.


will ask LKML.

y
tom

sorry, wrong according to http://download.adaptec.com/pdfs/aic7892.pdf.

66 MHz, 64-bit, PCI interface that
supports zero wait-state memory;
also operates on 33 MHz, 32-bit
PCI busses

this chip is capable of 64bit addressing, as pci_resource_ (checking this) on x86_64 platform 
and lspci on x86_64 *and* AMDK7 configured kernels reports, even on PCI/32, right?

or is it impossible to do multiplexed 64bit mem addressing on PCI/32?


It can only do 37 bit addressing ... only the aic79xx can do the full 64
bits, so I suspect it should never get a 64 bit BAR, since it wouldn't
be able to decode the full 32 bits.  I can fix the mmio check not to
hang, but the card won't actually work mmio until whatever's assigning
the BAR above 32 bits is fixed (that could either be a kernel PCI bug or
a BIOS bug).



ok, i trust in that. adaptor bios and mainboard bios *are* out, winxp_x64 
driver handled all.
so agree on kernel pci hal issue.
but what for const uint64_t   mask_39bit = 0x7FFFFFFFFFULL;
then?
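
On the arithmetic behind a name like mask_39bit: an N-bit addressing limit is commonly expressed as a mask with the low N bits set, and an address is reachable when it has no bits outside that mask. The sketch below only illustrates that arithmetic; the helper names addr_mask and addr_reachable are hypothetical and nothing here is taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a mask with the low `bits` bits set (1 <= bits <= 64). */
static uint64_t addr_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    /* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* True if `addr` is reachable by a device limited to `bits` address bits. */
static bool addr_reachable(uint64_t addr, unsigned bits)
{
    return (addr & ~addr_mask(bits)) == 0;
}
```

A 39-bit mask built this way is 0x7FFFFFFFFF. An address just above 4GB would pass a 39-bit check but fail a 32-bit one.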



can adaptec.inc pls comment? since the aha19160 card is still in production state, 
i assume they want to have a linux x86_64 dma capable driver. so far it is not, 
or can other users having this card pls confirm my pci system broken?


James




y
tom



Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-23 Thread thomas schorpp

thomas schorpp wrote:

thomas schorpp wrote:

James Bottomley wrote:

On Fri, 2007-03-23 at 17:28 +0100, thomas schorpp wrote:

i agree for this to be a 32bit dma busmaster chip,
since pci_resource_flags and lspci say 64bit mem resource type

aic7xxx: pci_resource_start f000 *maddr 2 mem64 4



static int
ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc,
u_long *bus_addr,
uint8_t __iomem **maddr)
{
//  u_long  start;
   u_long  len;
   int error;
   uint64_t start;
...
   printk(KERN_WARNING "aic7xxx: pci_resource_start 0x%llx mem64 0x%lx\n", start,
          pci_resource_flags(ahc->dev_softc, 1) & PCI_BASE_ADDRESS_MEM_TYPE_64); //schorpp

   return (error);

aic7xxx: pci_resource_start 0xff000 mem64 0x4
---^

just to doublecheck the situation, posted lspci already.

will check next, if
   len = pci_resource_len(ahc->dev_softc, 1);
   if (start != 0) {
   *bus_addr = start;
//  if (request_mem_region(start, 0x1000, "aic7xxx") == 0)
   if (request_mem_region(start, len, "aic7xxx") == 0)

succeeds.


no. so the pci layer reports wrong start:


nonsense. it succeeds, confused function return with the error flag:

//  u_long  start;
//  u_long  start = 0xFFEFF000;
   u_long  start = 0x3000;
   int error;

   struct resource* ret1;
   error = 0;
//  start = pci_resource_start(ahc->dev_softc, 1);
   if (start != 0) {
           *bus_addr = start;
           if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0)
                   error = ENOMEM;
           printk(KERN_WARNING "aic7xxx: req_mem_region start 0x%lx\n",
                  ret1->start); //schorpp
           if (error == 0) {
                   *maddr = ioremap_nocache(start, 256);
                   if (*maddr == NULL) {
                           error = ENOMEM;
                           release_mem_region(start, 0x1000);
                   }
           }
   } else
           error = ENOMEM;

tom1:~# dmesg |grep aic
aic7xxx: DMA_32BIT_MASK
aic7xxx: req_mem_region start 0x3000
aic7xxx: pci_resource_start 0x3000 **maddr 0xff mem64 0x4
aic7xxx: PCI Device 0:6:0 failed memory mapped test.  Using PIO.
   aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

tried the mem start value from lspci on a running knoppix and winxp, 
but the if (ahc_pci_test_register_access(ahc) != 0) {

does not go.
thought the pci resources were constant and i could hardcode for my system ;)

remarkable is, with this mem start setting the kernel autofree(?) does not take 
action.





we've a bug in the x86_64 linux pci config, BIOS is ok, the 
hardware worked fine in a winxp_x64 test setup a few months ago.


will ask LKML.

y
tom

sorry, wrong according to http://download.adaptec.com/pdfs/aic7892.pdf.

66 MHz, 64-bit, PCI interface that
supports zero wait-state memory;
also operates on 33 MHz, 32-bit
PCI busses

this chip is capable of 64bit addressing, as pci_resource_ 
(checking this) on x86_64 platform and lspci on x86_64 *and* AMDK7 
configured kernels reports, even on PCI/32, right?

or is it impossible to do multiplexed 64bit mem addressing on PCI/32?


It can only do 37 bit addressing ... only the aic79xx can do the full 64
bits, so I suspect it should never get a 64 bit BAR, since it wouldn't
be able to decode the full 32 bits.  I can fix the mmio check not to
hang, but the card won't actually work mmio until whatever's assigning
the BAR above 32 bits is fixed (that could either be a kernel PCI bug or
a BIOS bug).



ok, i trust in that. adaptor bios and mainboard bios *are* out, 
winxp_x64 driver handled all.

so agree on kernel pci hal issue.
but what for const uint64_t   mask_39bit = 0x7FFFFFFFFFULL;
then?



can adaptec.inc pls comment? since the aha19160 card is still in 
production state, i assume they want to have a linux x86_64 dma 
capable driver. so far it is not, or can other users having this 
card pls confirm my pci system broken?


James




y
tom



Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-23 Thread thomas schorpp

James Bottomley wrote:

On Sat, 2007-03-24 at 01:51 +0100, thomas schorpp wrote:

no. so the pci layer reports wrong start:

nonsense. it succeeds, confused function return with the error flag:

//  u_long  start;
//  u_long  start = 0xFFEFF000;
u_long  start = 0x3000;
int error;

struct resource* ret1;
error = 0;
//  start = pci_resource_start(ahc->dev_softc, 1);
if (start != 0) {
*bus_addr = start;
if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0)


You can't do this.  The pci_resource_start is getting the address of
something called a Base Address Register (BAR); it says where in physical
address space the card is responding ... you can't simply set that
to a random value.

The problem you seem to have is that your system is reporting a BAR
beyond 32 bits (4GB) which the card physically can't use.  This could be
because of a BIOS misconfiguration or because there's a bug in the PCI
subsystem somewhere.

James


understood. waiting for LKML answers... meanwhile i found harder reason for 
a possible bounds problem with the driver code on x86_64:


if i do:

static int
ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc,
u_long *bus_addr,
uint8_t __iomem **maddr)
{
//  u_long  start;
   uint32_t start;

i get no free warning of *nonexistent* resource (it can't be nonexistent, 
because something was definitely mapped):


tom1:/usr/src/linux# dmesg |grep -i free
Freeing unused kernel memory: 208k freed

with u_long type start i get it:
Mar 24 03:41:47 localhost kernel: Trying to free nonexistent resource 
f000-


investigating further...


Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-23 Thread thomas schorpp

thomas schorpp wrote:

James Bottomley wrote:

On Sat, 2007-03-24 at 01:51 +0100, thomas schorpp wrote:

no. so the pci layer reports wrong start:

nonsense. it succeeds, confused function return with the error flag:

//  u_long  start;
//  u_long  start = 0xFFEFF000;
u_long  start = 0x3000;
int error;

struct resource* ret1;
error = 0;
//  start = pci_resource_start(ahc->dev_softc, 1);
if (start != 0) {
*bus_addr = start;
if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0)


You can't do this.  The pci_resource_start is getting the address of
something called a Base Address Register (BAR); it says where in physical
address space the card is responding ... you can't simply set that
to a random value.

The problem you seem to have is that your system is reporting a BAR
beyond 32 bits (4GB) which the card physically can't use.  This could be
because of a BIOS misconfiguration or because there's a bug in the PCI
subsystem somewhere.

James


understood. waiting for LKML answers... meanwhile i found harder reason 
for a possible bounds problem with the driver code on x86_64:


if i do:

static int
ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc,
u_long *bus_addr,
uint8_t __iomem **maddr)
{
//  u_long  start;
   uint32_t start;

i get no free warning of *nonexistent* resource (it can't be 
nonexistent, because something was definitely mapped):


tom1:/usr/src/linux# dmesg |grep -i free
Freeing unused kernel memory: 208k freed

with u_long type start i get it:
Mar 24 03:41:47 localhost kernel: Trying to free nonexistent resource 
f000-


investigating further...
-


hmm well i don't get the free warning because

   release_mem_region(ahc->platform_data->mem_busaddr,
                      0x1000);

isn't called, the hack fails:

   error = ahc_linux_pci_reserve_mem_region(ahc, &base, &maddr);
   if (error == 0) {

ok, so no bounds issue in the driver.




aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-22 Thread thomas schorpp

lo,

well, ive several live cd systems  2.6.19.5i386 that oops and hang boot in 
aic7xxx init,
only one booting here is knoppix 5.2,

the latest unofficial debian stable 2.6.8-12-amd64-generic, which 
says ACPI: PCI interrupt :00:06.0[A] - GSI 17 (level, low) - IRQ 17

aic7xxx: PCI0:6:0 MEM region 0x0 unavailable. Cannot memory map device.
but works ok,

a debian etch 2.6.18-4-amd64 which says:

SCSI subsystem initialized
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt :00:06.0[A] - GSI 17 (level, low) - IRQ 169
BUG: soft lockup detected on CPU#0!

Call Trace:
IRQ [802a3fec] softlockup_tick+0xdb/0xed
[802881df] update_process_times+0x42/0x68
[8026cbd8] smp_local_timer_interrupt+0x23/0x47
[8026d2cc] smp_apic_timer_interrupt+0x41/0x47
[8025904a] apic_timer_interrupt+0x66/0x6c
EOI [8038a412] pci_conf1_write+0x0/0xc9
[88053718] :aic7xxx:ahc_pci_test_register_access+0xc2/0x391
[880536a5] :aic7xxx:ahc_pci_test_register_access+0x4f/0x391
[88059416] :aic7xxx:ahc_pci_map_registers+0x1bb/0x239
[880523d2] :aic7xxx:ahc_pci_config+0x4c/0x12d0
[80389fb7] pcibios_set_master+0x1e/0x84
[88059186] :aic7xxx:ahc_linux_pci_dev_probe+0x13e/0x213
[80317eea] pci_device_probe+0xdf/0x147
[8036b9db] driver_probe_device+0x52/0xa8
[8036ba96] __driver_attach+0x0/0x9a
[8036bae6] __driver_attach+0x50/0x9a
[8036ba96] __driver_attach+0x0/0x9a
[8036b458] bus_for_each_dev+0x43/0x6e
[8036b09a] bus_add_driver+0x7e/0x130
[803180c4] __pci_register_driver+0x57/0x7d
[8805903e] :aic7xxx:ahc_linux_pci_init+0x17/0x21
[8806e325] :aic7xxx:ahc_linux_init+0x325/0x336
[8027d27d] default_wake_function+0x0/0xe
[8025e2e5] __down_read+0x12/0x9a
[80294fa1] __link_module+0x0/0x25
[802200e5] __up_read+0x13/0x8a
[80297695] sys_init_module+0x16cc/0x1882
[802584d6] system_call+0x7e/0x83

BUG: soft lockup detected on CPU#0!

a kernel.org 2.6.20 with K8 config set but built in a 32Bit debian sid environment, 
but works ok,


and finally the latest kernel.org 2.6.20.3 AMD K8 built on debian amd64 etch userland that 
hangs boot on aic7xxx init without magic sysreq keys functionality:

Loading iSCSI transport class v2.0-724.
ACPI: PCI Interrupt :00:06.0[A] - GSI 17 (level, low) - IRQ 17
... Kernel alive - Kernel direct mapping tables up to 1 @ 8000-d000

now trying latest scsi git and be on ##kernel at freenode if Q.

y
tom

SysRq : Resetting
Linux version 2.6.20.3amd64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 
(prerelease7
Command line: root=/dev/sda1 ro single console=ttyS0,115200n8 aic7xxx=debug=255
BIOS-provided physical RAM map:
BIOS-e820:  - 0009fc00 (usable)
BIOS-e820: 0009fc00 - 000a (reserved)
BIOS-e820: 000e4000 - 0010 (reserved)
BIOS-e820: 0010 - 1ffd (usable)
BIOS-e820: 1ffd - 1ffde000 (ACPI data)
BIOS-e820: 1ffde000 - 2000 (ACPI NVS)
BIOS-e820: fec0 - fec01000 (reserved)
BIOS-e820: ff78 - 0001 (reserved)
end_pfn_map = 1048576
DMI 2.3 present.
Zone PFN ranges:
 DMA 0 - 4096
 DMA324096 -  1048576
 Normal1048576 -  1048576
early_node_map[2] active PFN ranges
   0:0 -  159
   0:  256 -   131024
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x81] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Nosave address range: 0009f000 - 000a
Nosave address range: 000a - 000e4000
Nosave address range: 000e4000 - 0010
Allocating PCI resources starting at 3000 (gap: 2000:dec0)
Built 1 zonelists.  Total pages: 127672
Kernel command line: root=/dev/sda1 ro single console=ttyS0,115200n8 aic7xxx=de5
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 16384 bytes)
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2000.164 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Checking aperture...
CPU 0: aperture @ d000 size 128 MB
Memory: 509592k/524096k available (3711k kernel code, 13908k reserved, 1316k da)
Calibrating delay using timer specific routine.. 4005.05 BogoMIPS (lpj=8010104)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 

Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-22 Thread thomas schorpp

no fix in scsi rc fixes git, now examining code from the softlockup trace 
before...

[0.00] Linux version 2.6.21-rc3amd64-gbb9ba31c ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 PREEMPT Thu Mar 22 17:39:17 CET 2007
[0.00] Command line: root=/dev/sda1 ro single console=ttyS0,115200n8
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e4000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 1ffd (usable)
[0.00]  BIOS-e820: 1ffd - 1ffde000 (ACPI data)
[0.00]  BIOS-e820: 1ffde000 - 2000 (ACPI NVS)
[0.00]  BIOS-e820: fec0 - fec01000 (reserved)
[0.00]  BIOS-e820: ff78 - 0001 (reserved)
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000F92B0, 0014 (r0 ACPIAM)
[0.00] ACPI: RSDT 1FFD, 0030 (r1 A M I  OEMRSDT  1612 MSFT
97)
[0.00] ACPI: FACP 1FFD0200, 0084 (r2 A M I  OEMFACP  1612 MSFT
97)
[0.00] ACPI: DSDT 1FFD03F0, 3D20 (r1  1 10055 INTL  2002
026)
[0.00] ACPI: FACS 1FFDE000, 0040
[0.00] ACPI: APIC 1FFD0390, 005C (r1 A M I  OEMAPIC  1612 MSFT
97)
[0.00] ACPI: OEMB 1FFDE040, 0046 (r1 A M I  AMI_OEM  1612 MSFT
97)
[0.00] Zone PFN ranges:
[0.00]   DMA 0 - 4096
[0.00]   DMA324096 -  1048576
[0.00]   Normal1048576 -  1048576
[0.00] early_node_map[2] active PFN ranges
[0.00] 0:0 -  159
[0.00] 0:  256 -   131024
[0.00] Looks like a VIA chipset. Disabling IOMMU. Override with iommu=allowed
[0.00] ACPI: PM-Timer IO Port: 0x808
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x81] disabled)
[0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Nosave address range: 0009f000 - 000a
[0.00] Nosave address range: 000a - 000e4000
[0.00] Nosave address range: 000e4000 - 0010
[0.00] Allocating PCI resources starting at 3000 (gap: 2000:dec0)
[0.00] Built 1 zonelists.  Total pages: 126532
[0.00] Kernel command line: root=/dev/sda1 ro single console=ttyS0,115200n8
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 2048 (order: 11, 16384 bytes)
[   40.851937] time.c: Detected 2000.089 MHz processor.
[   40.853406] Console: colour VGA+ 80x25
[   41.128559] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[   41.136287] ... MAX_LOCKDEP_SUBCLASSES:8
[   41.140560] ... MAX_LOCK_DEPTH:  30
[   41.144747] ... MAX_LOCKDEP_KEYS:2048
[   41.149105] ... CLASSHASH_SIZE:   1024
[   41.153550] ... MAX_LOCKDEP_ENTRIES: 8192
[   41.157901] ... MAX_LOCKDEP_CHAINS:  16384
[   41.162346] ... CHAINHASH_SIZE:  8192
[   41.166704]  memory used by lock dependency info: 1648 kB
[   41.172093]  per task-struct memory footprint: 1680 bytes
[   41.177480] 
[   41.181041] | Locking API testsuite:
[   41.184615] ----
[   41.192694]  | spin |wlock |rlock |mutex | wsem | rsem |
[   41.200771]   ------
[   41.208855]  A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.217928]  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.226939]  A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.236020]  A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.245093]  A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.254251]  A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.263393]  A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.272560] double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.281529]   initialize held:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   41.290645]  bad unlock order:  ok  | 

Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-22 Thread thomas schorpp

[   48.848796] Loading iSCSI transport class v2.0-724.
[   48.854066] iscsi: registered transport (tcp)
[   48.858479] ahc_linux_pci_init
[   48.861676] ahc_linux_pci_dev_probe
[   48.865208] ACPI: PCI Interrupt :00:06.0[A] - GSI 17 (level, low) - IRQ 17
[   48.872628] ahc_pci_config
[   48.875335] set_power_state
[   48.878126] map_registers
[   48.880744] ahc_pci_map_registers enter
[   48.884571] .read_config
[   48.887106] .reserve_mem
[   48.889647] .write_config_iferr0
[   48.892871] .test_registers_iferr0
[   48.896265] ahc_pci_test_register_access enter
[   48.900699] .read_config
[   48.903235] .write_config_noserr
[   48.906462] .hcnctrl
[   48.908648] .hcntrl pause cmd
[   48.911616] .I will pause 4E4 if missing errh before :/

ok, as expected, it is the wait-for-pause loop that never ends (someone with the specs pls say the max HZ 
for a wait_interruptible(_timeout)() here).
yes, I know that case must not happen and it should bug because 
the pci config is messed up already, but generally such loops are 
surely unacceptable in such a kernel thread.


will dump the config data here, maybe it's readable to the aic devs. 

then i'll disable the test and let it bug somewhere further on, where the 
cause can possibly be seen more easily.
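
As an illustration of the concern about unbounded loops, here is a user-space sketch of a bounded poll: it gives up after a fixed number of tries instead of spinning until the device reports "paused". This is not the driver's code. The names paused_fn and wait_paused_bounded and the two example predicates are hypothetical, and a real driver would delay or sleep between polls, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate type: returns true once the device reports "paused". */
typedef bool (*paused_fn)(void *ctx);

/*
 * Poll `is_paused` at most `max_tries` times.
 * Returns the number of polls used on success, or -1 on timeout.
 */
static int wait_paused_bounded(paused_fn is_paused, void *ctx, int max_tries)
{
    assert(is_paused != NULL && max_tries > 0);
    for (int i = 1; i <= max_tries; i++) {
        if (is_paused(ctx))
            return i;
    }
    return -1;
}

/* Example predicate: a device that never pauses (the hang case). */
static bool never_paused(void *ctx)
{
    (void)ctx;
    return false;
}

/* Example predicate: reports "paused" once the counter reaches zero. */
static bool paused_when_counter_hits_zero(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}
```

With the never-pausing predicate the helper returns -1 after the given number of tries, so the caller can report an error instead of locking up the CPU.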



Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!

2007-03-22 Thread thomas schorpp
ok, overriding the first while(ahc_is_paused) that blocked before 
(i see no sense for doing this in a pci mmap test function, cause 
proper resource setup is required *before* using such I/O functions, 
otherwise the adapter had entered SEQ paused status)

i got the kernel to boot at least at pio mode.

this is surely not the correct resource and looks like a datatype 
boundary overflow, the upper 0x0f is missing:

[   49.278810] Trying to free nonexistent resource f000-ffff

-f000
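
For reference, this is what the suspected kind of truncation would look like in isolation. It is only a sketch of the failure mode, not evidence that the driver does this: the function names are hypothetical, and the address 0x0000000FFFFFF000 is a made-up example value, not the real BAR of this card.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits, as an assignment to a uint32_t would. */
uint32_t truncate_to_32(uint64_t bus_addr)
{
    return (uint32_t)bus_addr;
}

/* Return 1 if the address survives a round trip through 32 bits, else 0. */
int fits_in_32_bits(uint64_t bus_addr)
{
    return (uint64_t)truncate_to_32(bus_addr) == bus_addr;
}

/* Hypothetical example values; the upper 0x0f is lost on truncation. */
void truncation_examples(void)
{
    uint64_t above_4g = 0x0000000FFFFFF000ULL;

    assert(truncate_to_32(above_4g) == 0xFFFFF000u);
    assert(!fits_in_32_bits(above_4g));
    assert(fits_in_32_bits(0xFFFFF000ULL));
}
```

A check like fits_in_32_bits() before narrowing would at least make such a loss visible instead of silently producing a different address.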

tom1:~# lspci -vvv -s 00:06.0
00:06.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02)
   Subsystem: Adaptec 19160 Ultra160 SCSI Controller
   Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
   Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
   Latency: 32 (1ns min, 6250ns max), Cache Line Size: 64 bytes
   Interrupt: pin A routed to IRQ 17
   BIST result: 00
   Region 0: I/O ports at d800 [size=256]
   Region 1: Memory at ff000 (64-bit, non-prefetchable) [disabled] [size=4K]
   Expansion ROM at fbee [disabled] [size=128K]
   Capabilities: [dc] Power Management version 2
   Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
   Status: D0 PME-Enable- DSel=0 DScale=0 PME-

000f--f000

but there's a platform issue somewhere in the code, affecting
x86_64. maintainers, please have a look too, thanks.
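
Since Region 1 in the lspci output above is a 64-bit memory BAR, here is a small sketch of how such a BAR is put together from two consecutive 32-bit config dwords, following the standard PCI BAR layout (bit 0 = I/O flag, bits 2:1 = type, bit 3 = prefetchable). The struct and function names are hypothetical, only memory types 00 (32-bit) and 10 (64-bit) are handled, and the raw values in the usage note are made up, not read from this card.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of a memory BAR. */
struct mem_bar {
    bool     is_64bit;
    bool     prefetchable;
    uint64_t base;
};

/*
 * Decode a memory BAR from its raw dword (bar_lo) and, for 64-bit BARs,
 * the dword that follows it (bar_hi). Returns false for I/O BARs and
 * for memory types this sketch does not handle.
 */
bool decode_mem_bar(uint32_t bar_lo, uint32_t bar_hi, struct mem_bar *out)
{
    assert(out != NULL);

    if (bar_lo & 0x1u)
        return false;                    /* bit 0 set: I/O space BAR */

    uint32_t type = (bar_lo >> 1) & 0x3u;
    if (type != 0x0u && type != 0x2u)
        return false;                    /* neither 32-bit nor 64-bit */

    out->is_64bit = (type == 0x2u);
    out->prefetchable = (bar_lo & 0x8u) != 0;
    out->base = bar_lo & 0xFFFFFFF0u;    /* low 4 bits are flags */
    if (out->is_64bit)
        out->base |= (uint64_t)bar_hi << 32;
    return true;
}

/* True if the decoded base lies below 4GB. */
bool bar_below_4g(const struct mem_bar *bar)
{
    return bar->base <= 0xFFFFFFFFULL;
}
```

For example, decode_mem_bar(0xFFEFF004, 0x0000000F, &b) yields a 64-bit, non-prefetchable BAR with base 0x0000000FFFEFF000, for which bar_below_4g() is false; with bar_hi = 0 the same low dword decodes to a base below 4GB.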



[   49.181771] Loading iSCSI transport class v2.0-724.
[   49.187129] iscsi: registered transport (tcp)
[   49.191491] ahc_linux_pci_init
[   49.194682] ahc_linux_pci_dev_probe
[   49.198221] ACPI: PCI Interrupt :00:06.0[A] - GSI 17 (level, low) - IRQ 17
[   49.205636] ahc_pci_config
[   49.208337] set_power_state
[   49.211131] map_registers
[   49.213748] ahc_pci_map_registers enter
[   49.217574] .read_config
[   49.220110] .reserve_mem
[   49.222649] .write_config_iferr0
[   49.225869] .test_registers_iferr0
[   49.229267] ahc_pci_test_register_access enter
[   49.233704] .read_config 116
[   49.236584] .write_config_noserr
[   49.239810] .hcnctrl
[   49.241998] .paused 0
[   49.244362] .write_config
[   49.246982] .write_config
[   49.249614] .fail scb_base
[   49.252321] .ending fail err 5
[   49.255368] .read_config
[   49.257901] .write_config
[   49.260523] .clrint
[   49.262622] .seqctl
[   49.264720] .write_config
[   49.267340] ahc_pci_test_register_access leave
[   49.271775] aic7xxx: PCI Device 0:6:0 failed memory mapped test.  Using PIO.
[   49.278810] Trying to free nonexistent resource f000-ffff
[   49.286443] .reserve_io
[   49.288894] reserve_io_ok
[   49.291510] .write_config
[   49.294129] map_registers leave
[   49.297265] read_config
[   49.299710] write_config1
[   49.302330] write_config2
[   49.304951] softc_init
[   49.307329] ahc_reset
[   49.322446] ahc_init_core
[   49.325301] ahc_pci:0:6:0: hardware scb 64 bytes; kernel scb 104 bytes; ahc_dma 8 bytes
[   49.519047] ENINT
[   54.513224] scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
[   54.513226] Adaptec 19160B Ultra160 SCSI adapter
[   54.513227] aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
[   54.513228]
[   54.534992] Adaptec aacraid driver (1.1-5[2423]-mh3)
[   54.540128] st: Version 20070203, fixed bufsize 32768, s/g segs 256
[   54.546541] osst :I: Tape driver with OnStream support version 0.99.4
[   54.546542] osst :I: $Id: osst.c,v 1.73 2005/01/01 21:13:34 wriede Exp $
[   54.560110] SCSI Media Changer driver v0.25
[   54.564798] PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
[   54.572940] serio: i8042 KBD port at 0x60,0x64 irq 1
[   54.578075] serio: i8042 AUX port at 0x60,0x64 irq 12
[   54.583665] mice: PS/2 mouse device common for all mice
[   54.589185] md: linear personality registered for level -1
[   54.594682] md: raid0 personality registered for level 0
[   54.600011] md: raid1 personality registered for level 1
[   54.672826] raid6: int64x1   1865 MB/s
[   54.740694] raid6: int64x2   2508 MB/s
[   54.790668] (scsi0:A:0:0): Saw Selection Timeout for SCB 0x3
[   54.808931] raid6: int64x4   2190 MB/s
[   54.880403] raid6: int64x8   1641 MB/s
[   54.948293] raid6: sse2x1   2445 MB/s
[   55.016149] raid6: sse2x2   3332 MB/s
[   55.084035] raid6: sse2x4   3666 MB/s
[   55.087774] raid6: using algorithm sse2x4 (3666 MB/s)
[   55.092819] md: raid6 personality registered for level 6
[   55.098120] md: raid5 personality registered for level 5
[   55.103423] md: raid4 personality registered for level 4
[   55.108724] raid5: automatically using best checksumming function: generic_sse
[   55.131933]    generic_sse:  6247.000 MB/sec
[   55.136192] raid5: using function: generic_sse (6247.000 MB/sec)
[   55.142185] md: multipath personality registered for level -4
[   55.148202] input: AT Translated Set 2 keyboard as