Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
thomas schorpp wrote: thomas schorpp wrote: thomas schorpp wrote: James Bottomley wrote: On Sat, 2007-03-24 at 01:51 +0100, thomas schorpp wrote: No, so the PCI layer reports the wrong start: nonsense. It succeeds; I confused the function return value with the error flag: // u_long start; // u_long start = 0xFFEFF000; u_long start = 0x3000; int error; struct resource* ret1; error = 0; // start = pci_resource_start(ahc->dev_softc, 1); if (start != 0) { *bus_addr = start; if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0) You can't do this. The pci_resource_start is getting the address of something called a Bus Address Register (BAR) it says in physical address space where the card is responding ... you can't simply set that to a random value. The problem you seem to have is that your system is reporting a BAR beyond 32 bits (4GB) which the card physically can't use. This could be because of a BIOS misconfiguration or because there's a bug in the PCI subsystem somewhere. James. Understood. Waiting for LKML answers... Meanwhile I found a stronger reason to suspect a bounds problem in the driver code on x86_64: if I do: static int ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc, u_long *bus_addr, uint8_t __iomem **maddr) { // u_long start; uint32_t start; I get no warning about freeing a *nonexistent* resource (it can't be nonexistent, because something was definitely mapped): tom1:/usr/src/linux# dmesg |grep -i free Freeing unused kernel memory: 208k freed With a u_long start I get it: Mar 24 03:41:47 localhost kernel: Trying to free nonexistent resource f000- Investigating further... - Hmm, well, I don't get the free warning because release_mem_region(ahc->platform_data->mem_busaddr, 0x1000); isn't called; the hack fails at: error = ahc_linux_pci_reserve_mem_region(ahc, &base, &maddr); if (error == 0) { OK, so no bounds issue in the driver. LKML people are ignoring my report; I take this as agreement that it is a mainboard BIOS issue.
I will test the card with the latest Debian x86_64 netinstall CD on some other amd64 machine, but I need to find one within reach here. I need more confirmation before working on the Linux PCI HAL. No other amd64 machines in reach. Here's my fix. It seems to be a hardware bug in the Adaptec 19160 HBA card: it just fakes a 64-bit BAR in the register read. That doesn't matter on the i386 arch because of incomplete error handling ;), but it does on the x86_64 arch. Since there is no public interest in a real fix here or on LKML, I'm doing no further investigation. Users, *DON'T try this at home, it may break real 64-bit BAR cards* (if there are any for PCI/32)! drivers/pci/probe.c static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom) { [...] if ((l & (PCI_BASE_ADDRESS_SPACE | PCI_BASE_ADDRESS_MEM_TYPE_MASK)) == (PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64)) { u32 szhi, lhi; pci_read_config_dword(dev, reg+4, &lhi); lhi = 0; //schorpp pci_write_config_dword(dev, reg+4, ~0); pci_read_config_dword(dev, reg+4, &szhi); pci_write_config_dword(dev, reg+4, lhi); //kill the wrong read 0x0F szhi = pci_size(lhi, szhi, 0x); next++; printk(KERN_ERR PCI: 64-bit check REG for device %s l %lx%lx sz %lx%lx start %llx end %llx flags $ pci_name(dev), lhi, l, szhi, sz, res->start, res->end, res->flags); #if BITS_PER_LONG == 64 //the cause, more checks for buggy h/w needed or platform dep. bug somewhere deeper res->start |= ((unsigned long) lhi) << 32; res->end = res->start + sz; printk(KERN_ERR PCI: 64-bit BAR check 1 for device %s l %lx%lx sz %lx%lx start %llx end %llx flag$ pci_name(dev), lhi, l, szhi, sz, res->start, res->end, res->flags); [...]
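The pci_read_bases() excerpt depends on how a 64-bit memory BAR is assembled. The low dword holds the flag bits and address bits 31..4. The next register (reg+4) holds address bits 63..32. On 64-bit kernels, the upper dword is merged into res->start. The user-space sketch below illustrates that arithmetic and the sizing step. The helper names are hypothetical, this is not the kernel's code, and taking the size from the lowest writable address bit only follows the general idea behind pci_size().

```c
#include <stdbool.h>
#include <stdint.h>

/* Address bits of a memory BAR's low dword; bits 3..0 are flag bits. */
#define BAR_MEM_ADDR_MASK 0xFFFFFFF0u

/* Hypothetical helper: base address of a 64-bit memory BAR from its
 * low dword (reg) and high dword (reg + 4). */
static uint64_t bar64_base(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (lo & BAR_MEM_ADDR_MASK);
}

/* Hypothetical helper: size of a 64-bit memory BAR from the values read
 * back after writing all ones to both dwords. The lowest writable address
 * bit gives the size, so a hard-wired zero upper dword still yields the
 * right size. Returns 0 for an unimplemented BAR. */
static uint64_t bar64_size(uint32_t lo_readback, uint32_t hi_readback)
{
    uint64_t mask = ((uint64_t)hi_readback << 32) | (lo_readback & BAR_MEM_ADDR_MASK);
    return mask & (~mask + 1);   /* isolate the lowest set bit */
}

/* True if any part of the BAR lies above the 32-bit (4 GiB) boundary. */
static bool bar64_above_4g(uint32_t lo, uint32_t hi, uint64_t size)
{
    return size != 0 && bar64_base(lo, hi) + (size - 1) > UINT32_MAX;
}
```

With the upper dword forced to 0, as in the //schorpp hack, bar64_base() can never exceed 4 GiB. The hack therefore hides the high bits rather than explaining where they come from.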
hba fine again: tom1:/usr/src/linux# lspci -vvv -s 00:06.0 00:06.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02) Subsystem: Adaptec 19160 Ultra160 SCSI Controller Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- Latency: 32 (1ns min, 6250ns max), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 17 BIST result: 00 Region 0: I/O ports at d800 [disabled] [size=256] Region 1: Memory at 3000 (64-bit, non-prefetchable) [size=4K] Expansion ROM at fbee [disabled] [size=128K] Capabilities: [dc] Power Management version 2 Flags: PMEClk
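lspci decodes "Region 0: I/O ports" and "Region 1: Memory ... (64-bit, non-prefetchable)" from the low bits of each BAR. The same bit is what Tom's printk masks with PCI_BASE_ADDRESS_MEM_TYPE_64 (printing "mem64 0x4"). Below is a minimal sketch of that decoding, using the bit layout from the PCI Local Bus Specification. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low-dword BAR flag bits (PCI Local Bus Specification); the values match
 * the kernel's PCI_BASE_ADDRESS_* constants. */
#define BAR_SPACE_IO       0x1u  /* bit 0: 1 = I/O space, 0 = memory space */
#define BAR_MEM_TYPE_MASK  0x6u  /* bits 2..1: memory decoder type */
#define BAR_MEM_TYPE_64    0x4u  /* type 10b: 64-bit BAR, also uses reg + 4 */
#define BAR_MEM_PREFETCH   0x8u  /* bit 3: prefetchable memory */

static bool bar_is_io(uint32_t bar)
{
    return (bar & BAR_SPACE_IO) != 0;
}

static bool bar_is_mem64(uint32_t bar)
{
    return !bar_is_io(bar) && (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}

static bool bar_is_prefetchable(uint32_t bar)
{
    return !bar_is_io(bar) && (bar & BAR_MEM_PREFETCH) != 0;
}
```

For illustration, a raw value of 0xD801 decodes as I/O space at 0xd800. Any memory BAR whose low nibble is 0x4 decodes as 64-bit and non-prefetchable, which is how Region 1 is reported. These raw values are examples, not values read from this card.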
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
James Bottomley wrote: On Fri, 2007-03-23 at 02:26 +0100, thomas schorpp wrote: OK, overriding the first while (ahc_is_paused) loop that blocked before (I see no sense in doing this in a PCI mmap test function, because proper resource setup is required *before* using such I/O functions; otherwise the adapter would have entered SEQ paused status), I got the kernel to boot, at least in PIO mode. This is surely not the correct resource and looks like a datatype boundary overflow; the upper 0x0f is missing: [ 49.278810] Trying to free nonexistent resource f000-fff f That's because ahc->platform_data->mem_busaddr is u32 [ 54.513224] scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0 [ 54.513226] Adaptec 19160B Ultra160 SCSI adapter [ 54.513227] aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs The driver code suggests that the 7892 can't do the AHC_LARGE_SCBS features ... which means the card itself cannot address more than 32 bits of memory, so it would be unable to decode a BAR that's beyond the 32 bit range. So this looks like some type of error in the PCI config system (or possibly in the BIOS). I think this card needs its BARs to be in the lower 32 bits to function. James. I agree that this is a 32-bit DMA bus-master chip, since pci_resource_flags and lspci report a 64-bit memory resource type: aic7xxx: pci_resource_start f000 *maddr 2 mem64 4 We have a bug in the x86_64 Linux PCI config code; the BIOS is OK, since the hardware worked fine in a winxp_x64 test setup a few months ago. Will ask LKML. y tom - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
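James's point about the u32 mem_busaddr field can be shown in a minimal sketch. Storing a bus address above 4 GiB in a 32-bit variable keeps only the low 32 bits. A later release_mem_region() on that value then no longer names the region that was reserved. The example address 0x1FFFFF000 is made up, because the real one is truncated in this archive, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the effect of keeping a bus address in a 32-bit
 * field such as a u32 mem_busaddr. The conversion keeps bits 31..0. */
static uint32_t store_in_u32(uint64_t bus_addr)
{
    return (uint32_t)bus_addr;
}

/* True if the address comes back unchanged after the round trip, i.e. it
 * lies below 4 GiB; otherwise a release on the stored value names a
 * different (usually nonexistent) region. */
static bool survives_u32(uint64_t bus_addr)
{
    return (uint64_t)store_in_u32(bus_addr) == bus_addr;
}
```

For example, store_in_u32(0x1FFFFF000) yields 0xFFFFF000. That is the same address with bits 63..32 missing, the kind of "missing upper bits" Tom observed in the free warning.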
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
thomas schorpp wrote: James Bottomley wrote: On Fri, 2007-03-23 at 02:26 +0100, thomas schorpp wrote: OK, overriding the first while (ahc_is_paused) loop that blocked before (I see no sense in doing this in a PCI mmap test function, because proper resource setup is required *before* using such I/O functions; otherwise the adapter would have entered SEQ paused status), I got the kernel to boot, at least in PIO mode. This is surely not the correct resource and looks like a datatype boundary overflow; the upper 0x0f is missing: [ 49.278810] Trying to free nonexistent resource f000-fff f That's because ahc->platform_data->mem_busaddr is u32 [ 54.513224] scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0 [ 54.513226] Adaptec 19160B Ultra160 SCSI adapter [ 54.513227] aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs The driver code suggests that the 7892 can't do the AHC_LARGE_SCBS features ... which means the card itself cannot address more than 32 bits of memory, so it would be unable to decode a BAR that's beyond the 32 bit range. So this looks like some type of error in the PCI config system (or possibly in the BIOS). I think this card needs its BARs to be in the lower 32 bits to function. James. I agree that this is a 32-bit DMA bus-master chip, since pci_resource_flags and lspci report a 64-bit memory resource type: aic7xxx: pci_resource_start f000 *maddr 2 mem64 4 We have a bug in the x86_64 Linux PCI config code; the BIOS is OK, since the hardware worked fine in a winxp_x64 test setup a few months ago. Will ask LKML. y tom Sorry, that was wrong, according to http://download.adaptec.com/pdfs/aic7892.pdf: "66 MHz, 64-bit, PCI interface that supports zero wait-state memory; also operates on 33 MHz, 32-bit PCI busses". This chip is capable of 64-bit addressing, as pci_resource_ (checking this) on the x86_64 platform and lspci on x86_64 *and* AMDK7-configured kernels report, even on PCI/32, right? Or is it impossible to do multiplexed 64-bit memory addressing on PCI/32?
Why are the driver structure's address members 32-bit types if there are PCI/64 card models with this chip, as listed in the aic7xxx.txt kernel documentation and stated in aic7892.pdf? I'll adapt the respective driver structures and function arguments to 64-bit now and see what happens... Can Adaptec Inc. please comment? Since the AHA-19160 card is still in production, I assume they want a DMA-capable Linux x86_64 driver. So far it is not one. Or can other users who have this card please confirm that my PCI system is broken? y tom
[PATCH][DOC]aic7xxx: Correct wrong Kernel Documentation for Adaptec AHA19160(B) HBA
--- Documentation/scsi/aic7xxx.txt 2007-03-23 16:44:05.0 +0100
+++ Documentation/scsi/aic7xxx.txt 2007-03-23 17:01:19.0 +0100
@@ -28,7 +28,7 @@
  aic7880   10  PCI/32        20MHz  16Bit  16
  aic7890   20  PCI/32        40MHz  16Bit  16    3 4 5 6 7 8
  aic7891   20  PCI/64        40MHz  16Bit  16    3 4 5 6 7 8
- aic7892   20  PCI/64-66     80MHz  16Bit  16    3 4 5 6 7 8
+ aic7892   20  PCI/32/64-66  80MHz  16Bit  16    3 4 5 6 7 8
  aic7895   15  PCI/32        20MHz  16Bit  16  2 3 4 5
  aic7895C  15  PCI/32        20MHz  16Bit  16  2 3 4 5     8
  aic7896   20  PCI/32        40MHz  16Bit  16  2 3 4 5 6 7 8
@@ -114,7 +114,7 @@
  AHA-29160N   aic7892  PCI/32     LVD-HD68F SE-HD50F
                                  SE-50M
  AHA-29160LP  aic7892  PCI/64-66
- AHA-19160    aic7892  PCI/64-66
+ AHA-19160    aic7892  PCI/32
  AHA-29150LP  aic7892  PCI/64-66
  AHA-29130LP  aic7892  PCI/64-66
  AHA-3960D    aic7899  PCI/64-66  2 X LVD-HD68F 2 X LVD-VHD68F
Correct the documentation according to the aic7892 chip and Adaptec AHA-19160 HBA specs.
Signed-off-by: tom schorpp [EMAIL PROTECTED]
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
James Bottomley wrote: On Fri, 2007-03-23 at 17:28 +0100, thomas schorpp wrote: I agree that this is a 32-bit DMA bus-master chip, since pci_resource_flags and lspci report a 64-bit memory resource type: aic7xxx: pci_resource_start f000 *maddr 2 mem64 4 static int ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc, u_long *bus_addr, uint8_t __iomem **maddr) { // u_long start; u_long len; int error; uint64_t start; ... printk(KERN_WARNING "aic7xxx: pci_resource_start 0x%llx mem64 0x%lx\n", start, pci_resource_flags(ahc->dev_softc, 1) & PCI_BASE_ADDRESS_MEM_TYPE_64); //schorpp return (error); aic7xxx: pci_resource_start 0xff000 mem64 0x4 ---^ Just to double-check the situation (I posted lspci already). Next I will check whether len = pci_resource_len(ahc->dev_softc, 1); if (start != 0) { *bus_addr = start; // if (request_mem_region(start, 0x1000, "aic7xxx") == 0) if (request_mem_region(start, len, "aic7xxx") == 0) succeeds. We have a bug in the x86_64 Linux PCI config code; the BIOS is OK, since the hardware worked fine in a winxp_x64 test setup a few months ago. Will ask LKML. y tom Sorry, that was wrong, according to http://download.adaptec.com/pdfs/aic7892.pdf: "66 MHz, 64-bit, PCI interface that supports zero wait-state memory; also operates on 33 MHz, 32-bit PCI busses". This chip is capable of 64-bit addressing, as pci_resource_ (checking this) on the x86_64 platform and lspci on x86_64 *and* AMDK7-configured kernels report, even on PCI/32, right? Or is it impossible to do multiplexed 64-bit memory addressing on PCI/32? It can only do 37 bit addressing ... only the aic79xx can do the full 64 bits, so I suspect it should never get a 64 bit BAR, since it wouldn't be able to decode the full 32 bits. I can fix the mmio check not to hang, but the card won't actually work mmio until whatever's assigning the BAR above 32 bits is fixed (that could either be a kernel PCI bug or a BIOS bug). OK, I trust that. The adapter BIOS and mainboard BIOS *are* ruled out; the winxp_x64 driver handled everything. So I agree that it is a kernel PCI HAL issue.
But then what is const uint64_t mask_39bit = 0x7FULL; for? Can Adaptec Inc. please comment? Since the AHA-19160 card is still in production, I assume they want a DMA-capable Linux x86_64 driver. So far it is not one. Or can other users who have this card please confirm that my PCI system is broken? James y tom
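James's argument comes down to comparing an address range against what the device can reach. The sketch below checks a region against an address mask of a given width. The 32- and 39-bit widths come from the discussion, and the helper names are hypothetical. Whether mask_39bit limits DMA rather than BAR decoding is an assumption here, not something the thread establishes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set: 32 -> 0xFFFFFFFF, 39 -> 0x7FFFFFFFFF. */
static uint64_t addr_mask(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* True if every byte of [start, start + len) is reachable under `mask`.
 * Empty and wrapping ranges are rejected. */
static bool range_within_mask(uint64_t start, uint64_t len, uint64_t mask)
{
    if (len == 0)
        return false;
    uint64_t end = start + (len - 1);
    return end >= start && end <= mask;
}
```

For example, a 4 KiB region at 0x100000000 passes the 39-bit check but fails the 32-bit one. That is the situation of a BAR placed just above 4 GiB.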
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
thomas schorpp wrote: thomas schorpp wrote: James Bottomley wrote: On Fri, 2007-03-23 at 17:28 +0100, thomas schorpp wrote: I agree that this is a 32-bit DMA bus-master chip, since pci_resource_flags and lspci report a 64-bit memory resource type: aic7xxx: pci_resource_start f000 *maddr 2 mem64 4 static int ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc, u_long *bus_addr, uint8_t __iomem **maddr) { // u_long start; u_long len; int error; uint64_t start; ... printk(KERN_WARNING "aic7xxx: pci_resource_start 0x%llx mem64 0x%lx\n", start, pci_resource_flags(ahc->dev_softc, 1) & PCI_BASE_ADDRESS_MEM_TYPE_64); //schorpp return (error); aic7xxx: pci_resource_start 0xff000 mem64 0x4 ---^ Just to double-check the situation (I posted lspci already). Next I will check whether len = pci_resource_len(ahc->dev_softc, 1); if (start != 0) { *bus_addr = start; // if (request_mem_region(start, 0x1000, "aic7xxx") == 0) if (request_mem_region(start, len, "aic7xxx") == 0) succeeds. No, so the PCI layer reports the wrong start: nonsense. It succeeds; I confused the function return value with the error flag: // u_long start; // u_long start = 0xFFEFF000; u_long start = 0x3000; int error; struct resource* ret1; error = 0; // start = pci_resource_start(ahc->dev_softc, 1); if (start != 0) { *bus_addr = start; if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0) error = ENOMEM; printk(KERN_WARNING "aic7xxx: req_mem_region start 0x%lx\n", ret1->start); //schorpp if (error == 0) { *maddr = ioremap_nocache(start, 256); if (*maddr == NULL) { error = ENOMEM; release_mem_region(start, 0x1000); } } } else error = ENOMEM; tom1:~# dmesg |grep aic aic7xxx: DMA_32BIT_MASK aic7xxx: req_mem_region start 0x3000 aic7xxx: pci_resource_start 0x3000 **maddr 0xff mem64 0x4 aic7xxx: PCI Device 0:6:0 failed memory mapped test. Using PIO. aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs I tried the memory start value from lspci on a running Knoppix and WinXP, but the if (ahc_pci_test_register_access(ahc) != 0) { check does not pass.
I thought the PCI resources were constant and I could hardcode them for my system ;) Remarkably, with this memory start setting the kernel's auto-free(?) does not take action. We have a bug in the x86_64 Linux PCI config code; the BIOS is OK, since the hardware worked fine in a winxp_x64 test setup a few months ago. Will ask LKML. y tom Sorry, that was wrong, according to http://download.adaptec.com/pdfs/aic7892.pdf: "66 MHz, 64-bit, PCI interface that supports zero wait-state memory; also operates on 33 MHz, 32-bit PCI busses". This chip is capable of 64-bit addressing, as pci_resource_ (checking this) on the x86_64 platform and lspci on x86_64 *and* AMDK7-configured kernels report, even on PCI/32, right? Or is it impossible to do multiplexed 64-bit memory addressing on PCI/32? It can only do 37 bit addressing ... only the aic79xx can do the full 64 bits, so I suspect it should never get a 64 bit BAR, since it wouldn't be able to decode the full 32 bits. I can fix the mmio check not to hang, but the card won't actually work mmio until whatever's assigning the BAR above 32 bits is fixed (that could either be a kernel PCI bug or a BIOS bug). OK, I trust that. The adapter BIOS and mainboard BIOS *are* ruled out; the winxp_x64 driver handled everything. So I agree that it is a kernel PCI HAL issue. But then what is const uint64_t mask_39bit = 0x7FULL; for? Can Adaptec Inc. please comment? Since the AHA-19160 card is still in production, I assume they want a DMA-capable Linux x86_64 driver. So far it is not one. Or can other users who have this card please confirm that my PCI system is broken?
James y tom
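Two things can be modelled in user space: the confusion Tom describes, where a pointer-returning call is treated like an error code, and the "Trying to free nonexistent resource" warning. This is a simplified sketch. A hypothetical fixed-size region table stands in for the kernel's resource tree. reserve_region() returns a pointer or NULL, like request_mem_region(). release_region() fails when no reserved region starts at the given address. Note also that the quoted debug printk dereferences ret1 even when the request failed. The sketch only touches the pointer after the NULL check.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NREGIONS 4

/* Hypothetical stand-in for the kernel's resource tree. */
struct region {
    uint64_t start;
    uint64_t len;
    int busy;
};

static struct region regions[NREGIONS];

/* Like request_mem_region(): a pointer on success, NULL on failure
 * (here: overlap with a busy region, or no free slot). */
static struct region *reserve_region(uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < NREGIONS; i++) {
        const struct region *r = &regions[i];
        if (r->busy && start < r->start + r->len && r->start < start + len)
            return NULL;
    }
    for (size_t i = 0; i < NREGIONS; i++) {
        if (!regions[i].busy) {
            regions[i] = (struct region){ .start = start, .len = len, .busy = 1 };
            return &regions[i];
        }
    }
    return NULL;
}

/* 0 if a busy region starts exactly at `start`, else -ENOENT: the case
 * the kernel reports as "Trying to free nonexistent resource". */
static int release_region(uint64_t start)
{
    for (size_t i = 0; i < NREGIONS; i++) {
        if (regions[i].busy && regions[i].start == start) {
            regions[i].busy = 0;
            return 0;
        }
    }
    return -ENOENT;
}

/* Translates the pointer result into the driver's convention (a positive
 * ENOMEM, as in the quoted code) and dereferences only after the check. */
static int reserve_mem(uint64_t start, uint64_t len, uint64_t *bus_addr)
{
    struct region *r = reserve_region(start, len);
    if (r == NULL)
        return ENOMEM;
    *bus_addr = r->start;
    return 0;
}
```

Releasing with an address whose upper bits were lost (for example 0xFFFFF000 instead of 0x1FFFFF000) takes the -ENOENT path, while the originally reserved region stays busy.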
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
James Bottomley wrote: On Sat, 2007-03-24 at 01:51 +0100, thomas schorpp wrote: No, so the PCI layer reports the wrong start: nonsense. It succeeds; I confused the function return value with the error flag: // u_long start; // u_long start = 0xFFEFF000; u_long start = 0x3000; int error; struct resource* ret1; error = 0; // start = pci_resource_start(ahc->dev_softc, 1); if (start != 0) { *bus_addr = start; if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0) You can't do this. The pci_resource_start is getting the address of something called a Bus Address Register (BAR) it says in physical address space where the card is responding ... you can't simply set that to a random value. The problem you seem to have is that your system is reporting a BAR beyond 32 bits (4GB) which the card physically can't use. This could be because of a BIOS misconfiguration or because there's a bug in the PCI subsystem somewhere. James. Understood. Waiting for LKML answers... Meanwhile I found a stronger reason to suspect a bounds problem in the driver code on x86_64: if I do: static int ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc, u_long *bus_addr, uint8_t __iomem **maddr) { // u_long start; uint32_t start; I get no warning about freeing a *nonexistent* resource (it can't be nonexistent, because something was definitely mapped): tom1:/usr/src/linux# dmesg |grep -i free Freeing unused kernel memory: 208k freed With a u_long start I get it: Mar 24 03:41:47 localhost kernel: Trying to free nonexistent resource f000- Investigating further...
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
thomas schorpp wrote: James Bottomley wrote: On Sat, 2007-03-24 at 01:51 +0100, thomas schorpp wrote: No, so the PCI layer reports the wrong start: nonsense. It succeeds; I confused the function return value with the error flag: // u_long start; // u_long start = 0xFFEFF000; u_long start = 0x3000; int error; struct resource* ret1; error = 0; // start = pci_resource_start(ahc->dev_softc, 1); if (start != 0) { *bus_addr = start; if ((ret1 = request_mem_region(start, 0x1000, "aic7xxx")) == 0) You can't do this. The pci_resource_start is getting the address of something called a Bus Address Register (BAR) it says in physical address space where the card is responding ... you can't simply set that to a random value. The problem you seem to have is that your system is reporting a BAR beyond 32 bits (4GB) which the card physically can't use. This could be because of a BIOS misconfiguration or because there's a bug in the PCI subsystem somewhere. James. Understood. Waiting for LKML answers... Meanwhile I found a stronger reason to suspect a bounds problem in the driver code on x86_64: if I do: static int ahc_linux_pci_reserve_mem_region(struct ahc_softc *ahc, u_long *bus_addr, uint8_t __iomem **maddr) { // u_long start; uint32_t start; I get no warning about freeing a *nonexistent* resource (it can't be nonexistent, because something was definitely mapped): tom1:/usr/src/linux# dmesg |grep -i free Freeing unused kernel memory: 208k freed With a u_long start I get it: Mar 24 03:41:47 localhost kernel: Trying to free nonexistent resource f000- Investigating further... - Hmm, well, I don't get the free warning because release_mem_region(ahc->platform_data->mem_busaddr, 0x1000); isn't called; the hack fails at: error = ahc_linux_pci_reserve_mem_region(ahc, &base, &maddr); if (error == 0) { OK, so no bounds issue in the driver.
aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
Hello. Well, I've got several live CD systems (2.6.19.5 i386) that oops and hang at boot in aic7xxx init. The only one that boots here is Knoppix 5.2. The latest unofficial Debian stable 2.6.8-12-amd64-generic says ACPI: PCI interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17 aic7xxx: PCI0:6:0 MEM region 0x0 unavailable. Cannot memory map device. but works OK. A Debian etch 2.6.18-4-amd64 says: SCSI subsystem initialized GSI 16 sharing vector 0xA9 and IRQ 16 ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 169 BUG: soft lockup detected on CPU#0! Call Trace: IRQ [802a3fec] softlockup_tick+0xdb/0xed [802881df] update_process_times+0x42/0x68 [8026cbd8] smp_local_timer_interrupt+0x23/0x47 [8026d2cc] smp_apic_timer_interrupt+0x41/0x47 [8025904a] apic_timer_interrupt+0x66/0x6c EOI [8038a412] pci_conf1_write+0x0/0xc9 [88053718] :aic7xxx:ahc_pci_test_register_access+0xc2/0x391 [880536a5] :aic7xxx:ahc_pci_test_register_access+0x4f/0x391 [88059416] :aic7xxx:ahc_pci_map_registers+0x1bb/0x239 [880523d2] :aic7xxx:ahc_pci_config+0x4c/0x12d0 [80389fb7] pcibios_set_master+0x1e/0x84 [88059186] :aic7xxx:ahc_linux_pci_dev_probe+0x13e/0x213 [80317eea] pci_device_probe+0xdf/0x147 [8036b9db] driver_probe_device+0x52/0xa8 [8036ba96] __driver_attach+0x0/0x9a [8036bae6] __driver_attach+0x50/0x9a [8036ba96] __driver_attach+0x0/0x9a [8036b458] bus_for_each_dev+0x43/0x6e [8036b09a] bus_add_driver+0x7e/0x130 [803180c4] __pci_register_driver+0x57/0x7d [8805903e] :aic7xxx:ahc_linux_pci_init+0x17/0x21 [8806e325] :aic7xxx:ahc_linux_init+0x325/0x336 [8027d27d] default_wake_function+0x0/0xe [8025e2e5] __down_read+0x12/0x9a [80294fa1] __link_module+0x0/0x25 [802200e5] __up_read+0x13/0x8a [80297695] sys_init_module+0x16cc/0x1882 [802584d6] system_call+0x7e/0x83 BUG: soft lockup detected on CPU#0!
There's also a kernel.org 2.6.20 with the K8 config set but built in a 32-bit Debian sid environment, which works OK. Finally, the latest kernel.org 2.6.20.3 (AMD K8), built on a Debian amd64 etch userland, hangs at boot in aic7xxx init and the magic SysRq keys stop working: Loading iSCSI transport class v2.0-724. ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17 ... Kernel alive - Kernel direct mapping tables up to 1 @ 8000-d000 Now trying the latest scsi git; I'll be on ##kernel at freenode if there are questions. y tom SysRq : Resetting Linux version 2.6.20.3amd64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease7 Command line: root=/dev/sda1 ro single console=ttyS0,115200n8 aic7xxx=debug=255 BIOS-provided physical RAM map: BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000e4000 - 0010 (reserved) BIOS-e820: 0010 - 1ffd (usable) BIOS-e820: 1ffd - 1ffde000 (ACPI data) BIOS-e820: 1ffde000 - 2000 (ACPI NVS) BIOS-e820: fec0 - fec01000 (reserved) BIOS-e820: ff78 - 0001 (reserved) end_pfn_map = 1048576 DMI 2.3 present. Zone PFN ranges: DMA 0 - 4096 DMA32 4096 - 1048576 Normal 1048576 - 1048576 early_node_map[2] active PFN ranges 0:0 - 159 0: 256 - 131024 ACPI: PM-Timer IO Port: 0x808 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x81] disabled) ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) Setting APIC routing to flat Using ACPI (MADT) for SMP configuration information Nosave address range: 0009f000 - 000a Nosave address range: 000a - 000e4000 Nosave address range: 000e4000 - 0010 Allocating PCI resources starting at 3000 (gap: 2000:dec0) Built 1 zonelists.
Total pages: 127672 Kernel command line: root=/dev/sda1 ro single console=ttyS0,115200n8 aic7xxx=de5 Initializing CPU#0 PID hash table entries: 2048 (order: 11, 16384 bytes) time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer. time.c: Detected 2000.164 MHz processor. Console: colour VGA+ 80x25 Dentry cache hash table entries: 65536 (order: 7, 524288 bytes) Inode-cache hash table entries: 32768 (order: 6, 262144 bytes) Checking aperture... CPU 0: aperture @ d000 size 128 MB Memory: 509592k/524096k available (3711k kernel code, 13908k reserved, 1316k da) Calibrating delay using timer specific routine.. 4005.05 BogoMIPS (lpj=8010104) Security Framework v1.0.0 initialized Mount-cache hash table entries: 256 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
No fix in the scsi-rc-fixes git; now examining the code from the soft lockup trace posted before... [0.00] Linux version 2.6.21-rc3amd64-gbb9ba31c ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 PREEMPT Thu Mar 22 17:39:17 CET 2007 [0.00] Command line: root=/dev/sda1 ro single console=ttyS0,115200n8 [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009fc00 (usable) [0.00] BIOS-e820: 0009fc00 - 000a (reserved) [0.00] BIOS-e820: 000e4000 - 0010 (reserved) [0.00] BIOS-e820: 0010 - 1ffd (usable) [0.00] BIOS-e820: 1ffd - 1ffde000 (ACPI data) [0.00] BIOS-e820: 1ffde000 - 2000 (ACPI NVS) [0.00] BIOS-e820: fec0 - fec01000 (reserved) [0.00] BIOS-e820: ff78 - 0001 (reserved) [0.00] end_pfn_map = 1048576 [0.00] DMI 2.3 present. [0.00] ACPI: RSDP 000F92B0, 0014 (r0 ACPIAM) [0.00] ACPI: RSDT 1FFD, 0030 (r1 A M I OEMRSDT 1612 MSFT 97) [0.00] ACPI: FACP 1FFD0200, 0084 (r2 A M I OEMFACP 1612 MSFT 97) [0.00] ACPI: DSDT 1FFD03F0, 3D20 (r1 1 10055 INTL 2002 026) [0.00] ACPI: FACS 1FFDE000, 0040 [0.00] ACPI: APIC 1FFD0390, 005C (r1 A M I OEMAPIC 1612 MSFT 97) [0.00] ACPI: OEMB 1FFDE040, 0046 (r1 A M I AMI_OEM 1612 MSFT 97) [0.00] Zone PFN ranges: [0.00] DMA 0 - 4096 [0.00] DMA32 4096 - 1048576 [0.00] Normal 1048576 - 1048576 [0.00] early_node_map[2] active PFN ranges [0.00] 0:0 - 159 [0.00] 0: 256 - 131024 [0.00] Looks like a VIA chipset. Disabling IOMMU.
Override with iommu=allowed [0.00] ACPI: PM-Timer IO Port: 0x808 [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) [0.00] Processor #0 (Bootup-CPU) [0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x81] disabled) [0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0]) [0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23 [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [0.00] Setting APIC routing to flat [0.00] Using ACPI (MADT) for SMP configuration information [0.00] Nosave address range: 0009f000 - 000a [0.00] Nosave address range: 000a - 000e4000 [0.00] Nosave address range: 000e4000 - 0010 [0.00] Allocating PCI resources starting at 3000 (gap: 2000:dec0) [0.00] Built 1 zonelists. Total pages: 126532 [0.00] Kernel command line: root=/dev/sda1 ro single console=ttyS0,115200n8 [0.00] Initializing CPU#0 [0.00] PID hash table entries: 2048 (order: 11, 16384 bytes) [ 40.851937] time.c: Detected 2000.089 MHz processor. [ 40.853406] Console: colour VGA+ 80x25 [ 41.128559] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [ 41.136287] ... MAX_LOCKDEP_SUBCLASSES:8 [ 41.140560] ... MAX_LOCK_DEPTH: 30 [ 41.144747] ... MAX_LOCKDEP_KEYS:2048 [ 41.149105] ... CLASSHASH_SIZE: 1024 [ 41.153550] ... MAX_LOCKDEP_ENTRIES: 8192 [ 41.157901] ... MAX_LOCKDEP_CHAINS: 16384 [ 41.162346] ...
CHAINHASH_SIZE: 8192 [ 41.166704] memory used by lock dependency info: 1648 kB [ 41.172093] per task-struct memory footprint: 1680 bytes [ 41.177480] [ 41.181041] | Locking API testsuite: [ 41.184615] - --- [ 41.192694] | spin |wlock |rlock |mutex | wsem | rsem | [ 41.200771] --- --- [ 41.208855] A-A deadlock: ok | ok | ok | ok | ok | ok | [ 41.217928] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok | [ 41.226939] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok | [ 41.236020] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok | [ 41.245093] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok | [ 41.254251] A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok | [ 41.263393] A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok | [ 41.272560] double unlock: ok | ok | ok | ok | ok | ok | [ 41.281529] initialize held: ok | ok | ok | ok | ok | ok | [ 41.290645] bad unlock order: ok |
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
[ 48.848796] Loading iSCSI transport class v2.0-724. [ 48.854066] iscsi: registered transport (tcp) [ 48.858479] ahc_linux_pci_init [ 48.861676] ahc_linux_pci_dev_probe [ 48.865208] ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17 [ 48.872628] ahc_pci_config [ 48.875335] set_power_state [ 48.878126] map_registers [ 48.880744] ahc_pci_map_registers enter [ 48.884571] .read_config [ 48.887106] .reserve_mem [ 48.889647] .write_config_iferr0 [ 48.892871] .test_registers_iferr0 [ 48.896265] ahc_pci_test_register_access enter [ 48.900699] .read_config [ 48.903235] .write_config_noserr [ 48.906462] .hcnctrl [ 48.908648] .hcntrl pause cmd [ 48.911616] .I will pause 4E4 if missing errh before :/ OK, as expected, it is the endless wait-for-pause loop. (Could someone with the specs please state the maximum HZ for a wait_interruptible(_timeout)() here?) Yes, I know that case must not happen and that it should BUG because the PCI config is already messed up, but in general such unbounded loops are surely unacceptable in a kernel thread like this. I will dump the config data here; maybe it's readable to the aic devs. Then I'll disable the test and let it fail further on, where the cause may be easier to see.
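Tom objects to the unbounded while (ahc_is_paused) loop, which is where the debug output above stops. Below is a minimal sketch of a bounded alternative. It assumes a caller-chosen poll limit and a hypothetical probe callback standing in for ahc_is_paused(). It is not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical probe standing in for ahc_is_paused(): returns true once
 * the sequencer reports that it is paused. */
typedef bool (*pause_probe_fn)(void *ctx);

/* Polls until the probe succeeds or max_polls attempts are used up.
 * Returns 0 on success and -1 on timeout, so the caller can fail the
 * memory-mapped register test (and fall back to PIO) instead of spinning
 * forever. Kernel code would also wait briefly between polls; this
 * sketch does not. */
static int wait_for_pause(pause_probe_fn probe, void *ctx, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (probe(ctx))
            return 0;
    }
    return -1;
}

/* Fake probe for illustration: reports "paused" once the counter pointed
 * to by ctx has been decremented to zero. */
static bool fake_probe(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

With a counter of 3 and a limit of 10 polls the wait succeeds on the fourth poll. With a counter of 100 the same limit gives up with -1, which is the path that would report "failed memory mapped test" rather than lock up the CPU.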
Re: aic7xxx: aic7892(B): BUG: soft lockup detected on CPU#0!
OK, overriding the first while (ahc_is_paused) loop that blocked before (I see no sense in doing this in a PCI mmap test function, because proper resource setup is required *before* using such I/O functions; otherwise the adapter would have entered SEQ paused status), I got the kernel to boot, at least in PIO mode. This is surely not the correct resource and looks like a datatype boundary overflow; the upper 0x0f is missing: [ 49.278810] Trying to free nonexistent resource f000-fff f -f000 tom1:~# lspci -vvv -s 00:06.0 00:06.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02) Subsystem: Adaptec 19160 Ultra160 SCSI Controller Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- Latency: 32 (1ns min, 6250ns max), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 17 BIST result: 00 Region 0: I/O ports at d800 [size=256] Region 1: Memory at ff000 (64-bit, non-prefetchable) [disabled] [size=4K] Expansion ROM at fbee [disabled] [size=128K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 000f--f000 But there's a platform issue somewhere in the code that affects x86_64. Maintainers, please have a look too, thanks. [ 49.181771] Loading iSCSI transport class v2.0-724.
[ 49.187129] iscsi: registered transport (tcp) [ 49.191491] ahc_linux_pci_init [ 49.194682] ahc_linux_pci_dev_probe [ 49.198221] ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17 [ 49.205636] ahc_pci_config [ 49.208337] set_power_state [ 49.211131] map_registers [ 49.213748] ahc_pci_map_registers enter [ 49.217574] .read_config [ 49.220110] .reserve_mem [ 49.222649] .write_config_iferr0 [ 49.225869] .test_registers_iferr0 [ 49.229267] ahc_pci_test_register_access enter [ 49.233704] .read_config 116 [ 49.236584] .write_config_noserr [ 49.239810] .hcnctrl [ 49.241998] .paused 0 [ 49.244362] .write_config [ 49.246982] .write_config [ 49.249614] .fail scb_base [ 49.252321] .ending fail err 5 [ 49.255368] .read_config [ 49.257901] .write_config [ 49.260523] .clrint [ 49.262622] .seqctl [ 49.264720] .write_config [ 49.267340] ahc_pci_test_register_access leave [ 49.271775] aic7xxx: PCI Device 0:6:0 failed memory mapped test. Using PIO. [ 49.278810] Trying to free nonexistent resource f000-fff f [ 49.286443] .reserve_io [ 49.288894] reserve_io_ok [ 49.291510] .write_config [ 49.294129] map_registers leave [ 49.297265] read_config [ 49.299710] write_config1 [ 49.302330] write_config2 [ 49.304951] softc_init [ 49.307329] ahc_reset [ 49.322446] ahc_init_core [ 49.325301] ahc_pci:0:6:0: hardware scb 64 bytes; kernel scb 104 bytes; ahc_dma 8 bytes [ 49.519047] ENINT [ 54.513224] scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0 [ 54.513226] Adaptec 19160B Ultra160 SCSI adapter [ 54.513227] aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs [ 54.513228] [ 54.534992] Adaptec aacraid driver (1.1-5[2423]-mh3) [ 54.540128] st: Version 20070203, fixed bufsize 32768, s/g segs 256 [ 54.546541] osst :I: Tape driver with OnStream support version 0.99.4 [ 54.546542] osst :I: $Id: osst.c,v 1.73 2005/01/01 21:13:34 wriede Exp $ [ 54.560110] SCSI Media Changer driver v0.25 [ 54.564798] PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12 [
54.572940] serio: i8042 KBD port at 0x60,0x64 irq 1 [ 54.578075] serio: i8042 AUX port at 0x60,0x64 irq 12 [ 54.583665] mice: PS/2 mouse device common for all mice [ 54.589185] md: linear personality registered for level -1 [ 54.594682] md: raid0 personality registered for level 0 [ 54.600011] md: raid1 personality registered for level 1 [ 54.672826] raid6: int64x1 1865 MB/s [ 54.740694] raid6: int64x2 2508 MB/s [ 54.790668] (scsi0:A:0:0): Saw Selection Timeout for SCB 0x3 [ 54.808931] raid6: int64x4 2190 MB/s [ 54.880403] raid6: int64x8 1641 MB/s [ 54.948293] raid6: sse2x1 2445 MB/s [ 55.016149] raid6: sse2x2 3332 MB/s [ 55.084035] raid6: sse2x4 3666 MB/s [ 55.087774] raid6: using algorithm sse2x4 (3666 MB/s) [ 55.092819] md: raid6 personality registered for level 6 [ 55.098120] md: raid5 personality registered for level 5 [ 55.103423] md: raid4 personality registered for level 4 [ 55.108724] raid5: automatically using best checksumming function: generic_sse [ 55.131933] generic_sse: 6247.000 MB/sec [ 55.136192] raid5: using function: generic_sse (6247.000 MB/sec) [ 55.142185] md: multipath personality registered for level -4 [ 55.148202] input: AT Translated Set 2 keyboard as