Re: [AIC7xxx] tree things to report
> Please try the attached patch and see if it helps.
>
> James, I know that the aic7xxx has some 'next_queued_hscb' pointer
> which might be utilized for this sort of thing. But I didn't really
> figure out how this thing is supposed to work nor how we could utilize it.
> So I figured that the added complexity is not really worth it.

Did you have time to look at the new logs with the patch applied?
Do you need anything else?

Cheers,
Emmanuel.

---
Create your e-mail address [EMAIL PROTECTED]
1 GB of storage space, built-in anti-spam and anti-virus.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AIC7xxx] tree things to report
Emmanuel Fusté wrote:
> Hello,
>
>> What you should do here is:
>> - hook up a serial cable and re-route console messages to that
>> - Switch off syslog (as this might block if the SCSI bus is frozen)
>> - Enable scsi debugging (Error, Timeout, Scan, and Midlayer is
>>   sufficient) and start cdrwtools.
>> - Send me the log from the serial console.
>
> Ok, I've got logs with netconsole after swapping my Ethernet card with
> another one.

Grand. Well done, son.
The logs have been very instructive. Again we're hitting this
'two commands per lun' problem. For historic reasons the aic7xxx and
aic79xx drivers accepted two commands per lun, as they implemented
their own internal queueing and could hold the second command on the
queue. In later versions I removed this internal queueing and relied
on the block layer for it. But this also means we can only accept one
command per lun.

Please try the attached patch and see if it helps.

James, I know that the aic7xxx has some 'next_queued_hscb' pointer
which might be utilized for this sort of thing. But I didn't really
figure out how this thing is supposed to work nor how we could utilize it.
So I figured that the added complexity is not really worth it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries Storage
[EMAIL PROTECTED]                     +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Allow only a single command per lun for aic7xxx

With the conversion to use transport classes we also removed the
internal queueing from the driver. Hence the existing hack of
accepting two commands per lun and just holding the other one
internally is no longer valid.
Signed-off-by: Hannes Reinecke [EMAIL PROTECTED]

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c
index 6054881..df8a3b2 100644
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -775,7 +775,7 @@ struct scsi_host_template aic79xx_driver
 	.can_queue		= AHD_MAX_QUEUE,
 	.this_id		= -1,
 	.max_sectors		= 8192,
-	.cmd_per_lun		= 2,
+	.cmd_per_lun		= 1,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.slave_alloc		= ahd_linux_slave_alloc,
 	.slave_configure	= ahd_linux_slave_configure,
diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c b/drivers/scsi/aic7xxx/aic7xxx_osm.c
index 660f26e..e6b87b9 100644
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -755,7 +755,7 @@ struct scsi_host_template aic7xxx_driver
 	.can_queue		= AHC_MAX_QUEUE,
 	.this_id		= -1,
 	.max_sectors		= 8192,
-	.cmd_per_lun		= 2,
+	.cmd_per_lun		= 1,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.slave_alloc		= ahc_linux_slave_alloc,
 	.slave_configure	= ahc_linux_slave_configure,
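
The intent of the one-line change can be illustrated with a small model. This is only a sketch, assuming that cmd_per_lun acts as a cap on the number of commands the driver owns for one LUN at a time; struct lun_model and both functions are hypothetical names for illustration and are not part of the aic7xxx driver or the SCSI midlayer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one LUN; NOT the kernel's struct scsi_device. */
struct lun_model {
	int cmd_per_lun;	/* limit advertised by the host template */
	int outstanding;	/* commands currently handed to the driver */
};

/*
 * Try to hand one more command to the driver.
 * Returns true if the command was accepted, false if the caller
 * (standing in for the block layer) has to keep it queued.
 */
bool lun_try_dispatch(struct lun_model *lun)
{
	if (lun->outstanding >= lun->cmd_per_lun)
		return false;
	lun->outstanding++;
	return true;
}

/* Called when the driver completes a command for this LUN. */
void lun_complete(struct lun_model *lun)
{
	assert(lun->outstanding > 0);
	lun->outstanding--;
}
```

With cmd_per_lun set to 1, a second lun_try_dispatch() call is refused until lun_complete() has run, so the driver never has to park a command internally. With the old value of 2, the second command would be accepted, which in this model only makes sense if the driver still has its own queue to hold it.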
Re: [AIC7xxx] tree things to report
> Grand. Well done, son.
> The logs have been very instructive. Again we're hitting this
> 'two commands per lun' problem. For historic reasons the aic7xxx and
> aic79xx drivers accepted two commands per lun, as they implemented
> their own internal queueing and could hold the second command on the
> queue. In later versions I removed this internal queueing and relied
> on the block layer for it. But this also means we can only accept one
> command per lun.
>
> Please try the attached patch and see if it helps.
>
> James, I know that the aic7xxx has some 'next_queued_hscb' pointer
> which might be utilized for this sort of thing. But I didn't really
> figure out how this thing is supposed to work nor how we could utilize it.
> So I figured that the added complexity is not really worth it.

Hi,

Some news: I tried the patch and I still get this sort of instant bus
freeze with difficult recovery. But there are some interesting new
things too.

First log:
standard boot, netconsole start,
echo 32767 > /proc/sys/dev/scsi/scsi ; cdrwtool -t4 -d/dev/sr0 -q
=> SCSI bus crash, lots of logs, kernel BUG.

Second log:
standard boot, init 1 to go into single-user mode,
echo 32767 > /proc/sys/dev/scsi/scsi ; cdrwtool -t4 -d/dev/sr0 -q
Bus crash, recovery, cdrwtool command crashes, I get the shell back.
Remount the root fs read-only to completely suppress sd 0:0:0:0 activity.
cdrwtool -t4 -d/dev/sr0 -q
Lots of recovery logs, then "blah blah your CD will be completely
erased, blah blah, press y to continue!!!"
Y, Enter => the writer LED starts to blink, formatting is running.
But about 30 s later, driver recovery or a SCSI timeout or a midlayer
timeout (I don't know which) kicks in and the device is reset, which
stops the disc formatting. Sniff.
cdrwtool reports UDF filesystem structure initialization, but
everything is discarded by the driver or the midlayer. cdrwtool exits.

Last log:
without reboot: cdrwtool -t4 -d/dev/sr0 -q
bus freeze, lots of logs, cdrwtool command crashes ...

All that I can say, with my limited understanding of the big picture
and from what I saw previously:
- The aic7xxx recovery path is still very fragile and unable to
  recover from problems under SCSI bus activity. Perhaps porting your
  previous work on this path from aic79xx could help.
- It seems that the commands sent by cdrwtool which confuse the driver
  are the commands that sense the properties of the inserted media,
  not the formatting command itself.
- The first part (commands which crash the SCSI bus before the media
  format begins) did not happen before the big driver change (before
  2.6.14).
- The second part (formatting interrupted because driver recovery
  kicked in) was already happening with 2.6.13, where the recovery
  already failed and never recovered.
- Perhaps to make all of this worse, cdrwtool is very crude with my
  old Yamaha CD writer and helps to trigger a chain of worst-case
  events which exposes lots of bugs and unhandled cases (cdrwtool
  bugs, writer firmware bugs, aic7xxx bugs, ...).

Hope this helps.
The positive thing is that now, with the help of Francois Romieu, I can
use my old pcnet32 card to get the logs ;-)

Best regards,
Emmanuel.

Attachments:
log.rafale-new-1.gz (application/gzip)
log.rafale-new-2.gz (application/gzip)
log.rafale-new-3.gz (application/gzip)
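
The value 32767 used in the test sequence lines up with the facilities Hannes asked for (Error, Timeout, Scan and Midlayer). The sketch below shows the arithmetic, assuming the SCSI logging level is a bitfield with 3 bits per facility in the order error, timeout, scan, midlayer queue, midlayer complete, as laid out in the kernel's scsi_logging.h. The enum and function names are made up for illustration; only the shift values are taken from that assumed layout.

```c
#include <assert.h>

/*
 * Assumed bit positions, 3 bits per facility (levels 0..7).
 * Names are illustrative; the shifts follow the layout assumed above.
 */
enum scsi_log_shift {
	LOG_ERROR_SHIFT      = 0,
	LOG_TIMEOUT_SHIFT    = 3,
	LOG_SCAN_SHIFT       = 6,
	LOG_MLQUEUE_SHIFT    = 9,
	LOG_MLCOMPLETE_SHIFT = 12
};

/* Place one facility's level (0..7) at its position in the bitfield. */
unsigned int scsi_log_field(unsigned int level, unsigned int shift)
{
	assert(level <= 7);
	return level << shift;
}

/* Same level for Error, Timeout, Scan and both midlayer facilities. */
unsigned int scsi_log_midlayer_debug(unsigned int level)
{
	return scsi_log_field(level, LOG_ERROR_SHIFT) |
	       scsi_log_field(level, LOG_TIMEOUT_SHIFT) |
	       scsi_log_field(level, LOG_SCAN_SHIFT) |
	       scsi_log_field(level, LOG_MLQUEUE_SHIFT) |
	       scsi_log_field(level, LOG_MLCOMPLETE_SHIFT);
}
```

Under that assumption, scsi_log_midlayer_debug(7) evaluates to 0x7fff, that is 32767: maximum verbosity for the five lowest facilities and nothing for the low-level and high-level driver facilities above them.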
Re: [AIC7xxx] tree things to report
Hello,

While trying to obtain a SCSI log to debug a driver problem, I tried to
use netconsole on an old SMP system with a 10 Mbit/s pcnet32 Ethernet
device. But a few seconds after enabling netconsole, before launching
my SCSI test but after a little console activity, the computer froze
hard.
Is this a known or expected problem? (2.6.21 kernel, pcnet32: PCnet/PCI
79C970)
Do you have a solution or a patch to try?
I will get my soldering iron back out to make a serial cable for now.

Thanks,
Emmanuel.

> Hi Emmanuel,
>
> Emmanuel Fusté wrote:
> > Hello,
> >
> > After one year of rest, I resurrected my old computer, installed a
> > 2.6.21 kernel and updated my Debian distro.
> > Three things to report:
> >
> > First, a cosmetic thing:
> > I have two SCSI sync devices and two async devices.
> > For the first ones, domain validation returns the negotiated speed
> > and mode. For the second ones, domain validation returns nothing.
> > I expect it is just a 'missing feature' and that all went OK.
> > Am I right?
> >
> > scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
> >         Adaptec 2940 Ultra2 SCSI adapter
> >         aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
> > scsi 0:0:0:0: Direct-Access IBM DMVS18V 02B0 PQ: 0 ANSI: 3
> > scsi0:A:0:0: Tagged Queuing enabled. Depth 8
> > target0:0:0: Beginning Domain Validation
> > target0:0:0: wide asynchronous
> > target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25ns, offset 31)
> > target0:0:0: Domain Validation skipping write tests
> > target0:0:0: Ending Domain Validation
> > scsi 0:0:3:0: CD-ROM YAMAHA CRW6416S 1.0d PQ: 0 ANSI: 2
> > target0:0:3: Beginning Domain Validation
> > target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> > target0:0:3: Domain Validation skipping write tests
> > target0:0:3: Ending Domain Validation
> > scsi 0:0:4:0: CD-ROM TOSHIBA CD-ROM XM-3501TA 1875 PQ: 0 ANSI: 2
> > target0:0:4: Beginning Domain Validation
> > target0:0:4: Domain Validation skipping write tests
> > target0:0:4: Ending Domain Validation
> > scsi 0:0:6:0: Sequential-Access WANGTEK 5525ES SCSI REV7 0W PQ: 0 ANSI: 1
> > target0:0:6: Beginning Domain Validation
> > target0:0:6: Ending Domain Validation
>
> Hmm. Have to have a look at it. It should at least report the result ...
>
> > Secondly,
> > It seems that something is doing weird things with my old CD-ROM
> > reader (XM-3501TA). At irregular intervals, I get this in my logs:
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): No or incomplete CDB sent to device.
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Protocol violation in Message-in phase. Attempting to abort.
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Abort Message Sent
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): SCB 11 - Abort Completed.
> >
> > And sometimes (but this seems related to problems with my cable):
> > May 23 04:32:49 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xad) SCSIRATE(0x0)
> > May 23 05:13:03 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xac) SCSIRATE(0x0)
>
> Yes, this looks like a cable problem.
>
> > There is no SCSI bus freeze, and the device works perfectly without
> > generating other errors.
> > DV problem? Bad hal daemon interaction? Defect in the driver
> > triggered by bad hal daemon behavior?
>
> Ach, yes, it could at least be triggered by hal. Not all devices like
> to be polled by hal, especially if they're in the middle of an
> operation. CD-RW, e.g.
> Kay claimed to have it solved, but I still end up disabling hal :-)
>
> > Last thing, a problem that is now two years old:
> > cdrwtools -d /dev/sr0 -q still instantly crashes the SCSI
> > bus/CD writer and the driver never recovers.
> > I do not have a new log because of the complete bus crash.
> > Do you have new ideas about this problem?
>
> No, not yet. But it looks as if I finally got some time to look deeper
> into this problem. Bugzilla's still assigned to me, so it's a constant
> reminder that something's amiss ...
>
> > I will try:
> > - to get a log on a USB key
> > - to port the patch from Bugzilla Bug 5921 to the current kernel.
> >   With the previous ones, the driver recovered. (I was experiencing
> >   FS corruption, but it seems it was not related.)
> > - to identify exactly what cdrwtools sends to the kernel/driver
> >   which causes the crash.
>
> What you should do here is:
> - hook up a serial cable and re-route console messages to that
> - Switch off syslog (as this might block if the SCSI bus is frozen)
> - Enable scsi debugging (Error, Timeout, Scan, and Midlayer is
>   sufficient) and start cdrwtools.
> - Send me the log from the serial console.
>
> This will give me at least a starting point what's going wrong.
>
> Thanks for your patience.
>
> Cheers,
>
> Hannes
> -- 
> Dr. Hannes Reinecke                   zSeries Storage
> [EMAIL PROTECTED]                     +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)
[AIC7xxx] tree things to report
Hello,

After one year of rest, I resurrected my old computer, installed a
2.6.21 kernel and updated my Debian distro.
Three things to report:

First, a cosmetic thing:
I have two SCSI sync devices and two async devices.
For the first ones, domain validation returns the negotiated speed and
mode. For the second ones, domain validation returns nothing.
I expect it is just a 'missing feature' and that all went OK.
Am I right?

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
        Adaptec 2940 Ultra2 SCSI adapter
        aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsi 0:0:0:0: Direct-Access IBM DMVS18V 02B0 PQ: 0 ANSI: 3
scsi0:A:0:0: Tagged Queuing enabled. Depth 8
target0:0:0: Beginning Domain Validation
target0:0:0: wide asynchronous
target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25ns, offset 31)
target0:0:0: Domain Validation skipping write tests
target0:0:0: Ending Domain Validation
scsi 0:0:3:0: CD-ROM YAMAHA CRW6416S 1.0d PQ: 0 ANSI: 2
target0:0:3: Beginning Domain Validation
target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
target0:0:3: Domain Validation skipping write tests
target0:0:3: Ending Domain Validation
scsi 0:0:4:0: CD-ROM TOSHIBA CD-ROM XM-3501TA 1875 PQ: 0 ANSI: 2
target0:0:4: Beginning Domain Validation
target0:0:4: Domain Validation skipping write tests
target0:0:4: Ending Domain Validation
scsi 0:0:6:0: Sequential-Access WANGTEK 5525ES SCSI REV7 0W PQ: 0 ANSI: 1
target0:0:6: Beginning Domain Validation
target0:0:6: Ending Domain Validation

Secondly,
It seems that something is doing weird things with my old CD-ROM
reader (XM-3501TA). At irregular intervals, I get this in my logs:
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): No or incomplete CDB sent to device.
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Protocol violation in Message-in phase. Attempting to abort.
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Abort Message Sent
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): SCB 11 - Abort Completed.

And sometimes (but this seems related to problems with my cable):
May 23 04:32:49 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xad) SCSIRATE(0x0)
May 23 05:13:03 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xac) SCSIRATE(0x0)

There is no SCSI bus freeze, and the device works perfectly without
generating other errors.
DV problem? Bad hal daemon interaction? Defect in the driver triggered
by bad hal daemon behavior?

Last thing, a problem that is now two years old:
cdrwtools -d /dev/sr0 -q still instantly crashes the SCSI bus/CD writer
and the driver never recovers.
I do not have a new log because of the complete bus crash.
Do you have new ideas about this problem?

I will try:
- to get a log on a USB key
- to port the patch from Bugzilla Bug 5921 to the current kernel.
  With the previous ones, the driver recovered. (I was experiencing FS
  corruption, but it seems it was not related.)
- to identify exactly what cdrwtools sends to the kernel/driver which
  causes the crash.

If any SCSI experts have a clue, I will gladly take it.

Thank you all,
Best regards,
Emmanuel.
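
The rates printed by domain validation in the log above follow from the negotiated period and the bus width. The helper below is a minimal sketch of that arithmetic, assuming single-transition (ST) clocking with one transfer per period; the function name and parameters are hypothetical and do not come from the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: peak synchronous transfer rate in MB/s.
 * Assumes one transfer per period (ST clocking), with one byte per
 * transfer on a narrow (8-bit) bus and two on a wide (16-bit) bus.
 */
double sync_rate_mb_per_s(unsigned int period_ns, unsigned int bus_width_bits)
{
	assert(period_ns > 0);
	assert(bus_width_bits == 8 || bus_width_bits == 16);

	/* 1000 / period_ns gives millions of transfers per second. */
	return (1000.0 / period_ns) * (bus_width_bits / 8);
}
```

For the IBM disk (25 ns, wide) this gives 40 million transfers per second times 2 bytes, the 80.0 MB/s shown as FAST-40 WIDE. For the Yamaha writer (100 ns, narrow) it gives the 10.0 MB/s shown as FAST-10. The Toshiba and Wangtek devices print no rate line, so no period can be read from the log for them.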