Re: [AIC7xxx] tree things to report

2007-06-01 Thread Emmanuel Fusté
 
 Please try the attached patch and see if it helps.
 
 James, I know that the aic7xxx has some 'next_queued_hscb' pointer which
 might be utilized for this sort of thing. But I didn't really figure out
 how this thing is supposed to work nor how we could utilize it.
 So I figured that the added complexity is not really worth it.
 
Did you have time to look at the new logs with the patch applied? Do you need
anything else?

Cheers,
Emmanuel.
---

Create your e-mail address [EMAIL PROTECTED]
1 GB of storage space, built-in anti-spam and anti-virus.

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AIC7xxx] tree things to report

2007-05-29 Thread Hannes Reinecke
Emmanuel Fusté wrote:
 Hello,
 
 What you should do here is:

 - hook up a serial cable and re-route console messages to that
 - Switch off syslog (as this might block if the SCSI bus is frozen)
 - Enable scsi debugging (Error, Timeout, Scan, and Midlayer is
 sufficient) and start cdrwtools.
 - Send me the log from the serial console.

 Ok, I've got logs with netconsole after swapping my Ethernet
 card with another one.

Grand. Well done, son.
The logs have been very instructive.

Again we're hitting this 'two commands per LUN' problem.
For historic reasons the aic7xxx and aic79xx drivers accepted two
commands per LUN, as they implemented their own internal queueing and
could hold the second command on that queue.
With later versions I've removed this internal queueing and relied on
the block layer for this.
But this also means we can only accept one command per LUN.
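
To illustrate the constraint, here is a minimal sketch of a per-LUN admission check. It is a toy model only: struct lun_state, lun_try_dispatch() and lun_complete() are hypothetical names, not the mid-layer's or the driver's code, and it assumes the per-LUN limit is simply seeded from cmd_per_lun.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LUN bookkeeping; NOT the kernel's struct scsi_device. */
struct lun_state {
    unsigned int queue_depth; /* limit, seeded from cmd_per_lun in this toy */
    unsigned int in_flight;   /* commands handed to the driver, not yet done */
};

/*
 * Returns true if one more command may be handed to the driver,
 * false if it has to stay queued in the upper layer.
 */
bool lun_try_dispatch(struct lun_state *lun)
{
    assert(lun != NULL);
    if (lun->in_flight >= lun->queue_depth)
        return false;
    lun->in_flight++;
    return true;
}

/* Called when the driver reports completion of one command. */
void lun_complete(struct lun_state *lun)
{
    assert(lun != NULL);
    if (lun->in_flight > 0)
        lun->in_flight--;
}
```

With queue_depth set to 2, the second lun_try_dispatch() call succeeds and the low-level driver has to park that command somewhere itself; with queue_depth set to 1, the second call fails and the command waits in the upper layer until lun_complete() runs.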

Please try the attached patch and see if it helps.

James, I know that the aic7xxx has some 'next_queued_hscb' pointer which
might be utilized for this sort of thing. But I didn't really figure out
how this thing is supposed to work nor how we could utilize it.
So I figured that the added complexity is not really worth it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Allow only a single command per lun for aic7xxx

With the conversion to use transport classes we also removed the
internal queueing from the driver. Hence the existing hack of
accepting two commands per lun and just holding this other one
internally is no longer valid.

Signed-off-by: Hannes Reinecke [EMAIL PROTECTED]

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c
index 6054881..df8a3b2 100644
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -775,7 +775,7 @@ struct scsi_host_template aic79xx_driver
 	.can_queue		= AHD_MAX_QUEUE,
 	.this_id		= -1,
 	.max_sectors		= 8192,
-	.cmd_per_lun		= 2,
+	.cmd_per_lun		= 1,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.slave_alloc		= ahd_linux_slave_alloc,
 	.slave_configure	= ahd_linux_slave_configure,
diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c b/drivers/scsi/aic7xxx/aic7xxx_osm.c
index 660f26e..e6b87b9 100644
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -755,7 +755,7 @@ struct scsi_host_template aic7xxx_driver
 	.can_queue		= AHC_MAX_QUEUE,
 	.this_id		= -1,
 	.max_sectors		= 8192,
-	.cmd_per_lun		= 2,
+	.cmd_per_lun		= 1,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.slave_alloc		= ahc_linux_slave_alloc,
 	.slave_configure	= ahc_linux_slave_configure,


Re: [AIC7xxx] tree things to report

2007-05-29 Thread Emmanuel Fusté
 Grand. Well done, son.
 The logs have been very instructive.

 Again we're hitting this 'two commands per lun' problem.
 For historic reasons the aic7xxx and aic79xx driver accepted two
 commands per luns, as they implemented their internal queueing and
 could hold the second command on the queue.
 With later versions I've removed this internal queueing and relied on
 the block-layer for this.
 But this also means we can only accept one command per lun.

 Please try the attached patch and see if it helps.

 James, I know that the aic7xxx has some 'next_queued_hscb' pointer which
 might be utilized for this sort of thing. But I didn't really figure out
 how this thing is supposed to work nor how we could utilize it.
 So I figured that the added complexity is not really worth it.

Hi,
Some news: I tried the patch and I still get this sort of
instant bus freeze with difficult recovery.
But there are some interesting new things too:

First log: standard boot, netconsole start, echo 32767 >
/proc/sys/dev/scsi/scsi ; cdrwtool -t4 -d/dev/sr0 -q
===> SCSI bus crash, lots of log output, kernel bug.

Second log: standard boot, init 1 to go into single-user
mode, echo 32767 > /proc/sys/dev/scsi/scsi ; cdrwtool -t4
-d/dev/sr0 -q
Bus crash, recovery, the cdrwtool command crashes, I get the shell back.
I remounted the root fs read-only to completely suppress sd 0:0:0:0
activity.
cdrwtool -t4 -d/dev/sr0 -q
Lots of recovery logs, then "blah blah blah your CD will be
completely erased blah blah blah press y to continue"!
Y, Enter: the writer LED starts to blink, formatting is running.
But ~30s later, driver recovery or a SCSI timeout or a midlayer
timeout (I don't know which) kicks in and the device is reset, stopping
the disc formatting. Sniff.
cdrwtool reports UDF filesystem structure initialization, but
everything is discarded by the driver or the midlayer. cdrwtool exits.

Last log: without reboot: cdrwtool -t4 -d/dev/sr0 -q
Bus freeze, lots of logs, the cdrwtool command crashes ...
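
For readers wondering what the value 32767 selects, here is a minimal sketch. It assumes the layout of the kernel's SCSI logging word, 3 bits per facility in the order error, timeout, scan, mlqueue, mlcomplete, llqueue, llcomplete, hlqueue, hlcomplete, ioctl. The function and array names below are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_BITS_PER_FACILITY 3u
#define LOG_LEVEL_MASK        7u

/* Facility order assumed from the kernel's SCSI logging layout. */
static const char *const log_facility_names[] = {
    "error", "timeout", "scan", "mlqueue", "mlcomplete",
    "llqueue", "llcomplete", "hlqueue", "hlcomplete", "ioctl",
};

#define LOG_NUM_FACILITIES \
    (sizeof(log_facility_names) / sizeof(log_facility_names[0]))

/* Name of facility number `facility` (0 = error, 9 = ioctl). */
const char *log_facility_name(size_t facility)
{
    assert(facility < LOG_NUM_FACILITIES);
    return log_facility_names[facility];
}

/* Extract the 0..7 level of one facility from a logging word. */
unsigned int log_level_get(unsigned int word, size_t facility)
{
    assert(facility < LOG_NUM_FACILITIES);
    return (word >> (facility * LOG_BITS_PER_FACILITY)) & LOG_LEVEL_MASK;
}

/* Return `word` with one facility's level replaced by `level` (0..7). */
unsigned int log_level_set(unsigned int word, size_t facility,
                           unsigned int level)
{
    unsigned int shift;

    assert(facility < LOG_NUM_FACILITIES);
    assert(level <= LOG_LEVEL_MASK);
    shift = (unsigned int)(facility * LOG_BITS_PER_FACILITY);
    return (word & ~(LOG_LEVEL_MASK << shift)) | (level << shift);
}
```

Under that assumption, 32767 (0x7fff) sets the first five facilities (error, timeout, scan, mlqueue, mlcomplete) to level 7 and leaves the low-level and high-level facilities at 0, which matches the "Error, Timeout, Scan, and Midlayer" suggestion. As far as I know the word is normally written to the logging_level sysctl under /proc/sys/dev/scsi, on kernels built with SCSI logging support.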

All that I can say, with my limited understanding of the big
picture and from what I saw previously:
- The aic7xxx recovery path is still very fragile and unable
to recover from problems under SCSI bus activity. Perhaps
porting your previous work on this path from aic79xx could help.
- It seems that the commands sent by cdrwtool that confuse the
driver are the commands that sense the properties of the inserted
media, not the formatting command itself.
- The first part (commands that crash the SCSI bus before the
media format begins) did not happen before the big
driver change (before 2.6.14).
- The second part (formatting interrupted because driver
recovery kicked in) was already happening with 2.6.13, and the
recovery already failed and never recovered.
- Perhaps to make all of this worse, cdrwtool is very crude with
my old Yamaha CD writer and helps to trigger a chain of worst-case
events which exposes lots of bugs/unhandled cases
(cdrwtool bugs/writer firmware bugs/aic7xxx bugs ...).

I hope this helps.
The positive thing is that now, with the help of Francois
Romieu, I can use my old pcnet32 card to get the logs ;-)

Best regards,
Emmanuel.


log.rafale-new-1.gz
Description: application/gzip


log.rafale-new-2.gz
Description: application/gzip


log.rafale-new-3.gz
Description: application/gzip


Re: [AIC7xxx] tree things to report

2007-05-24 Thread Emmanuel Fusté
Hello,
While trying to obtain SCSI logs to debug a driver problem, I
tried to use netconsole on an old SMP system with a 10 Mbit/s
pcnet32 Ethernet device.
But a few seconds after enabling netconsole, before launching my
SCSI test but after a little console activity, the computer
froze hard.
Is this a known or expected problem? (2.6.21 kernel, pcnet32:
PCnet/PCI 79C970) Do you have a solution or a patch to try?
I will get my soldering iron back out to make a serial cable for now.

Thanks,
Emmanuel.

 Hi Emmanuel,
 
 Emmanuel Fusté wrote:
  Hello,
  
  After one year of rest, I resurrected my old computer, installed a
  2.6.21 kernel and updated my Debian distro.
  
  Three things to report:
  
  First, a cosmetic thing: I have two SCSI sync devices and two
  async devices. For the first ones, domain validation returns
  the negotiated speed and mode. For the second ones, domain
  validation returns nothing. I expect it is just a 'missing
  feature' and that all went OK. Am I right?
  
  scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
  Adaptec 2940 Ultra2 SCSI adapter
  aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
  
  scsi 0:0:0:0: Direct-Access     IBM      DMVS18V          02B0 PQ: 0 ANSI: 3
  scsi0:A:0:0: Tagged Queuing enabled.  Depth 8
   target0:0:0: Beginning Domain Validation
   target0:0:0: wide asynchronous
   target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
   target0:0:0: Domain Validation skipping write tests
   target0:0:0: Ending Domain Validation
  scsi 0:0:3:0: CD-ROM            YAMAHA   CRW6416S         1.0d PQ: 0 ANSI: 2
   target0:0:3: Beginning Domain Validation
   target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
   target0:0:3: Domain Validation skipping write tests
   target0:0:3: Ending Domain Validation
  scsi 0:0:4:0: CD-ROM            TOSHIBA  CD-ROM XM-3501TA 1875 PQ: 0 ANSI: 2
   target0:0:4: Beginning Domain Validation
   target0:0:4: Domain Validation skipping write tests
   target0:0:4: Ending Domain Validation
  scsi 0:0:6:0: Sequential-Access WANGTEK  5525ES SCSI REV7 0W   PQ: 0 ANSI: 1
   target0:0:6: Beginning Domain Validation
   target0:0:6: Ending Domain Validation
  
 Hmm. Have to have a look at it. It should at least report
 the result ...
 
  Secondly, it seems that something is doing weird things with
  my old CD-ROM reader (XM-3501TA). At irregular intervals,
  I get this in my logs:
  May 23 00:45:44 rafale kernel: (scsi0:A:4:0): No or incomplete CDB sent to device.
  May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Protocol violation in Message-in phase.  Attempting to abort.
  May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Abort Message Sent
  May 23 00:45:44 rafale kernel: (scsi0:A:4:0): SCB 11 - Abort Completed.
  And sometimes (but this seems related to problems with my cable):
  May 23 04:32:49 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xad) SCSIRATE(0x0)
  May 23 05:13:03 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xac) SCSIRATE(0x0)
  
 Yes, this looks like a cable problem.
 
  There is no SCSI bus freeze, and the device works perfectly
  without generating other errors. A DV problem? A bad hal daemon
  interaction? A defect in the driver triggered by bad hal daemon
  behavior?
  
 Ach, yes, it could at least be triggered by hal. Not all devices like to
 be polled by hal, especially if they're in the middle of an operation.
 CD-RW, e.g. Kay claimed to have it solved, but I still end up disabling
 hal :-)
 
  Last thing, a problem that is now two years old:
  cdrwtool -d /dev/sr0 -q still instantly crashes the
  scsibus/cdwriter and the driver never recovers.
  I do not have a new log because of the complete bus crash.
  Do you have new ideas about this problem?
 No, not yet. But it looks as if I finally got some time to look deeper
 into this problem.
 Bugzilla's still assigned to me, so it's a constant reminder that
 something's amiss ...
 
  I will try:
  - to get a log on a USB key
  - to port the patch from Bugzilla Bug 5921 to the current kernel. With
  the previous ones, the driver recovered (I was experiencing
  FS corruption, but it seems it was not related).
  - to identify exactly what cdrwtool sends to the kernel/driver
  that causes the crash.
 What you should do here is:
 
 - hook up a serial cable and re-route console messages to that
 - Switch off syslog (as this might block if the SCSI bus is frozen)
 - Enable scsi debugging (Error, Timeout, Scan, and Midlayer is
 sufficient) and start cdrwtool.
 - Send me the log from the serial console.
 
 This will give me at least a starting point for what's going wrong.
 
 Thanks for your patience.
 
 Cheers,
 
 Hannes
 -- 
 Dr. Hannes Reinecke zSeries  Storage
 [EMAIL PROTECTED]   +49 911 74053 688
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
 GF: Markus Rex, HRB 16746 (AG Nürnberg)
 


[AIC7xxx] tree things to report

2007-05-23 Thread Emmanuel Fusté
Hello,

After one year of rest, I resurrected my old computer, installed a
2.6.21 kernel and updated my Debian distro.

Three things to report:

First, a cosmetic thing: I have two SCSI sync devices and two
async devices. For the first ones, domain validation returns
the negotiated speed and mode. For the second ones, domain
validation returns nothing. I expect it is just a 'missing
feature' and that all went OK. Am I right?

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
Adaptec 2940 Ultra2 SCSI adapter
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi 0:0:0:0: Direct-Access     IBM      DMVS18V          02B0 PQ: 0 ANSI: 3
scsi0:A:0:0: Tagged Queuing enabled.  Depth 8
 target0:0:0: Beginning Domain Validation
 target0:0:0: wide asynchronous
 target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
 target0:0:0: Domain Validation skipping write tests
 target0:0:0: Ending Domain Validation
scsi 0:0:3:0: CD-ROM            YAMAHA   CRW6416S         1.0d PQ: 0 ANSI: 2
 target0:0:3: Beginning Domain Validation
 target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
 target0:0:3: Domain Validation skipping write tests
 target0:0:3: Ending Domain Validation
scsi 0:0:4:0: CD-ROM            TOSHIBA  CD-ROM XM-3501TA 1875 PQ: 0 ANSI: 2
 target0:0:4: Beginning Domain Validation
 target0:0:4: Domain Validation skipping write tests
 target0:0:4: Ending Domain Validation
scsi 0:0:6:0: Sequential-Access WANGTEK  5525ES SCSI REV7 0W   PQ: 0 ANSI: 1
 target0:0:6: Beginning Domain Validation
 target0:0:6: Ending Domain Validation
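
As a cross-check of the two rates that domain validation does report, here is a small sketch of the arithmetic. It assumes ST (single transition) clocking, so one transfer per period, and a bus width of 16 bits for wide and 8 bits for narrow. The function name is made up for illustration.

```c
#include <assert.h>

/*
 * Synchronous transfer rate in MB/s (10^6 bytes per second) for a
 * given transfer period in nanoseconds and bus width in bits.
 * Assumes one transfer per period (ST clocking).
 */
double sync_rate_mb_per_s(double period_ns, unsigned int width_bits)
{
    double transfers_per_us; /* equals the transfer rate in MHz */

    assert(period_ns > 0.0);
    assert(width_bits % 8u == 0u);
    transfers_per_us = 1000.0 / period_ns;
    return transfers_per_us * (width_bits / 8u);
}
```

A 25 ns period on a 16-bit bus gives 40 MHz times 2 bytes, so 80 MB/s (the FAST-40 WIDE line for the IBM disk). A 100 ns period on an 8-bit bus gives 10 MHz times 1 byte, so 10 MB/s (the FAST-10 line for the Yamaha writer).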

Secondly, it seems that something is doing weird things with
my old CD-ROM reader (XM-3501TA). At irregular intervals,
I get this in my logs:
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): No or incomplete CDB sent to device.
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Protocol violation in Message-in phase.  Attempting to abort.
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Abort Message Sent
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): SCB 11 - Abort Completed.
And sometimes (but this seems related to problems with my cable):
May 23 04:32:49 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xad) SCSIRATE(0x0)
May 23 05:13:03 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xac) SCSIRATE(0x0)

There is no SCSI bus freeze, and the device works perfectly
without generating other errors. A DV problem? A bad hal daemon
interaction? A defect in the driver triggered by bad hal daemon
behavior?

Last thing, a problem that is now two years old:
cdrwtool -d /dev/sr0 -q still instantly crashes the
scsibus/cdwriter and the driver never recovers.
I do not have a new log because of the complete bus crash.
Do you have new ideas about this problem?
I will try:
- to get a log on a USB key
- to port the patch from Bugzilla Bug 5921 to the current kernel. With
the previous ones, the driver recovered (I was experiencing
FS corruption, but it seems it was not related).
- to identify exactly what cdrwtool sends to the kernel/driver
that causes the crash.
If some SCSI experts have a clue, I will take it.

Thank you all,
Best regards,
Emmanuel.
 