from:"Douglas Gilbert"

Re: cdparanoia not setting count and/or reply_len properly

2007-07-08 Thread Douglas Gilbert

Stefan Richter wrote:
 DervishD wrote at lkml:
 Hi all :)

 I know, this has been treated on the list before (year 2005) but
 without any real solution I'm aware of.

 I'm running kernel 2.6.20.14, and I have an ATAPI DVD writer that I
 use with an IDE-to-USB adapter, so it appears as an SCSI drive to the
 kernel.

 Anytime I rip anything with it, the log fills with the same message:
 some numbers about a certain number of bytes and the old friend message
 that I've put in the subject.

 I assume that the warning makes sense, but the fact is that my log
 is full with the same message, the ripping is correct (so cdparanoia is
 working OK WRT ripping) and if weren't for the printk_ratelimit, the
 system will freeze.

 I don't know if cdparanoia should be fixed, but certainly the
 warning could be issued only if CONFIG_SCSI_VERBOSE is set. This way you
 will have the message if something goes wrong and you want more info,
 but in cases where the warning is harmless your log will be clean...

 Anyway, this message is not for make suggestions, but for asking for
 information: why is this warning happening? naugthy cdparanoia? naughty
 kernel? I'm a bit confused and I want to use my external DVD drive for
 ripping from time to time, to exercise it...

 Thanks a lot in advance :)

 Raúl Núñez de Arenas Coronado

 
 This question is better asked at lsml.  (Therefore I'm quoting in full.)

In Fedora 7 I see this:

# cdparanoia --version
cdparanoia III release 9.8 (March 23, 2001)
(C) 2001 Monty [EMAIL PROTECTED] and Xiphophorus

Report bugs to [EMAIL PROTECTED]
http://www.xiph.org/paranoia/


So, given that date, lk 2.4.2 was out but it was probably
a bit early to start using the sg version 3 interface
which first appeared in lk 2.4.1 . So that "let's annoy
the user" message was added by someone who got burnt by
the old sg version 2 interface and decided people needed
to be warned. The warning comes from this code in sg.c :

/*
 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
 * but it is possible that the app intended SG_DXFER_TO_DEV, because there
 * is a non-zero input_size, so emit a warning.
 */
if (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV)
	if (printk_ratelimit())
		printk(KERN_WARNING
		       "sg_write: data in/out %d/%d bytes for SCSI command 0x%x--"
		       "guessing data in;\n" KERN_WARNING "   "
		       "program %s not setting count and/or reply_len properly\n",
		       old_hdr.reply_len - (int)SZ_SG_HEADER,
		       input_size, (unsigned int) cmnd[0],
		       current->comm);

That code wasn't written by me and I would gladly remove it.
For anyone who has read the sg driver documentation,
SG_DXFER_TO_FROM_DEV implies a _read_ from the device. The
reason SG_DXFER_TO_FROM_DEV exists is for backward
compatibility to the sg version 1 interface. It was a hack to
get around the fact that the SCSI subsystem didn't report short
reads (what folks should use 'resid' for) back in those days.
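
To make the 'resid' remark concrete, here is a minimal sketch in C.
The struct is a hypothetical stand-in that mirrors only the dxfer_len
and resid fields of the sg version 3 header; it is not the real
sg_io_hdr from scsi/sg.h, and the helper name is invented for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two sg version 3 header fields used here. */
struct xfer_result {
    unsigned int dxfer_len; /* bytes requested from the device */
    int resid;              /* bytes requested minus bytes actually moved */
};

/* Number of bytes that actually arrived. A result smaller than
 * dxfer_len means the device delivered a short read. */
static unsigned int bytes_received(const struct xfer_result *r)
{
    assert(r != NULL);
    if (r->resid <= 0)
        return r->dxfer_len;            /* full transfer reported */
    if ((unsigned int)r->resid >= r->dxfer_len)
        return 0;                       /* nothing usable arrived */
    return r->dxfer_len - (unsigned int)r->resid;
}
```

With a 2352 byte request and resid of 352, the helper reports 2000
bytes, which is the information the old interface could not convey.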


It is probably about time that cdparanoia was updated ...

Doug Gilbert




-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 0/3] clean gendisk out of scsi ULD structs

2007-07-06 Thread Douglas Gilbert

James Bottomley wrote:
 On Thu, 2007-07-05 at 14:06 -0700, Kristen Carlson Accardi wrote:
 Since gendisk will now become part of struct scsi_device, we don't need
 to store this value in any private data structs where they already store
 scsi_device.  This series cleans up a few drivers which did this.
 
 Actually, as Al pointed out, we do have lifetime rules issues with doing
 this.  The problem is that gendisk itself always has a shorter lifetime
 than scsi_device (not much shorter, usually, but if you execute a legal
 ULD unbind manoeuvre you'll end up with a dangling gendisk pointer).

What about having short-lived scsi_device objects? For example:
one that lives long enough for a pass-through to send a
SCSI command (and receive its response) to one of a target's
well known logical units.

 The other problem with taking gendisk out of the ULD structure and
 putting it into the scsi_device is that for the sg driver, we have two
 of them (one for the attached ULD and one for the sg driver).

Add the bsg driver and that would make three of them. Or, if
the lu's peripheral device type was not of interest to sd, st,
sr, and osst, back to two gendisk objects (i.e. one each
for sg and bsg).

 The fundamental issue seems to be that the gendisk is the holder of all
 the other info (queue, ULD etc) not vice versa ... and this patch is
 trying to reverse that relationship.

A minor issue is the name gendisk ... unless, of course,
you go and look at its definition in linux/genhd.h in
which case the name looks somewhat appropriate. It looks
like a mess [queue, ULD name, major/minor(s), partitions,
capacity, disk_stats, kobjects, etc]. That is a considerable
amount of superfluous information for just a tag for
requests coming into (a) given queue when that queue leads
to a non-block device.

Doug Gilbert

Re: [patch 0/3] clean gendisk out of scsi ULD structs

2007-07-05 Thread Douglas Gilbert

Kristen Carlson Accardi wrote:
 Since gendisk will now become part of struct scsi_device, we don't need
 to store this value in any private data structs where they already store
 scsi_device.  This series cleans up a few drivers which did this.

Since a scsi_device object is usually a SCSI logical unit,
one wonders why it would contain a gendisk object. Logical
units aren't necessarily disks, they might be enclosures or
just place holders that respond to an INQUIRY (e.g. lun=0
when the enclosing target has other active lus whose lun!=0).

Doug Gilbert

Re: Low-level reformat with different sector size: ?

2007-06-29 Thread Douglas Gilbert

Matthias Urlichs wrote:
 Hello,
 
 Yesterday I managed to buy a couple of SCA disks with a sector size of ...
 *drumroll* ... 524.
 
 What's the easiest way to re-format these to use 512 bytes?
 Preferably without screwing up anything else on these things?

Umm, I hope you don't consider losing all the previous data
on the disks when a re-format is performed as screwing up?

Doug Gilbert

Re: Low-level reformat with different sector size: ?

2007-06-29 Thread Douglas Gilbert

James Bottomley wrote:
 On Fri, 2007-06-29 at 15:34 +, Matthias Urlichs wrote:
 Yesterday I managed to buy a couple of SCA disks with a sector size of ...
 *drumroll* ... 524.

 What's the easiest way to re-format these to use 512 bytes?
 Preferably without screwing up anything else on these things?
 
 We use this program to reformat 520 sector size disks back to 512:
 
 http://parisc-linux.org/~jejb/blk512-linux.c
 
 It should work for your device as well.  Beware it requires a complete
 low level format to achieve this, which can take a very long time.

I might mention at this point that sg_format is derived
from blk512-linux.c . Both should be able to format recent
SCSI disks (e.g. manufactured in this millennium). sformat
is an older program. All of them invoke the SCSI FORMAT
command. If the SCSI FORMAT command is examined in SCSI-2,
SBC, SBC-2 and SBC-3 then it can be seen as quite complex.
Over the 15 year period spanned by those standards (SBC-3 is
still a draft) it has become more complex and changed
somewhat.

The first terabyte SCSI disk was announced this week. I
wonder how long it takes to format.

Doug Gilbert

Re: Very slow writes with mptsas

2007-06-20 Thread Douglas Gilbert

[EMAIL PROTECTED] wrote:
 On Tue, 05 Jun 2007, [EMAIL PROTECTED] wrote:
 
 Hello

 I'm seeing very slow writes on a Dell Precision 690 with the Dell SAS5
 adapter, serving a RAID1 array of SATA-II disks.

 It's very similar to the problem in FreeBSD, described here:

 http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2007-03/msg00756.html

 I'm running FC6 with the latest kernel.

 Reads are quite fast, writes terribly slow.

 
 Thanks to all who replied to this query, especially the very detailed
 response from Eric Moore at LSI.
 
 The first important facet is that we need to operate on the two hidden
 physical disks, not the RAID device.  lsscsi differentiates them:
 
 # lsscsi
 [0:0:0:0]    disk    ATA      WDC WD5000KS-75M 2E08  -
 [0:0:1:0]    disk    ATA      HDS725050KLA360  AB5A  -
 [0:1:0:0]    disk    Dell     VIRTUAL DISK     1028  /dev/sda
 
 sg_map gives the generic device numbers:

Using 'lsscsi -g' would also give you the generic device
numbers.

It is interesting that the above ATA disks do not have
corresponding /dev/sd* device names.

 # sg_map -i -x
 /dev/sg0  0 0 0 0  0  ATA   WDC WD5000KS-75M  2E08
 /dev/sg1  0 0 1 0  0  ATA   HDS725050KLA360   AB5A
 
 The write cache can then be enabled using sdparm:
 
 sdparm -s WCE=1 -S /dev/sg0
 
 and the result checked with
 
 # sdparm -g WCE /dev/sg1
 /dev/sg1: ATA   HDS725050KLA360   AB5A
 WCE 1  [cha: y]
 
 This seems to make the write performance much better.

Good.

 The question for Dell is why their version of the BIOS doesn't set the
 write cache in the first place or allow it to be altered by the user.

The mechanism for doing this was only formalized recently
with the SAT standard, so it may take a while for BIOSes
and other infrastructure to catch up.

Doug Gilbert



Re: doubts about sg driver

2007-06-14 Thread Douglas Gilbert

Parav Pandit wrote:
 Hi,
 
 Few basic questions on sg driver:
 
 1. Are there any hooks that low level HBA driver needs
 to implement - for providing support for SG (SCSI
 generic) driver?
 Or SG always interacts with scsi_mod and it is
 transparent to the HBA drivers?
From the tldp How-to and sg.c it looks like it doesn't
 directly talk with Low level HBA driver, but want to
 confirm.

The sg driver talks to the scsi mid level (and the
block layer strangely enough) but not directly to
LLDs.

 2. Can applications talk with SCSI RAID controller
 device (some targets exposes LUN-0 as controller) 
 through sg interface or it is only for storage
 devices?

The sg driver is useful for any SCSI device (logical
unit) that is exposed by the scsi mid level. Apart
from direct access (i.e. disk) devices that might
include cd/dvd drives, tape drives, scsi enclosures,
saf-te controllers (which have processor peripheral
device type) and well known logical units.
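
As a small illustration of the device types listed above, the sketch
below decodes the peripheral device type from byte 0 of a standard
INQUIRY response. The function names are made up for this example,
and only the types mentioned in this message are named.

```c
#include <assert.h>
#include <stddef.h>

/* Peripheral device type: low 5 bits of byte 0 of standard INQUIRY data
 * (the top 3 bits are the peripheral qualifier). */
static unsigned int inquiry_pdt(const unsigned char *inq)
{
    assert(inq != NULL);
    return inq[0] & 0x1f;
}

/* Names for the types mentioned in the text; everything else is "other". */
static const char *pdt_name(unsigned int pdt)
{
    switch (pdt) {
    case 0x00: return "disk (direct access)";
    case 0x01: return "tape (sequential access)";
    case 0x03: return "processor (e.g. SAF-TE)";
    case 0x05: return "cd/dvd";
    case 0x0d: return "enclosure services";
    case 0x1e: return "well known logical unit";
    default:   return "other";
    }
}
```

Any of these can be opened through its /dev/sg* node, whether or not
an upper level driver such as sd or sr has claimed it.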

 3. How is the mapping between /dev/sda /dev/sdb etc to
 /dev/sg0 /dev/sg1 etc?
 Is this information is accessible via procfs or sysfs
 interface?

In the lk 2.6 series the mapping can be found in sysfs
(see lsscsi, specifically 'lsscsi -g'). In the sg3_utils
package the sg_map utility shows the mapping. That may
be helpful in the lk 2.4 series since there is no sysfs.

Doug Gilbert




Re: scsi_debug fault-injection question

2007-06-11 Thread Douglas Gilbert

Randy Dunlap wrote:
 Hi Doug,
 
 scsi_debug.c says:
 
 MODULE_PARM_DESC(every_nth, "timeout every nth command(def=100)");
 
 I don't see where the default of 100 is set.
 
 #define DEF_EVERY_NTH   0
 ...
 static int scsi_debug_every_nth = DEF_EVERY_NTH;
 
 
 Can you clarify for me, please?

Randy,
s/100/0/

The string in MODULE_PARM_DESC is wrong. The support
web page, http://www.torque.net/sg/sdebug26.html
is accurate stating the default is 0 and notes:
for error injection: 0 - off.
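
For anyone wondering what "0 - off" means in practice, here is a
rough model in C. It is only a sketch of the idea and not the
scsi_debug implementation: the struct and function names are
hypothetical, and values of every_nth below zero are simply treated
as off here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of "act on every nth command". */
struct nth_injector {
    int every_nth;      /* 0 (the default) means injection is off */
    unsigned long seen; /* commands counted so far */
};

/* Count one command; return 1 if it is the one to inject a fault into. */
static int nth_should_inject(struct nth_injector *inj)
{
    assert(inj != NULL);
    if (inj->every_nth <= 0)
        return 0;       /* off: never inject (negative values not modelled) */
    inj->seen++;
    return (inj->seen % (unsigned long)inj->every_nth) == 0;
}
```

With every_nth set to 3 the third, sixth, ninth ... commands are
selected; with the default of 0 none are.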

Doug Gilbert


Re: [PATCH] scsi_debug: correct parameter default text

2007-06-11 Thread Douglas Gilbert

Randy Dunlap wrote:
 From: Randy Dunlap [EMAIL PROTECTED]
 
 Correct the module info text for the default value of
 every_nth to 0.
 
 Signed-off-by: Randy Dunlap [EMAIL PROTECTED]
Signed-off-by: Douglas Gilbert [EMAIL PROTECTED]

Doug Gilbert

 ---
  drivers/scsi/scsi_debug.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 --- linux-2622-rc4.orig/drivers/scsi/scsi_debug.c
 +++ linux-2622-rc4/drivers/scsi/scsi_debug.c
 @@ -2405,7 +2405,7 @@ MODULE_PARM_DESC(add_host, "0..127 hosts
  MODULE_PARM_DESC(delay, "# of jiffies to delay response(def=1)");
  MODULE_PARM_DESC(dev_size_mb, "size in MB of ram shared by devs(def=8)");
  MODULE_PARM_DESC(dsense, "use descriptor sense format(def=0 - fixed)");
 -MODULE_PARM_DESC(every_nth, "timeout every nth command(def=100)");
 +MODULE_PARM_DESC(every_nth, "timeout every nth command(def=0)");
  MODULE_PARM_DESC(fake_rw, "fake reads/writes instead of copying (def=0)");
  MODULE_PARM_DESC(max_luns, "number of LUNs per target to simulate(def=1)");
  MODULE_PARM_DESC(no_lun_0, "no LU number 0 (def=0 - have lun 0)");
 


Re: MEDIUM FORMAT CORRUPTED error

2007-06-08 Thread Douglas Gilbert

sandip shete wrote:
 Hi,
   I recieved the following ASC/ASCQ values as Additional Sense data :
 31/00.
   I looked it up and found that it stands for MEDIUM FORMAT CORRUPTED.
 Does it mean that the target disk has bad sectors?

That error may be reported after a disk is reset during a
FORMAT operation. Another related case is a media access
after a MODE SELECT is used to change the sector size (e.g.
from 512 to 528 bytes) and prior to a FORMAT command which
actually reformats the disk to 528 byte sectors.

So if a disk is reporting that ASC/ASCQ sequence for every
media access, then you need to format it. In that case
sg_format in sg3_utils may be useful.
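
If you want to check for this condition in a program, the sketch
below pulls ASC/ASCQ out of fixed format sense data. It assumes
fixed format (response code 0x70 or 0x71) only; descriptor format
sense is rejected. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Extract ASC/ASCQ from fixed format sense data.
 * Returns 1 on success, 0 if the buffer is too short, is not fixed
 * format, or its additional sense length does not reach bytes 12-13. */
static int fixed_sense_asc_ascq(const unsigned char *sb, size_t len,
                                unsigned char *asc, unsigned char *ascq)
{
    assert(sb != NULL && asc != NULL && ascq != NULL);
    if (len < 14)
        return 0;
    if ((sb[0] & 0x7f) != 0x70 && (sb[0] & 0x7f) != 0x71)
        return 0;
    if ((size_t)sb[7] + 8 < 14)   /* byte 7: additional sense length */
        return 0;
    *asc = sb[12];
    *ascq = sb[13];
    return 1;
}

/* ASC/ASCQ 31h/00h is MEDIUM FORMAT CORRUPTED. */
static int is_medium_format_corrupted(unsigned char asc, unsigned char ascq)
{
    return asc == 0x31 && ascq == 0x00;
}
```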

Doug Gilbert


Re: Very slow writes with mptsas

2007-06-06 Thread Douglas Gilbert

Matthew Jacob wrote:
 The FreeBSD problem was fixed by Scott Long a couple of days ago by
 doing some cut through SAS stuff that enabled Write Cache for SATA
 drives. Why LSI-Logic couldn't just blitheringly synthesize mode page
 8 is beyond me, but okay.
 
 I dunno whether the issue here is the same one Scott tackled- probably
 given the messages. Eric- you listening in on this?

Matt,
Just in case Eric doesn't answer, I suspect if the
HBA firmware can be upgraded (from Dell or LSI?) then
WCE (write cache enable) in the caching mode page
will be supported. It is one of the few mode page
settings that is required to be implemented in SAT.

The other field that should be changeable is DRA
(disable read ahead). Both work on my LSI SAS HBA.
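
For reference, this is roughly where those two bits live in the
caching mode page (page code 08h). The sketch works on a buffer that
starts at the mode page itself, so the mode parameter header and any
block descriptors are assumed to have been skipped already. The
helper names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define CACHING_PAGE_CODE    0x08
#define CACHING_PAGE_MIN_LEN 13   /* enough to reach byte 12 */

/* True if mp points at a caching mode page long enough to inspect. */
static int is_caching_page(const unsigned char *mp, size_t len)
{
    return mp != NULL && len >= CACHING_PAGE_MIN_LEN &&
           (mp[0] & 0x3f) == CACHING_PAGE_CODE;
}

/* WCE (write cache enable): byte 2, bit 2. */
static int caching_wce(const unsigned char *mp)
{
    return (mp[2] & 0x04) != 0;
}

/* DRA (disable read ahead): byte 12, bit 5. */
static int caching_dra(const unsigned char *mp)
{
    return (mp[12] & 0x20) != 0;
}

/* Flip WCE in a local copy of the page; sending it back to the
 * device with MODE SELECT is not shown. */
static void caching_set_wce(unsigned char *mp, int enable)
{
    assert(mp != NULL);
    if (enable)
        mp[2] |= 0x04;
    else
        mp[2] &= (unsigned char)~0x04;
}
```

This is the same bit that 'sdparm -s WCE=1' changes.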

Doug Gilbert


Re: SMART support for SATA drives in SAS enclosures

2007-05-23 Thread Douglas Gilbert

Pim Zandbergen wrote:
 Is SMART support available for SATA drives in SAS enclosures?
 
 I'm testing this setup
 
 LSI Logic SAS3800X PCI-X SAS controller (mptsas driver)
 Promise V-Trak J300S SAS/SATA enclosure/expander
 12x Seagate ST3500630NS
 Linux kernel 2.6.21.1 x86_64
 smartmontools-5.37-1.1.fc6 from Fedora Core 6
 
 smartctl -i -d sat /dev/sdc gives me
 
 Smartctl: Device Read Identity Failed

I presume /dev/sdc is an actual disk rather than a
RAID device made up of several disks. The SAT standard
(and smartmontools) don't have a general way of
addressing individual disks behind RAID infrastructure.

With a recent version of smartmontools (such as 5.37) and an
MPT Fusion SAS HBA this should work if /dev/sdc is
a SATA disk. Your HBA may need a firmware upgrade.


You might fetch sg3_utils version 1.24 and try:
  sg_sat_identify /dev/sdc
That needs to work before smartctl has a hope.

 Same with -d ata.
 
 If I treat the disk as SCSI (-d scsi), the command
 will not fail, but wil only retrieve the serial number.

With MPT Fusion SAS hardware (that I have seen) the SAT
layer is in the HBA firmware. Only later versions of the
firmware support the SCSI ATA PASS-THROUGH command.

Doug Gilbert


Re: Sg_ses question

2007-05-22 Thread Douglas Gilbert

Haefliger, Juerg wrote:
 Hi,
 
 Not sure if this is the right list for my question but I couldn't find a
 more suitable place to ask it.
 
 I'm trying to set the locator light of a disk in a SAS enclosure using
 sg_ses but I'm not getting anywhere. I'm dumping the enclosure status
 diagnostic page using 'sg_ses --page=2 --raw /dev/sgXX > page' and then
 set the SELECT and RQST IDENT bits of the array device element in
 question and write it back doing 'sg_ses --control --page=2 --data=-
 /dev/sgXX < tmp'. The command completes without error but unfortunately,
 nothing happens. When I read the page back, the IDENT bit is still
 cleared and the light on the enclosure remains turned off.
 
 Am I doing something wrong or am I missing something? Can't I use sg_ses
 to achieve this?

The procedure looks correct. I haven't had any (other)
reports of sg_ses not working lately. The only suggestion
I can make is to ask if you have selected the element
control rather than the overall control.

Doug Gilbert





hdparm 7.3 supports SAT

2007-05-10 Thread Douglas Gilbert

Mark Lord's hdparm version 7.3 supports the SCSI to ATA
Translation (SAT) pass-through. So if SAT is supported,
this allows hdparm to access ATA (e.g. SATA disks) and
ATAPI (e.g. cd/dvd drives) devices behind SCSI transports.

Note that the SAT layer may be in the kernel (e.g. libata),
in a HBA's firmware (MPT Fusion SAS HBAs) or external.
Also one of those SCSI transports might be SATA.

See http://sourceforge.net/projects/hdparm/


Both sdparm and two utilities in sg3_utils used some
tortured syntax to pipe through ATA IDENTIFY (PACKET)
DEVICE responses to 'hdparm --Istdin' prior to hdparm 7.x .
That, in most cases, should no longer be needed.

Doug Gilbert

Re: [PATCH 4/4] bidi support: bidirectional request

2007-04-30 Thread Douglas Gilbert

Jens Axboe wrote:
 On Mon, Apr 30 2007, Benny Halevy wrote:
 Jens Axboe wrote:
 On Sun, Apr 29 2007, James Bottomley wrote:
 On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote:
 FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: [PATCH 4/4] bidi support: bidirectional request
 Date: Sun, 15 Apr 2007 20:33:28 +0300

 diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
 index 645d24b..16a02ee 100644
 --- a/include/linux/blkdev.h
 +++ b/include/linux/blkdev.h
 @@ -322,6 +322,7 @@ struct request {
  void *end_io_data;

  struct request_io_part uni;
 +struct request_io_part bidi_read;
  };
 Would be more straightforward to have:

 struct request_io_part in;
 struct request_io_part out;

 Yes I wish I could do that. For bidi supporting drivers this is the most
 logical. But for the 99.9% of uni-directional drivers, calling rq_uni(),
 and being somewhat on the hotish paths, this means we will need a pointer
 to a uni request_io_part.
 This is bad because:
 1st- There is no defined stage in a request life where to definitely set
      that pointer, specially in the preparation stages.
 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all.
      Now this is a very bad spot already, and I have a short term fix for
      it in the SCSI-bidi patches (not sent yet) but a more long term
      solution is needed. Once such hacks are cleaned up we can do what you
      say. This is exactly why I use the access functions
      rq_uni/rq_io/rq_in/rq_out and not open code access.
 I'm still not really convinced about this approach.  The primary job of
 the block layer is to manage and merge READ and WRITE requests.  It
 serves a beautiful secondary function of queueing for arbitrary requests
 it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or
 indeed any non REQ_TYPE_FS).

 bidirectional requests fall into the latter category (there's nothing
 really we can do to merge them ... they're just transported by the block
 layer).  The only unusual feature is that they carry two bios.  I think
 the drivers that actually support bidirectional will be a rarity, so it
 might even be advisable to add it to the queue capability (refuse
 bidirectional requests at the top rather than perturbing all the drivers
 to process them).

 So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI?  That will
 remove it from the standard path and put it on the special command type
 path where we can process it specially.  Additionally, if you take this
 approach, you can probably simply chain the second bio through
 req->special as an additional request in the stream.  The only thing
 that would then need modification would be the dequeue of the block
 driver (it would have to dequeue both requests and prepare them) and
 that needs to be done only for drivers handling bidirectional requests.
 I agree, I'm really not crazy about shuffling the entire request setup
 around just for something as exotic as bidirection commands. How about
 just keeping it simple - have a second request linked off the first one
 for the second data phase? So keep it completely separate, not just
 overload ->special for 2nd bio list.

 So basically just add a struct request pointer, so you can do rq =
 rq->next_rq or something for the next data phase. I bet this would be a
 LOT less invasive as well, and we can get by with a few helpers to
 support it.

 And it should definitely be a request type.

 I'm a bit confused since what you both suggest is very similar to what we've
 proposed back in October 2006 and the impression we got was that it will be
 better to support bidirectional block requests natively (yet to be honest,
 James, you wanted a linked request all along).

 It still has to be implemented natively at the block layer, just
 differently like described above. So instead of messing all over the
 block layer adding rq_uni() stuff, just add that struct request pointer
 to the request structure for the 2nd data phase. You can relatively easy
 then modify the block layer helpers to support mapping and setup of such
 requests.

 Before we go on that route again, how do you see the support for bidi
 at the scsi mid-layer done?  Again, we prefer to support that officially
 using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as
 a one-off feature, using special-purpose state and logic (e.g. a linked
 struct scsi_cmd for the bidi_read sg list).

 The SCSI part is up to James, that can be done as either inside a single
 scsi command, or as linked scsi commands as well. I don't care too much
 about that bit, just the block layer parts :-). And the proposed block
 layer design can be used both ways by the scsi layer.

Linked SCSI commands have been obsolete since SPC-4 rev 6
(18 July 2006) after proposal 06-259r1 was accepted. That
proposal starts: "The reasons for linked commands have been
overtaken by time and events". I haven't seen anyone mourning
their demise on

Re: [PATCH 0/4] bidi support: block layer bidirectional io.

2007-04-16 Thread Douglas Gilbert

Boaz Harrosh wrote:
 Following are 4 (large) patches for support of bidirectional
 block I/O in kernel. (not including SCSI-ml or iSCSI)
 
 The submitted work is against linux-2.6-block tree as of
 2007/04/15, and will only cleanly apply in succession.
 
 The patches are based on the RFC I sent 3 months ago. They only
 cover the block layer at this point. I suggest they get included
 in Morton's tree until they reach the kernel so they can get
 compiled on all architectures/platforms. There is still a chance
 that architectures I did not compile were not fully converted.
 (FWIW, my search for use of struct request members failed to find
 them). If you find such a case, please send me the file
 name and I will fix it ASAP.
 
 Patches summary:
 1. [PATCH 1/4] bidi support: request dma_data_direction
   - Convert REQ_RW bit flag to a dma_data_direction member like in
     SCSI-ml use.
   - removed rq_data_dir() and added other APIs for querying request's
     direction.
   - fix usage of rq_data_dir() and peeking at req->cmd_flags & REQ_RW to
     using new api.
   - clean-up bad usage of DMA_BIDIRECTIONAL and bzero of none-queue
     requests, to use the new blk_rq_init_unqueued_req()
 
 2. [PATCH 2/4] bidi support: fix req->cmd == INT cases
   - Digging into all these old drivers, I have found traces of past life
     where request->cmd was the command type. This patch fixes some of
     these places. All drivers touched by this patch are clear indication
     of drivers that were not used for a while. Should we remove them from
     the kernel? These are:
       drivers/acorn/block/fd1772.c, drivers/acorn/block/mfmhd.c,
       drivers/block/nbd.c, drivers/cdrom/aztcd.c, drivers/cdrom/cm206.c,
       drivers/cdrom/gscd.c, drivers/cdrom/mcdx.c, drivers/cdrom/optcd.c,
       drivers/cdrom/sjcd.c, drivers/ide/legacy/hd.c, drivers/block/amiflop.c
 
 3. [PATCH 3/4] bidi support: request_io_part
   - extract io related fields in struct request into struct
     request_io_part in preparation to full bidi support.
   - new rq_uni() API to access the sub-structure. (Please read below
     comment on why an API and not open code the access)
   - Convert all users to new API.
 
 4. [PATCH 4/4] bidi support: bidirectional block layer
   - add one more request_io_part member for bidi support in struct
     request.
   - add block layer API functions for mapping and accessing bidi data
     buffers and for ending a block request as a whole
     (end_that_request_block())
 
 
 Developer comments:
 
 patch 1/4: Borrow from struct scsi_cmnd use of enum dma_data_direction.
 Further work (in progress) is the removal of the corresponding member from
 struct scsi_cmnd and converting all users to directly access
 rq_dma_dir(sc->req).
 
 patch 3/4: The reasons for introducing the rq_uni(req) API rather than
 directly accessing req->uni are:
 
 * WARN(!bidi_dir(req)) is life saving when developing bidi enabled paths.
   Once we, bidi users, start to push bidi requests down the kernel paths,
   we immediately get warned of paths we did not anticipate. Otherwise,
   they will be very hard to find, and will hurt kernel stability.
 
 * A cleaner and saner future implementation could be in/out members rather
   than uni/bidi_read.  This way the dma_direction member can be deprecated
   and the uni sub-structure can be maintained using a pointer in struct
   req. With this API we are free to change the implementation in the
   future without touching any users of the API. We can also experiment
   with what's best. Also, with the API it is much easier to convert
   uni-directional drivers for bidi (look in ll_rw_block.c in patch 4/4).
 
 * Note, that internal uses inside the block layer access req->uni
   directly, as they will need to be changed if the implementation of
   req->{uni, bidi_read} changes.

Boaz,
Recently I have been looking at things from the perspective
of a SAS target and thinking about bidi commands. Taking
XDWRITEREAD(10) in sbc3r09.pdf (section 5.44) as an example,
with DISABLE_WRITE=0, the device server in the target should
do the following:
  a) decode the cdb **
  b) read from storage [lba, transfer_length]
  c) fetch data_out from initiator [transfer_length] ***
  d) XOR data from (b) and (c) and place result in (z)
  e) write the data from (c) to storage [lba, transfer_length]
  f) send (z) in data_in to initiator [transfer_length]
  g) send SCSI completion status to initiator
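
To make steps b) to f) concrete, here is a tiny in-memory model in C.
It is a sketch only: "storage" is just an array, the function name is
invented, and the whole transfer is done in one pass, with no attempt
to model the device server subdividing the transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Model of steps (b) to (f) for one XDWRITEREAD with DISABLE_WRITE=0.
 *   stored   - blocks currently on the medium, updated in place
 *   data_out - data fetched from the initiator, step (c)
 *   data_in  - buffer (z) returned to the initiator, step (f)
 *   n        - transfer length in bytes */
static void xdwriteread_model(unsigned char *stored,
                              const unsigned char *data_out,
                              unsigned char *data_in, size_t n)
{
    size_t i;

    assert(stored != NULL && data_out != NULL && data_in != NULL);
    for (i = 0; i < n; i++) {
        data_in[i] = stored[i] ^ data_out[i]; /* (d): XOR of (b) and (c) */
        stored[i] = data_out[i];              /* (e): write (c) to storage */
    }
}
```

Note that the XOR has to use the old contents of storage, so each byte
is read before it is overwritten.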

Logically a) must occur first and g) last. The b) to f)
sequence could be repeated (perhaps) by the device server
subdividing the transfer_length (i.e. it may not be
reasonable for the OS to assume that the data_out transfer
will be complete before there is any data_in transfer).
With this command (and with most other bidi commands

[ANNOUNCE] sdparm 1.01

2007-04-09 Thread Douglas Gilbert

sdparm is a command line utility designed to get and set
SCSI device parameters (cf hdparm for ATA disks). The
parameters are held in mode pages. Apart from SCSI devices
(e.g. disks, tapes and enclosures) sdparm can be used on
any device that uses a SCSI command set. Virtually all CD/DVD
drives use the SCSI MMC set irrespective of the transport.
sdparm also can decode VPD pages including the device
identification page. Commands to start and stop the media;
load and unload removable media and some other housekeeping
functions are supported. sdparm supports both the linux
kernel 2.4 and 2.6 series with ports to FreeBSD and Windows.

ChangeLog for sdparm-1.01 [20070405]
  - add element address assignment mode page (smc)
  - improve error handling in lk 2.4 series mapping to
sg devices
  - add configure.ac rule for mingw (Windows)
- include inttypes.h to use PRIx64 instead of %llx
  - add LUICLR bit to extended inquiry VPD page
  - correct some headers for C++ inclusion
- fix some C code to compile under C++
  - fix bug when unusual transport or vendor given
  - add a Fujitsu vendor mode page
  - add initial priority to control extension mpage
  - add disconnect-reconnect mpage to generic list;
there are still transport specific versions
  - extend block limits VPD page (sbc3r09)
  - sync with sg3_utils-1.24 pass-through code

For more information and downloads see:
http://www.torque.net/sg/sdparm.html

A release announcement has been sent to freshmeat.net .

Doug Gilbert

Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13

2007-04-06 Thread Douglas Gilbert

James Bottomley wrote:
 On Fri, 2007-04-06 at 08:51 -0700, Andrew Burgess wrote:
 James Bottomley wrote:

 It's actually a long standing bug in the 3w- driver.  Apparently it
 assumes request sense is always the use_sg == 0 case.  This is what it
 does on a request sense:
 static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
 {
 	dprintk(KERN_NOTICE "3w-: tw_scsiop_request_sense()\n");
 	/* For now we just zero the request buffer */
 	memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
 	tw_dev->state[request_id] = TW_S_COMPLETED;
 	tw_state_request_finish(tw_dev, request_id);
 
 Note that it's clearing the request buffer, which is actually zeroing the 
 scatterlist, hence the problem.
 OK. Is there a quick workaround or should I just wait for
 Adam & Company to make a patch?
 
 Try this ... I think it's roughly the correct fix.
 
 You said your earlier patch would hide it, and then said you
 had a length wrong in it and I'm not sure what length you
 mean.
 
 It's the length specifier in the error handler request sense command ...
 I'll fix it up and redo my patch through scsi-misc, since it's not going
 to fix the root cause of the problem.
 
 James
 
 diff --git a/drivers/scsi/3w-.c b/drivers/scsi/3w-.c
 index bf5d63e..6b303ba 100644
 --- a/drivers/scsi/3w-.c
 +++ b/drivers/scsi/3w-.c
 @@ -1864,10 +1864,17 @@ static int tw_scsiop_read_write(TW_Device_Extension 
 *tw_dev, int request_id)
  /* This function will handle the request sense scsi command */
  static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int 
 request_id)
  {
 + char request_buffer[18];
 +
   dprintk(KERN_NOTICE "3w-: tw_scsiop_request_sense()\n");
  
 - /* For now we just zero the request buffer */
 - memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
 + memset(request_buffer, 0, sizeof(request_buffer));
 + request_buffer[0] = 0x70; /* Immediate fixed format */
 + request_buffer[7] = 11; /* minimum size per SPC: 18 bytes */

James,
That last line should be:
request_buffer[7] = 10; /* minimum size per SPC: 18 bytes */
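
For reference, byte 7 of fixed format sense data is the ADDITIONAL SENSE
LENGTH, which counts only the bytes that follow byte 7. An 18 byte buffer
therefore carries 18 - 8 = 10. A minimal sketch of that arithmetic follows;
the helper name fill_fixed_sense is made up for illustration and is not
part of the 3w driver or of the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the 3w driver: build an all-zero
 * ("no sense") fixed format sense buffer of total_len bytes. */
static void fill_fixed_sense(unsigned char *buf, size_t total_len)
{
    /* Byte 7 must exist and the length must fit in a single byte. */
    assert(total_len >= 8 && total_len - 8 <= 255);

    memset(buf, 0, total_len);
    buf[0] = 0x70;                           /* current error, fixed format */
    buf[7] = (unsigned char)(total_len - 8); /* ADDITIONAL SENSE LENGTH */
}
```

Called with an 18 byte buffer this stores 10 in byte 7, matching the
correction above.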

Doug Gilbert

Re: [PATCH] SG: cap reserved_size values at max_sectors

2007-04-04 Thread Douglas Gilbert

Alan Stern wrote:
 This patch (as857) modifies the SG_GET_RESERVED_SIZE and
 SG_SET_RESERVED_SIZE ioctls in the sg driver, capping the values at
 the device's request_queue's max_sectors value.  This will permit
 cdrecord to obtain a legal value for the maximum transfer length,
 fixing Bugzilla #7026.
 
 The patch also caps the initial reserved_size value.  There's no
 reason to have a reserved buffer larger than max_sectors, since it
 would be impossible to use the extra space.
 
 The corresponding ioctls in the block layer are modified similarly,
 and the initial value for the reserved_size is set as large as
 possible.  This will effectively make it default to max_sectors.
 Note that the actual value is meaningless anyway, since block devices
 don't have a reserved buffer.
 
 Finally, the BLKSECTGET ioctl is added to sg, so that there will be a
 uniform way for users to determine the actual max_sectors value for
 any raw SCSI transport.
 
 Signed-off-by: Alan Stern [EMAIL PROTECTED]

Alan,
I have voiced my concerns about this earlier but I will
now sign off to unblock the process (and deal with the
consequences to sg users, if any).

Signed-off-by: Douglas Gilbert [EMAIL PROTECTED]
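
For sg users wondering what value they will now see, the capping rule in
the patch quoted below amounts to roughly the following. This is a user
space sketch only: the helper name is made up, the 512 byte sector size is
taken from the patch, and the 64 bit multiply is my own precaution rather
than something the kernel code does.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper mirroring min_t(int, val, max_sectors * 512). */
static int cap_reserved_size(int requested, unsigned int max_sectors)
{
    /* Multiply in 64 bits so a large max_sectors cannot overflow. */
    long long limit = (long long)max_sectors * 512;

    assert(requested >= 0);  /* the ioctl rejects negative values */
    if (limit > INT_MAX)
        limit = INT_MAX;
    return requested < limit ? requested : (int)limit;
}
```

For example, with max_sectors = 240 a request for 1 MiB comes back as
122880 bytes, while a 32 KB request is left alone.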

 
 ---
 
 Index: usb-2.6/drivers/scsi/sg.c
 ===
 --- usb-2.6.orig/drivers/scsi/sg.c
 +++ usb-2.6/drivers/scsi/sg.c
 @@ -917,6 +917,8 @@ sg_ioctl(struct inode *inode, struct fil
   return result;
  if (val < 0)
  return -EINVAL;
 + val = min_t(int, val,
 + sdp->device->request_queue->max_sectors * 512);
   if (val != sfp->reserve.bufflen) {
   if (sg_res_in_use(sfp) || sfp->mmap_called)
   return -EBUSY;
 @@ -925,7 +927,8 @@ sg_ioctl(struct inode *inode, struct fil
   }
   return 0;
   case SG_GET_RESERVED_SIZE:
 - val = (int) sfp->reserve.bufflen;
 + val = min_t(int, sfp->reserve.bufflen,
 + sdp->device->request_queue->max_sectors * 512);
   return put_user(val, ip);
   case SG_SET_COMMAND_Q:
   result = get_user(val, ip);
 @@ -1061,6 +1064,9 @@ sg_ioctl(struct inode *inode, struct fil
   if (sdp->detached)
   return -ENODEV;
   return scsi_ioctl(sdp->device, cmd_in, p);
 + case BLKSECTGET:
 + return put_user(sdp->device->request_queue->max_sectors * 512,
 + ip);
   default:
   if (read_only)
   return -EPERM;  /* don't know so take safe approach */
 @@ -2339,6 +2345,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
  {
   Sg_fd *sfp;
   unsigned long iflags;
 + int bufflen;
  
   sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
   if (!sfp)
 @@ -2369,7 +2376,9 @@ sg_add_sfp(Sg_device * sdp, int dev)
   if (unlikely(sg_big_buff != def_reserved_size))
   sg_big_buff = def_reserved_size;
  
 - sg_build_reserve(sfp, sg_big_buff);
 + bufflen = min_t(int, sg_big_buff,
 + sdp->device->request_queue->max_sectors * 512);
 + sg_build_reserve(sfp, bufflen);
   SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
  sfp->reserve.bufflen, sfp->reserve.k_use_sg));
   return sfp;
 Index: usb-2.6/block/ll_rw_blk.c
 ===
 --- usb-2.6.orig/block/ll_rw_blk.c
 +++ usb-2.6/block/ll_rw_blk.c
 @@ -1925,6 +1925,8 @@ blk_init_queue_node(request_fn_proc *rfn
   blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
   blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
  
 + q->sg_reserved_size = INT_MAX;
 +
   /*
* all done
*/
 Index: usb-2.6/block/scsi_ioctl.c
 ===
 --- usb-2.6.orig/block/scsi_ioctl.c
 +++ usb-2.6/block/scsi_ioctl.c
 @@ -78,7 +78,9 @@ static int sg_set_timeout(request_queue_
  
  static int sg_get_reserved_size(request_queue_t *q, int __user *p)
  {
 - return put_user(q->sg_reserved_size, p);
 + unsigned val = min(q->sg_reserved_size, q->max_sectors << 9);
 +
 + return put_user(val, p);
  }
  
  static int sg_set_reserved_size(request_queue_t *q, int __user *p)
 
 


Re: Linux tape drivers

2007-04-04 Thread Douglas Gilbert

Kai Makisara wrote:
 On Tue, 3 Apr 2007, Andrew Morton wrote:
 
 (cc's added, with permission)

 On Tue, 3 Apr 2007 15:08:37 +0200
 Kern Sibbald [EMAIL PROTECTED] wrote:

 Hello,

 I am the project manager for Bacula, an Open Source network backup program 
 that runs on all popular OSes.  After your presentation at FOSDEM in 
 February, we briefly talked about Linux tape driver problems I am 
 encountering, and you offered to put me in touch with the appropriate 
 kernel developers.

 I would much appreciate any help in this.  Since the problems concern all 
 tape drivers, I provide a very brief outline of what I would like to 
 discuss.  First, I must mention that the Linux SCSI driver works perfectly 
 fine with Bacula, it is simply a question of possible improvements, under 
 item 2 below.

 Issues for discussion:

 1. Bugs:
a. Other than the OSST driver, apparently no IDE/SATA tape driver works
with Bacula. I don't have such a drive (working on it), but from user
reports, it appears to me that there are problems of permitting
variable length blocks, and more serious, when writing to the end of
the tape, either the logical end of tape indicator is ignored, or when
it is encountered, all further I/O is prohibited -- including a WEOF.
This makes reliable writing of multiple reel tapes impossible.

By the way, these IDE/SATA drives work with Bacula using the same
source code cross-compiled with GNU C++ on Linux, then run on Windows
machines, so it is most likely a driver issue rather than anything in
Bacula or the hardware.

 
 Others have already answered this and I agree with their view. All of the 
 tape drives seem to use the SSC command set or something close to that. 
 One high-level driver should be enough to implement the user semantics.
  
 Libata should be able to drive the SATA/IDE drives, and the drives 
 are then visible as SCSI devices in Linux. In the future there should be no real 
 need for ide-scsi. Probably very few people have tried libata with tapes 
 and there may be some problems to fix. Someone should test this with 
 real devices and report the problems back to libata maintainers.
 
 2. Usability of the current tape driver API (not bugs)
   a. With the new O_NONBLOCK flag introduced in kernel 2.5.x, opening
   a tape drive and finding out if a volume is mounted is much more 
   complicated.  It is really inconvenient and required a lot more code
   in prior kernels.  This should be an item for discussion.
 
 The reasons for the change were:
 1. To be compatible with the Unix standards, and
 2. To be compatible with other Unix tape driver semantics.
 
 Because of these reasons the changes should probably not be reversed but 
 there may be something to improve in the implementation. Suggestions?

Kai,
Perhaps an ignore_nonblock sysfs attribute or driver option
could be added for the old semantics.
As I have found in the past, programs that scan for devices
by opening device nodes don't play well with drivers
that hang on open.

   b. There is no simple way to determine if a tape is in a drive -- it is 
   at least 20 or 30 lines of C code to do it right.
 
 Why not use GMT_ONLINE() with MTIOCGET? The definition from the st man 
 page is:
 
 GMT_ONLINE(x): The last open() found the drive with a tape in place and 
 ready for operation.
 
 If it does not work correctly, it can be fixed. (Of course, if you want to 
 see if a tape is in a drive but not loaded, it is more difficult.)

Sounds like a TEST UNIT READY is all that is needed.
They could call out to a utility like sg_turs or
sdparm and check the exit status. They could also
build with sg3_utils-libs and call
sg_ll_test_unit_ready(). [All sg3_utils code is C++
friendly.]
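
For those who would rather not depend on a library, a rough sketch of a
TEST UNIT READY sent through the SG_IO ioctl is shown here. It assumes a
device node that accepts SG_IO on a read-only open (an sg node, for
example); the function name, the 20 second timeout and the return
convention are my own choices for illustration, not anything taken from
sg3_utils.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/* Returns 1 if the unit reports ready, 0 if the command completed with a
 * problem (e.g. no medium), -1 if the node could not be opened or does
 * not accept SG_IO. */
static int unit_is_ready(const char *dev_path)
{
    unsigned char cdb[6] = {0};     /* TEST UNIT READY: opcode 0x00 */
    unsigned char sense[32];
    struct sg_io_hdr io;
    int fd, res;

    assert(dev_path != NULL);
    /* O_NONBLOCK so the open itself does not wait on the drive */
    fd = open(dev_path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&io, 0, sizeof(io));
    io.interface_id = 'S';          /* sg version 3 interface */
    io.dxfer_direction = SG_DXFER_NONE;
    io.cmd_len = sizeof(cdb);
    io.cmdp = cdb;
    io.mx_sb_len = sizeof(sense);
    io.sbp = sense;
    io.timeout = 20000;             /* milliseconds */

    res = ioctl(fd, SG_IO, &io);
    close(fd);
    if (res < 0)
        return -1;
    return (io.info & SG_INFO_OK_MASK) == SG_INFO_OK ? 1 : 0;
}
```

A caller that needs to know why the unit is not ready would have to keep
the sense buffer and decode it; this sketch throws it away.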

Doug Gilbert

sg_v4 interface, release 1.3

2007-04-04 Thread Douglas Gilbert

Attached is the SCSI generic version 4 interface, release
1.3

ChangeLog for release 1.3 [20070404]
  - increase tag size to 64 bits to comply with SAM-4 and SRP
  - add request_extra and spare_out2 for alignment

Doug Gilbert


  SCSI Generic version 4 interface structure
  ==========================================
  Release 1.3

Goals:
  - handle both generalized request/response and data_out/data_in
independently in same invocation (i.e. synchronous usage).
  - alternatively the request and data_out could be instigated in one
invocation with pointers given for the incoming response and data_in.
Then a second invocation (as a result of polling or asynchronous
notification) reports the response and/or data_in is done, plus
provides error/resid/timing information. This is asynchronous usage.
This allows for the most complicated SCSI commands: tagged, variable
length cdbs with bidirectional data transfers.
  - support multiple protocols. If they are generalized request-response
protocols then they can choose either the request/response part of the
interface or the data_out/data_in part.
  - layered error/condition reporting: (OS) driver, transport and device
(logical unit). Method used to present this struct to OS (e.g. ioctl())
may also report error (e.g. EPERM).
  - allow for auxiliary information to be passed back for the application
client to consider
  - same structure can be used for a synchronous (e.g. interruptible ioctl)
or asynchronous (e.g. ioctl()/read() ) pass through.
  - leave device (lu) or target addressing issues to some other mechanism
(what SCSI standards call the I_T_L or the I_T nexus respectively) as
they are transport dependent. However do include the tag level (the
_Q part of a I_T_L_Q nexus).
  - stay close enough to struct sg_io_hdr (sg version 3 interface) to use
with existing SG_IO ioctls, current implementations expect 'S' in
'guard'
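
The last goal implies that a single ioctl entry point could tell the two
header layouts apart by looking at the leading integer. A sketch of that
check, assuming both structures begin with a 32 bit integer holding the
guard character ('S' for version 3, 'Q' for this draft); the function name
is invented for the example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 3 for an sg v3 header ('S'), 4 for a v4 header ('Q'),
 * and 0 for anything else. */
static int sg_hdr_version(const void *hdr)
{
    int32_t guard;

    assert(hdr != NULL);
    /* memcpy avoids alignment and aliasing assumptions about hdr */
    memcpy(&guard, hdr, sizeof(guard));
    if (guard == 'S')
        return 3;
    if (guard == 'Q')
        return 4;
    return 0;
}
```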


Comments:
  - unsigned 64 bit integers used as pointer carriers to ease 32/64
bit code interworking (e.g. 32 bit app on 64 bit kernel)
  - should there be more (or less) spare fields?
  - the write() usage in the sg driver's asynchronous interface has
caused problems when mistakenly applied to a block device node
rather than a sg device node. Using an ioctl(flag_async) followed
by a read() for asynchronous work offers similar functionality and
is safer. Using ioctl(flag_async_start) and ioctl(flag_async_finish)
is another possibility.
  - rather than have a separate ATA pass through mechanism, the SAT
defined ATA PASS THROUGH SCSI commands could be used with the
driver implementation routing the ATA commands to their
subsystem. This could be flagged so it didn't preclude a SAT layer
in a SCSI transport (e.g. MPT SAS HBA firmware).
  - if SAM/SPC does not define an enumeration for lesser used input
fields, then use the value 0 for inert/off/don't_care .
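
To illustrate the first comment: a 64 bit "pointer carrier" is filled and
read back by casting through uintptr_t, so the same structure layout works
for a 32 bit application on a 64 bit kernel. The helper names below are
hypothetical and only show the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Store a user space pointer in a 64 bit field (e.g. 'request'). */
static uint64_t ptr_to_carrier(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Recover the pointer; the value must fit this build's pointer width. */
static void *carrier_to_ptr(uint64_t c)
{
    assert(c <= UINTPTR_MAX);
    return (void *)(uintptr_t)c;
}
```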



ChangeLog for release 1.3 [20070404]
  - increase tag size to 64 bits to comply with SAM-4 and SRP
  - add request_extra and spare_out2 for alignment

ChangeLog for release 1.2 [20070314]
  - add dout_resid
  - re-arrange uint64_t types (i.e. pointer carriers) to be on a
8 byte boundary
  - reinstate dout_iovec_count and din_iovec_count (they were in
release 1.1 but bsg dropped them)
  - change name: response_len_wr to response_len
  - pick up some descriptions from bsg

ChangeLog for release 1.1 [20061106]
  - was called sg version 4 interface, version 1.1
so change the second version to release


---
#include <stdint.h>


struct sg_io_v4
{
int32_t guard;  /* [i] 'Q' to differentiate from v3 */
uint32_t protocol;  /* [i] 0 - SCSI ,  */
uint32_t subprotocol;   /* [i] 0 - SCSI command, 1 - SCSI task
   management function,  */

uint32_t request_len;   /* [i] in bytes {SCSI: cdb length} */
uint64_t request;   /* [i], [*i] {SCSI: cdb} */
uint64_t request_tag;   /* [i] {SCSI: task tag (only if flagged)} */
uint32_t request_attr;  /* [i] {SCSI: task attribute} */
uint32_t request_priority; /* [i] {SCSI: task priority} */
uint32_t request_extra; /* [i] {spare, for padding} */
uint32_t max_response_len; /* [i] in bytes */
uint64_t response;  /* [i], [*o]  {SCSI: (auto)sense data} */

/* dout_: data out (to device); din_: data in (from device) */
uint32_t dout_iovec_count;  /* [i] 0 - flat dout transfer */
/* else dout_xfer points to array of iovec */
uint32_t dout_xfer_len; /* [i] bytes to be transferred to device */
uint32_t din_iovec_count;   /* [i] 0 - flat din transfer */
uint32_t din_xfer_len;

Re: SMP pass through interface via bsg

2007-04-02 Thread Douglas Gilbert

James Smart wrote:
 
 
 James Bottomley wrote:
 -- each SAS object (host, device, expander, etc) has its own bsg
 device

 I think so; probably attached via the transport class.
 
 FYI - I understand the idea of a bsg device per object, but really, for
 something that is used rarely, it's a bunch of overhead. Objects, data
 structures, etc - more udev/kobject mgmt  I believe I prefer the
 approach of a shared distribution point - e.g. one bsg device at the
 transport globally, or perhaps one at the host (actually the outbound
 port aka host/channel) supporting the transport - followed by headers
 in the messages that direct flow after that. This kinda follows the
 model we have today for I/O - w/ queuecommand for the host, and
 addressing in the SCSI command.

James,
I fully agree.

 Additionally, I've always had some concern that we had to create an
 object for everything in the SAN (every phy!), and have that view
 replicated
 per host (for multi-initiator/multi-path SANs). I always believed there
 was some sets of things that you would want to talk to that just doesn't
 justify a new object (for example - do we start talking to process
 associators
 in FC ?).  Another reason to move toward a transport-specific addressing
 header.

Yes, seldom used things like well known logical units
and virtual SMP targets (there is one on every MPT Fusion
SAS HBA) that don't make the cut in the "devices for
everything" model become invisible to Linux users. It is
exactly these types of things that specialized user space
programs use a pass-through interface for.

So if the kernel can't find a use for it, then you, the
owner of the hardware, won't be able to use it either.
Hard to describe that approach as open software.

 My other concern with using bsg and the i/o path for transport management
 functions is they compete with i/o for things like the can_queue values.
 Should they ? Should they have higher priority ?

sg v4 adds priority control mechanisms but there still
remain possibilities for conflict, some of which may
cause problems.

I can see that a state based driver like st may want
to stop a pass-through getting to a logical unit most
of the time (and mechanisms could be added). However
even st may want to use a pass through to the transport
to reset the target (hard reset) if it can't get the
LU RESET task management function to work.

 I'd really rather not go this route unless the one device per object
 approach becomes untenable.
 
 Understood, but building things until they topple is not a great idea
 as there will be back-ward compatibility issues w/ user-space/sysfs and
 the tools built around it. If you start with the shared distribution
 point, you can always support both (eventually) if its a good idea.
 Harder to do that in the reverse if it's toppling.

We are talking about the SAS Management Protocol (SMP)
in this thread and in SAS-1 and SAS-1.1 discovery is done
by every SAS initiator, for every ripple in the topology.
In large topologies this approach can cause a SMP storm
that can temporarily drop SAS bandwidth to SCSI-1 figures.
Today discovery is done in the LLD or firmware (Adaptec and
LSI respectively) so they can magically make devices appear.
The approach in SAS-2 is to devolve SAS discovery to
expanders and use more efficient SMP functions. Current
generation SAS HBAs (and some LLDs) will need to alter
or stop their current SAS discovery techniques. The
user space may need to get involved, for zoning and
associated security.

Only allowing the SMP pass-through to talk to devices
that the kernel thinks are SAS expanders has some
shortcomings:
  - how can user space SAS topology discovery be done?
  - what about SMP targets that are not on expanders?
  - disabling the phy that connects an expander
to the SAS domain is problematic when the
file descriptor you are using notionally represents
that expander.


Note: discovery of a SAS topology is a different process from
finding logical units within SCSI targets. In the context of
SAS, the latter process can stay in the kernel and can be
done for each SSP target found, preferably after the SAS
topology has been fully discovered.

 The patch adds a hook into sas transport class. sas_host_setup calls
 bsg_register_queue. Then, the request_fn calls smp_execute_task to
 send a smp request and get the response. It doesn't look good to link
 the sas transport class with libsas. In addition, the mpt driver
 handles smp request/response in a very different way.

 Any suggestion to bind SMP pass through via bsg to aic94xx and mpt
 cleanly?

 bind in the transport class, not the driver ...
 
 Agree - the trick for libsas is to get an interface into the driver that
 both drivers can support.

For LSI MPT Fusion it should be almost trivial to
map the host,phy_id,sas_address (Tomo's hacky approach)
through to LSI's ioc_num and SMP pass-through structures.

The aic94xx must have a similar structure. How else could
it implement a SMP

Re: aic94xx driver woes

2007-04-02 Thread Douglas Gilbert

James Bottomley wrote:
 On Sun, 2007-04-01 at 16:29 -0400, Douglas Gilbert wrote:
 ...
 sas: phy3 added to port0, phy_mask:0x8
 sas: DOING DISCOVERY on port 0, pid:2110
 aic94xx: scb:0x80 timed out
 
 This might be the problem.
 
 I see this periodically when a phy goes out to lunch on my system ...
 with me, it always seems to be phy0 of a port containing phy0-4 ... so
 phy1-3 still function to get messages.
 
 Can you try sending a link reset to phy3?
 
 It should be something like
 
 echo 1 > /sys/class/sas_phy/phy-X:3/link_reset
 
 and see if it just produces 
 
 aic94xx: scb:0x80 timed out

Yes it does.

 Again?

It is repeatable.

Also when I connect to phy 0 it works (both direct
connect and expander). However phys 1 and 2 react
like phy 3 shown above.

Doug Gilbert


Re: aic94xx driver woes

2007-04-01 Thread Douglas Gilbert

James Bottomley wrote:
 On Sat, 2007-03-31 at 15:05 -0400, Douglas Gilbert wrote:
 James, note the SAS address of the first expander.
 
 Thanks, just checking ... what happens when you directly attach a disk?

Then I get what I term "udev hell". That is when
FC6 gets to the point during boot-up of saying
"Starting udev:" and hangs for about 5 minutes and
then continues.

I don't think my log records what happens in that
elongated pause. Later attempts to talk to the
single SAS disk (one port only connected) during
boot-up are shown below starting from the first sign
of trouble. The SAS address of the disk port is
0x5000c50001b02139 .

 Or even try the other expander?

Same as yesterday's report:
  sas: RG to ex 500605b00af0 failed:0xff06


If I fiddle with the cabling long enough (i.e. shorten
it) then it will work some of the time. But how come the
card POST, Luben's driver and Adaptec's for Windows have
no problem with exactly the same wiring all of the
time? I suspect that either the HBA's phys are not
being set up properly or, the first blemish (e.g. loss
of dword synchronization) on the link, knocks the
production driver off its perch, while the other
drivers recover and continue.

Doug Gilbert


...
sas: phy3 added to port0, phy_mask:0x8
sas: DOING DISCOVERY on port 0, pid:2110
aic94xx: scb:0x80 timed out
last message repeated 6 times
sas: command 0xf57d5edc, task 0xf527bea8, timed out: EH_NOT_HANDLED
sas: Enter sas_scsi_recover_host
sas: trying to find task 0xf527bea8
sas: sas_scsi_find_task: aborting task 0xf527bea8
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task 0xf527bea8 aborted, res: 0x5
sas: sas_scsi_find_task: querying task 0xf527bea8
aic94xx: tmf timed out
sas: sas_scsi_find_task: task 0xf527bea8 failed to abort
sas: task 0xf527bea8 is not at LU: I_T recover
sas: I_T nexus reset for dev 5000c50001b02139
sas: clearing nexus for port:0
aic94xx: asd_clear_nexus_port: PRE
aic94xx: asd_clear_nexus_port: POST
aic94xx: asd_clear_nexus_port: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: clear nexus ha
aic94xx: asd_clear_nexus_ha: PRE
aic94xx: asd_clear_nexus_ha: POST
aic94xx: asd_clear_nexus_ha: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: error from  device 5000c50001b02139, LUN 0 couldn't be recovered in any way
sas: --- Exit sas_eh_handle_sas_errors -- clear_q
sas: --- Exit sas_scsi_recover_host
sas: command 0xf57d5edc, task 0xf527bea8, timed out: EH_NOT_HANDLED
sas: Enter sas_scsi_recover_host
sas: trying to find task 0xf527bea8
sas: sas_scsi_find_task: aborting task 0xf527bea8
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task 0xf527bea8 aborted, res: 0x5
sas: sas_scsi_find_task: querying task 0xf527bea8
aic94xx: tmf timed out
sas: sas_scsi_find_task: task 0xf527bea8 failed to abort
sas: task 0xf527bea8 is not at LU: I_T recover
sas: I_T nexus reset for dev 5000c50001b02139
sas: clearing nexus for port:0
aic94xx: asd_clear_nexus_port: PRE
aic94xx: asd_clear_nexus_port: POST
aic94xx: asd_clear_nexus_port: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: clear nexus ha
aic94xx: asd_clear_nexus_ha: PRE
aic94xx: asd_clear_nexus_ha: POST
aic94xx: asd_clear_nexus_ha: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: error from  device 5000c50001b02139, LUN 0 couldn't be recovered in any way
sas: --- Exit sas_eh_handle_sas_errors -- clear_q
sas: --- Exit sas_scsi_recover_host
sas: command 0xf57d5edc, task 0xf527bea8, timed out: EH_NOT_HANDLED
sas: Enter sas_scsi_recover_host
sas: trying to find task 0xf527bea8
sas: sas_scsi_find_task: aborting task 0xf527bea8
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done

Re: aic94xx driver woes

2007-03-31 Thread Douglas Gilbert

Darrick J. Wong wrote:
 Douglas Gilbert wrote:
 
 So that is almost 12 months that I have been reporting
 this driver as broken. Is it just me or my hardware?
 
 I seem to recall you saying that the LSI Fusion card was plugged into
 the same expander as the 48300?  If so, does unplugging the Fusion card
 from the expander make it work?

Darrick,
There is a LSI Fusion card in the adjacent PCI-X
slot but it wasn't connected to anything so it
should not have been interfering.

I have another Fusion card in a second machine
which was off. I'll turn the second machine on
now to show the topology of my SAS domain.

Topology (seen from the second machine's MPT Fusion
phy which is both an initiator and a target):

# smp_discover -mb
Device 500605b033ef, expander (only connected phys shown):
  phy   3:S:attached:[500605b6f260:00  i(SSP+STP+SMP) t(SSP)]  3 Gbps
  phy   5:T:attached:[500605b00af0:02 exp t(SMP)]  3 Gbps
  phy   6:T:attached:[5d10002dc000:00  i(SSP+STP+SMP)]  3 Gbps
  phy   9:T:attached:[5000c55208ee:01  t(SSP)]  3 Gbps
  phy  11:T:attached:[5000c50001b0213a:01  t(SSP)]  3 Gbps

# smp_discover -mb -s 0x500605b00af0
Device 500605b00af0, expander (only connected phys shown):
  phy   2:S:attached:[500605b033ef:05 exp t(SMP)]  3 Gbps
  phy  10:T:attached:[5000c50001b02139:00  t(SSP)]  3 Gbps
  phy  11:T:attached:[5000c55208ed:00  t(SSP)]  3 Gbps

James, note the SAS address of the first expander.

So with the second machine off, the expander entry
on 0x500605b033ef phy_id 3 is not there. [The
mainline aic94xx driver fails the same way with the
second machine off or on.]

 aic94xx: Found sequencer Firmware version 1.1 (V17/10c6)
 
 Have you tried the V30 sequencer?

No. But I note that Luben's driver is still using
V17/10c6 successfully (in lk 2.6.21-rc4).

How would I know that the official driver needs firmware,
where to get it and what was the recommended version
with a Kconfig entry like this:

config SCSI_AIC94XX
        tristate "Adaptec AIC94xx SAS/SATA support"
        depends on PCI
        select SCSI_SAS_LIBSAS
        select FW_LOADER
        help
          This driver supports Adaptec's SAS/SATA 3Gb/s 64 bit PCI-X
          AIC94xx chip based host adapters.

config AIC94XX_DEBUG
        bool "Compile in debug mode"
        default y
        depends on SCSI_AIC94XX
        help
          Compiles the aic94xx driver in debug mode.  In debug mode,
          the driver prints some messages to the console.

??

Is there some useful documentation somewhere else?
If so perhaps a link to it could be placed in the
Kconfig entry.


Doug Gilbert



Re: [PATCH 1/1] scsi: Add EH Start Unit retry

2007-03-29 Thread Douglas Gilbert

Brian King wrote:
 Currently, the scsi error handler will issue a START_UNIT
 command if the drive indicates it needs its motor started
 and the allow_restart flag is set in the scsi_device. If,
 after the scsi error handler invokes a host adapter reset
 due to error recovery, a device is in a unit attention
 state AND also needs a START_UNIT, that device will be placed
 offline. The disk array devices on an ipr RAID adapter
 will do exactly this when in a dual initiator configuration.
 This patch adds a single retry to the EH initiated
 START_UNIT.

I have no objection to this patch. Just seems a pity
that SCSI devices go to the trouble of sending unit
attentions while OSes just throw them away.

Perhaps the scsi_device sysfs directory could have entries
like:
  last_ua_asc
  last_ua_ascq
  last_ua_timestamp
where code could place the asc/ascq codes and a timestamp
then continue doing a retry.
Could we get a log entry, hotplug event?

Logical units may queue unit attentions (sam4r10.pdf
section 5.8.7) so it is possible that one retry may
not be enough. With my suggestion above, only the last
one would persist for a reasonable time.
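
As a sketch of what the recording code might look like (user space C for
illustration, not kernel code; the struct and function names are invented
and only fixed format sense data is handled):

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical record mirroring the suggested sysfs entries. */
struct last_ua {
    unsigned char asc;
    unsigned char ascq;
    time_t timestamp;
};

/* Returns 1 and updates *ua if sense[] is fixed format sense data with
 * sense key UNIT ATTENTION (0x6); otherwise returns 0 and leaves *ua. */
static int note_unit_attention(const unsigned char *sense, size_t len,
                               struct last_ua *ua, time_t now)
{
    assert(sense != NULL && ua != NULL);
    if (len < 14)
        return 0;                   /* too short to hold asc/ascq */
    if ((sense[0] & 0x7f) != 0x70 && (sense[0] & 0x7f) != 0x71)
        return 0;                   /* not fixed format */
    if ((sense[2] & 0x0f) != 0x06)
        return 0;                   /* sense key is not UNIT ATTENTION */
    ua->asc = sense[12];
    ua->ascq = sense[13];
    ua->timestamp = now;
    return 1;
}
```

The caller would invoke this before each retry, so that only the most
recent unit attention is kept.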

Doug Gilbert

Re: Disabling block layer

2007-03-26 Thread Douglas Gilbert

Mark Lobo wrote:
 Hello!
 
 I had a question about disabling the block layer for SCSI devices. We
 have an embedded device, and it runs 2.4.30. We need to be able to
 support a lot of SCSI devices (in the thousands) for our device, and we
 talk to the devices via SG. We are facing a memory allocation problem
 after discovering a few thousand devices. For every device,  there
 seems to be a lot of memory allocated in the block layer. This memory
 includes cache memory (which IIRC is reclaimable by the kernel memory
 subsystem when it needs it) and also pages that are used for the
 alloc_pages pool.
 
 
 
 My questions were relating to disabling the block layer for the
 devices. We always talk direct passthrough to the storage(except the
 local hard disk),  and do not need the block layer at all. 
 
 1. Is there a way to disable the block layer for specific devices?
 
 2. If yes, how can that be done, and  are there any gotchas associated with 
 that?

Mark,
Tempting thought that: linux without a block layer.
I think you have no hope in the lk 2.4 series and
even less in the lk 2.6 series.

Now for some thoughts. If you don't need to mount any
SCSI disks, you could build a kernel with sd as a
module and remove/hide sd_mod.o . A more invasive method
would be to modify the sd driver so that it was no
longer interested in SCSI devices whose peripheral
device type was zero (i.e. disks).

On the sg driver side, if lots of sg file descriptors
are open to those thousands of SCSI devices, then
reducing the per fd SG_DEF_RESERVED_SIZE from 32 KB
may help. This could be reduced by editing
include/scsi/sg.h .
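
If rebuilding the kernel is not convenient, the application could instead
try shrinking the reserve buffer on each file descriptor after opening it.
A sketch under the assumption that SG_SET_RESERVED_SIZE behaves this way
on your kernel (I have not measured whether it frees enough memory in the
2.4.30 case); the helper name is made up:

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Ask the sg driver to resize this fd's reserve buffer. Returns the size
 * the driver reports afterwards, or -1 if either ioctl fails. */
static int shrink_sg_reserve(int sg_fd, int wanted_bytes)
{
    int actual = 0;

    assert(wanted_bytes >= 0);
    if (ioctl(sg_fd, SG_SET_RESERVED_SIZE, &wanted_bytes) < 0)
        return -1;
    if (ioctl(sg_fd, SG_GET_RESERVED_SIZE, &actual) < 0)
        return -1;
    return actual;
}
```

The driver may round the size, so the returned value can differ from what
was asked for.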

Doug Gilbert



RFC: sg driver addition: SG_FLAG_SHARED_MMAP_IO

2007-03-21 Thread Douglas Gilbert

I mentioned this idea a few weeks ago on this list: namely
to allow a sg pass-through request to use the mmap-ed
reserve buffer associated with another sg file descriptor.

In my experience mmap-ed IO using sg's reserve buffer mapped
into the user space is faster than direct IO schemes. However
one shortcoming is that if you try to copy between two devices
using this technique then you end up with two separate mmap-ed
buffers in the user space program. Then the user space program
needs to copy between the two buffers which would defeat much
of the advantage of the mmap-ed IO. You could (and sgm_dd in
sg3_utils does) use mmap-ed IO on the read side and direct IO
on the write side (or vice versa).

I used the sg driver as found in lk 2.6.21-rc4 as a baseline
(and I don't think sg has changed since 2.6.19). A gzipped
diff is attached. There is also some test code (a modified
sgm_dd) in the sg3_utils-1.24 beta on the www.torque.net/sg site.

Here is an example of a disk to disk copy:
  sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=smmap bs=512

The new flag is 'oflag=smmap' which instructs the write SG_IO
on /dev/sg1 to set SG_FLAG_SHARED_MMAP_IO and it passes
the mmap-ed buffer used for /dev/sg0 in dxferp. [Add a
'verbose=1' option and it will indicate how many times shared
mmap IO was requested and how many times it was actually done.]


Features:
  - allow both sides of a copy-like operation to DMA into
    and out of the same user space buffer
  - minimal per command overhead (i.e. from building
    scatter gather lists and pinning pages)
  - could copy a single source to multiple destinations
    efficiently
  - if the shared reserve buffer is unavailable (or not big
    enough) then fall back to indirect IO transparently
  - new info bit SG_INFO_SHARED_MMAP_IO indicates whether
    shared mmap-ed IO was done

Restrictions (enforced by the sg driver):
  - confined to file descriptors in the same process
  - there can be only one user of a reserve buffer
    at a time
  - low_dma is honoured

Complexity:
  - it does have a few more corner cases than usual. For
    example in the above sgm_dd invocation: closing /dev/sg0
    while /dev/sg1 is sharing its mmap-ed reserve buffer ...


Here are some timings copying between two ramdisks. It is
assumed the 'bs=8k' given to dd is equivalent to 'bs=512
bpt=16' given to sgm_dd.

# lsscsi -g
[4:0:0:0]    disk    Linux    scsi_debug       1.82  /dev/sda  /dev/sg0
[5:0:0:0]    disk    Linux    scsi_ses         1.06  /dev/sdb  /dev/sg1

# ./dd_tsts.sh
Usage: dd_tsts.sh ifile ofile times bs

# ./dd_tsts.sh /dev/sda /dev/sdb 50 8k
Indirect IO with dd
dd if=/dev/sda of=/dev/sdb bs=8k
real    0m7.448s
user    0m0.080s
sys     0m7.046s

Direct IO with dd
dd if=/dev/sda iflag=direct of=/dev/sdb oflag=direct bs=8k
real    0m4.529s
user    0m0.114s
sys     0m3.799s


# ./sg_dd_tsts.sh /dev/sg0 /dev/sg1 50 16
Indirect IO with sg_dd
sg_dd if=/dev/sg0 of=/dev/sg1 bs=512 bpt=16
real    0m6.304s
user    0m0.171s
sys     0m5.268s

Direct IO with sg_dd
sg_dd if=/dev/sg0 iflag=dio of=/dev/sg1 oflag=dio bs=512 bpt=16
real    0m4.246s
user    0m0.135s
sys     0m3.395s

Mmap read, indirect IO write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 bs=512 bpt=16
real    0m4.023s
user    0m0.127s
sys     0m3.259s

Mmap read, direct IO write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=dio bs=512 bpt=16
real    0m4.057s
user    0m0.164s
sys     0m3.264s

Mmap read, shared mmap write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=smmap bs=512 bpt=16
real    0m3.871s
user    0m0.131s
sys     0m3.111s


Don't expect drastic improvements in real IO unless it is
in the gigabyte per second range.


Doug Gilbert


sg2621rc4smm2.diff.gz
Description: GNU Zip compressed data

Re: [PATCH 2/3] sd: implement START/STOP management

2007-03-20 Thread Douglas Gilbert

Tejun Heo wrote:
 Implement SBC START/STOP management.  sdev->manage_start_stop is added.
 When it's set to one, sd STOPs the device on suspend and shutdown and
 STARTs it on resume.  sdev->manage_start_stop lives in sdev
 instead of scsi_disk cdev to allow ->slave_config() to override the
 default configuration, but it is exported under the scsi_disk sysfs
 node as sdev->allow_restart is.
 
 When manage_start_stop is zero (the default value), this patch doesn't
 introduce any behavior change.
 
 Signed-off-by: Tejun Heo [EMAIL PROTECTED]
 ---
  drivers/scsi/scsi_sysfs.c  |   31 +++--
  drivers/scsi/sd.c  |  102 
 +
  include/scsi/scsi_device.h |1 
  3 files changed, 130 insertions(+), 4 deletions(-)
 
 Index: work/drivers/scsi/sd.c
 ===
 --- work.orig/drivers/scsi/sd.c
 +++ work/drivers/scsi/sd.c
 @@ -142,6 +142,8 @@ static void sd_rw_intr(struct scsi_cmnd 
  static int sd_probe(struct device *);
  static int sd_remove(struct device *);
  static void sd_shutdown(struct device *dev);
 +static int sd_suspend(struct device *dev, pm_message_t state);
 +static int sd_resume(struct device *dev);
  static void sd_rescan(struct device *);
  static int sd_init_command(struct scsi_cmnd *);
  static int sd_issue_flush(struct device *, sector_t *);
 @@ -206,6 +208,20 @@ static ssize_t sd_store_cache_type(struc
   return count;
  }
  
 +static ssize_t sd_store_manage_start_stop(struct class_device *cdev,
 +   const char *buf, size_t count)
 +{
 + struct scsi_disk *sdkp = to_scsi_disk(cdev);
 + struct scsi_device *sdp = sdkp->device;
 +
 + if (!capable(CAP_SYS_ADMIN))
 + return -EACCES;
 +
 + sdp->manage_start_stop = simple_strtoul(buf, NULL, 10);
 +
 + return count;
 +}
 +
  static ssize_t sd_store_allow_restart(struct class_device *cdev, const char *buf,
 size_t count)
  {
 @@ -238,6 +254,14 @@ static ssize_t sd_show_fua(struct class_
   return snprintf(buf, 20, "%u\n", sdkp->DPOFUA);
  }
  
 +static ssize_t sd_show_manage_start_stop(struct class_device *cdev, char *buf)
 +{
 + struct scsi_disk *sdkp = to_scsi_disk(cdev);
 + struct scsi_device *sdp = sdkp->device;
 +
 + return snprintf(buf, 20, "%u\n", sdp->manage_start_stop);
 +}
 +
  static ssize_t sd_show_allow_restart(struct class_device *cdev, char *buf)
  {
   struct scsi_disk *sdkp = to_scsi_disk(cdev);
 @@ -251,6 +275,8 @@ static struct class_device_attribute sd_
   __ATTR(FUA, S_IRUGO, sd_show_fua, NULL),
   __ATTR(allow_restart, S_IRUGO|S_IWUSR, sd_show_allow_restart,
  sd_store_allow_restart),
 + __ATTR(manage_start_stop, S_IRUGO|S_IWUSR, sd_show_manage_start_stop,
 +sd_store_manage_start_stop),
   __ATTR_NULL,
  };
  
 @@ -267,6 +293,8 @@ static struct scsi_driver sd_template = 
   .name   = "sd",
   .probe  = sd_probe,
   .remove = sd_remove,
 + .suspend= sd_suspend,
 + .resume = sd_resume,
   .shutdown   = sd_shutdown,
   },
   .rescan = sd_rescan,
 @@ -1776,6 +1804,32 @@ static void scsi_disk_release(struct cla
   kfree(sdkp);
  }
  
 +static int sd_start_stop_device(struct scsi_device *sdp, int start)
 +{
 + unsigned char cmd[6] = { START_STOP };  /* START_VALID */
 + struct scsi_sense_hdr sshdr;
 + int res;
 +
 + if (start)
 + cmd[4] |= 1;/* START */
 +
 + if (!scsi_device_online(sdp))
 + return -ENODEV;
 +
 + res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, &sshdr,
 +SD_TIMEOUT, SD_MAX_RETRIES);

Tejun,
I note at this point that the IMMED bit in the
START STOP UNIT cdb is clear. [The code might
note that as well.] All SCSI disks that I have
seen implement the IMMED bit and, according to
the SAT standard, so should SAT layers like the
one in libata.

With the IMMED bit clear:
  - on spin up, it will wait until the disk is ready.
    Okay unless there are a lot of disks, in
    which case we could ask Matthew Wilcox for help
  - on spin down, it will wait until the media is
    stopped. That could be 20 seconds, and if there
    were multiple disks ...

I guess the question is: do we need to wait until a
disk is spun down before dropping power to it
and suspending?
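
To make the two bits concrete, here is a minimal sketch of building
the 6 byte START STOP UNIT cdb with IMMED and START selectable. The
function name and the macros are mine, not from the patch; the bit
positions (IMMED is bit 0 of byte 1, START is bit 0 of byte 4) are
as defined in SBC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SSU_OPCODE  0x1b   /* START STOP UNIT */
#define SSU_CDB_LEN 6

/*
 * Hypothetical helper: fill in a START STOP UNIT cdb.
 * start != 0 sets the START bit (spin up), 0 leaves it clear (spin down).
 * immed != 0 sets the IMMED bit, so the device returns status before
 * the spin up/down has completed.
 */
static void build_start_stop_cdb(uint8_t cdb[SSU_CDB_LEN], int start, int immed)
{
    assert(cdb != NULL);
    memset(cdb, 0, SSU_CDB_LEN);
    cdb[0] = SSU_OPCODE;
    if (immed)
        cdb[1] |= 0x01;    /* IMMED: byte 1, bit 0 */
    if (start)
        cdb[4] |= 0x01;    /* START: byte 4, bit 0 */
}
```

With immed set, a caller that still needs to know the media has
stopped would have to find that out some other way, which is the
trade-off raised in the question above.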

Doug Gilbert

Re: [PATCH] bsg: iovec support

2007-03-19 Thread Douglas Gilbert

FUJITA Tomonori wrote:
 From: Pete Wyckoff [EMAIL PROTECTED]
 Subject: [PATCH] bsg: iovec support
 Date: Thu, 1 Mar 2007 17:29:08 -0500

 Support vectored IO as in SGv3.  The iovec structure uses explicit
 sizes to avoid the need for compat conversion.

 Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
 ---

 My application definitely can take advantage of scatter/gather IO,
 which is supported in sgv3 but not in the bsg implementation of sgv4.
 I understand Tomo's concerns about code bloat and the need for
 32/64 compat translations, but this will make things much easier on
 users of bsg who read or write out of multiple buffers in a single
 SCSI operation.

 (snip)

 + * Vector of address/length pairs, used when dout_iovec_count (or din_)
 + * is non-zero.  In that case, dout_xferp is a list of struct sg_io_v4_vec
 + * and dout_iovec_count is the number of entries in that list.  dout_xfer_len
 + * is the total length of the list.  Note the use of u64 instead of a
 + * native pointer to avoid compat issues, and padding to avoid structure
 + * alignment problems.
 + */
 +struct sg_io_v4_vec {
 +__u64 iov_base;
 +__u32 iov_len;
 +__u32 __pad1;
 +};

 I don't think that it's a good idea to add a new scatter/gather
 structure and export it to user space.

User space scatter gather is not a new feature.
It is defined and works in sg v3.

It was also partially defined in sg v4 and dropped
out in the bsg implementation. I agree with Pete that
it should be put back.

Pete is also suggesting (shown above) a revised sg_io_vec
structure that uses a uint64_t for the pointer to simplify
32, 64 bit thunking.
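
A small sketch of why the fixed-width layout avoids thunking: with a
64 bit pointer carrier and explicit padding the element is 16 bytes
with the same member offsets for 32 bit and 64 bit user space. The
struct below copies the quoted layout under a different name (and
renames the pad member); vec_set() and vec_total_len() are
hypothetical helpers, and reading dout_xfer_len as the sum of the
iov_len values is my assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the quoted sg_io_v4_vec; names here are illustrative. */
struct sg_v4_vec_sketch {
    uint64_t iov_base;   /* user space address carried in 64 bits */
    uint32_t iov_len;    /* length of this element in bytes */
    uint32_t pad1;       /* explicit padding, keeps sizeof at 16 */
};

/* Hypothetical helper: describe one user buffer in a vector element. */
static void vec_set(struct sg_v4_vec_sketch *v, const void *buf, uint32_t len)
{
    assert(v != NULL);
    v->iov_base = (uint64_t)(uintptr_t)buf;
    v->iov_len = len;
    v->pad1 = 0;
}

/* Hypothetical helper: total number of bytes described by the list. */
static uint64_t vec_total_len(const struct sg_v4_vec_sketch *v, unsigned int count)
{
    uint64_t total = 0;
    unsigned int k;

    for (k = 0; k < count; ++k)
        total += v[k].iov_len;
    return total;
}
```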

 bsg can support scatter/gather IO with ioctl (SG_IO) easily (I mean,
 without adding ugly compat code to bsg.c). I guess that SG_IO doesn't
 work for you because it works synchronously. However, all system calls
 might work asynchronously in the future.

User space scatter gather is completely decoupled from
in-kernel scatter gather lists built for HBA DMA
engines. Same technique but at different levels.

Someone might use user space scatter gather to efficiently
fetch several OSD objects that are implemented in a block
device as adjacent blocks.

Doug Gilbert


Re: [PATCH] bsg: iovec support

2007-03-19 Thread Douglas Gilbert

FUJITA Tomonori wrote:
 From: Douglas Gilbert [EMAIL PROTECTED]
 Subject: Re: [PATCH] bsg: iovec support
 Date: Mon, 19 Mar 2007 08:56:39 -0400

 FUJITA Tomonori wrote:
 From: Pete Wyckoff [EMAIL PROTECTED]
 Subject: [PATCH] bsg: iovec support
 Date: Thu, 1 Mar 2007 17:29:08 -0500

 Support vectored IO as in SGv3.  The iovec structure uses explicit
 sizes to avoid the need for compat conversion.

 Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
 ---

 My application definitely can take advantage of scatter/gather IO,
 which is supported in sgv3 but not in the bsg implementation of sgv4.
 I understand Tomo's concerns about code bloat and the need for
 32/64 compat translations, but this will make things much easier on
 users of bsg who read or write out of multiple buffers in a single
 SCSI operation.
 (snip)

 + * Vector of address/length pairs, used when dout_iovec_count (or din_)
 + * is non-zero.  In that case, dout_xferp is a list of struct sg_io_v4_vec
 + * and dout_iovec_count is the number of entries in that list.  dout_xfer_len
 + * is the total length of the list.  Note the use of u64 instead of a
 + * native pointer to avoid compat issues, and padding to avoid structure
 + * alignment problems.
 + */
 +struct sg_io_v4_vec {
 +  __u64 iov_base;
 +  __u32 iov_len;
 +  __u32 __pad1;
 +};
 I don't think that it's a good idea to add a new scatter/gather
 structure and export it to user space.
 User space scatter gather is not a new feature.
 It is defined and works in sg v3.

 It was also partially defined in sg v4 and dropped
 out in the bsg implementation. I agree with Pete that
 it should be put back.

 I'm fine with supporting iovec (though I don't like it).

Tomo,
You don't need to support it if you don't want to.
So if din_iovec_count or dout_iovec_count are other
than zero, bsg can return an error.

By dropping those fields, other implementations are
precluded from supporting that feature.

 Pete is also suggesting (shown above) a revised sg_io_vec
 structure that uses a uint64_t for the pointer to simplify
 32, 64 bit thunking.

 All I said is that it would be better to use the existing compat
 infrastructure (sg_build_iovec, sg_ioctl_trans, etc in
 fs/compat_ioctl.c) instead of adding another compat code.

Won't sg v4 make this even a bigger mess, at least
initially anyway?

Doug Gilbert


Re: [PATCH 2/2] fusion - honour return value of pci_enable_device() in mpt_resume()

2007-03-16 Thread Douglas Gilbert

Randy Dunlap wrote:
 On Fri, 16 Mar 2007 11:14:51 -0500 James Bottomley wrote:
 
 On Fri, 2007-03-16 at 08:06 -0700, Randy Dunlap wrote:
 On Fri, 16 Mar 2007 09:27:26 -0500 James Bottomley wrote:

 On Fri, 2007-03-16 at 16:05 +0900, Horms wrote:
 +   err = pci_enable_device(pdev);
 +   if (err < 0)
 +   return err;
 Traditionally, this should be 

 if (err)
return err;

 The reason is that < 0 is a signed comparison which can be slightly more
 expensive on some architectures and it's unnecessary if zero is the only
 successful return.
 Tradition vs. Linus, eh?  Linus wrote (2007-Mar-06, on lkml,
 Message-ID: [EMAIL PROTECTED]):
 Sure ... we can all maintain our own traditions .. what was the subject
 of this email?
 
 The subject was coding style and return/error codes.
 The Subject: line was: Re: [5/6] 2.6.21-rc2: known regressions

Randy,
While on the subject of traditions, how about the
C90 and C99 ones?

C identifiers starting with __ are reserved!
Reference: ISO/IEC 9899:1999 (C99) section 7.1.3: all
identifiers that start with an underscore and either
an upper case letter or another underscore are always
reserved for any use. It was the same in C90.

Now we might start getting rid of __u32 and its
friends first :-)

Doug Gilbert


SCSI Generic version 4 interface, release 1.2

2007-03-14 Thread Douglas Gilbert

After reviewing this post by Pete Wyckoff:
http://marc.theaimsgroup.com/?l=linux-scsi&m=117278879816029&w=2

I decided to update my sg v4 interface document originally
posted 20061106 which I will now call release 1.1 :
http://lwn.net/Articles/208082/

Pete was proposing to put back din_iovec_count and
dout_iovec_count that had been dropped out of bsg but
had been in release 1.1 . Hmm.

Some other items have been picked up from the bsg
implementation plus the suggestion from LSF'07 to
add dout_resid.

See the attachment, comments welcome.

Doug Gilbert




  SCSI Generic version 4 interface structure
  ==========================================
  Release 1.2

Goals:
  - handle both generalized request/response and data_out/data_in
    independently in same invocation (i.e. synchronous usage).
  - alternatively the request and data_out could be instigated in one
    invocation with pointers given for the incoming response and data_in.
    Then a second invocation (as a result of polling or asynchronous
    notification) reports the response and/or data_in is done, plus
    provides error/resid/timing information. This is asynchronous usage.
    This allows for the most complicated SCSI commands: tagged, variable
    length cdbs with bidirectional data transfers.
  - support multiple protocols. If they are generalized request-response
    protocols then they can choose either the request/response part of the
    interface or the data_out/data_in part.
  - layered error/condition reporting: (OS) driver, transport and device
    (logical unit). Method used to present this struct to OS (e.g. ioctl())
    may also report error (e.g. EPERM).
  - allow for auxiliary information to be passed back for the application
    client to consider
  - same structure can be used for a synchronous (e.g. interruptible ioctl)
    or asynchronous (e.g. ioctl()/read() ) pass through.
  - leave device (lu) or target addressing issues to some other mechanism
    (what SCSI standards call the I_T_L or the I_T nexus respectively) as
    they are transport dependent. However do include the tag level (the
    _Q part of a I_T_L_Q nexus).
  - stay close enough to struct sg_io_hdr (sg version 3 interface) to use
    with existing SG_IO ioctls, current implementations expect 'S' in
    'guard'


Comments:
  - unsigned 64 bit integers used as pointer carriers to ease 32/64
    bit code interworking (e.g. 32 bit app on 64 bit kernel)
  - should there be more (or less) spare fields?
  - the write() usage in the sg driver's asynchronous interface has
    caused problems when mistakenly applied to a block device node
    rather than a sg device node. Using an ioctl(flag_async) followed
    by a read() for asynchronous work offers similar functionality and
    is safer. Using ioctl(flag_async_start) and ioctl(flag_async_finish)
    is another possibility.
  - rather than have a separate ATA pass through mechanism, the SAT
    defined ATA PASS THROUGH SCSI commands could be used with the
    driver implementation routing the ATA commands to their
    subsystem. This could be flagged so it didn't preclude a SAT layer
    in a SCSI transport (e.g. MPT SAS HBA firmware).
  - if SAM/SPC does not define an enumeration for lesser used input
    fields, then use the value 0 for inert/off/don't_care .
  - the SCSI command tag field as currently defined in SAM-4 can be
    up to 64 bits (with a proposal to increase that to 96 bits for FCP)
    Should we let the transport layer/LLD worry about that?


ChangeLog for release 1.2 [20070314]
  - add dout_resid
  - re-arrange uint64_t types (i.e. pointer carriers) to be on a
    8 byte boundary
  - reinstate dout_iovec_count and din_iovec_count (they were in
    release 1.1 but bsg dropped them)
  - change name: response_len_wr to response_len
  - pick up some descriptions from bsg

ChangeLog for release 1.1 [20061106]
  - was called sg version 4 interface, version 1.1
    so change the second version to release


---
#include <stdint.h>


struct sg_io_v4
{
    int32_t guard;              /* [i] 'Q' to differentiate from v3 */
    uint32_t protocol;          /* [i] 0 -> SCSI ,  */
    uint32_t subprotocol;       /* [i] 0 -> SCSI command, 1 -> SCSI task
                                   management function,  */

    uint32_t request_len;       /* [i] in bytes {SCSI: cdb length} */
    uint64_t request;           /* [i], [*i] {SCSI: cdb} */
    uint32_t request_attr;      /* [i] {SCSI: task attribute} */
    uint32_t request_tag;       /* [i] {SCSI: task tag (only if flagged)} */
    uint32_t request_priority;  /* [i] {SCSI: task priority} */
    uint32_t max_response_len;  /* [i] in bytes */
    uint64_t response;          /* [i], [*o]  {SCSI: (auto)sense data} */

/* dout_: data out (to device); din_: data in (from device) */

Re: How to send inquiry command to thorugh sd path (i.e. /dev/sda) by using SG_IO ioctl

2007-03-12 Thread Douglas Gilbert

MasthanUsha wrote:
 
 Hi All,
  
 Does any one of you have any idea about the scsi inquiry command?
  
 I want to send an Inquiry command to a scsi device through sd path (i.e.
 /dev/sda or /dev/sdb) by using SG_IO ioctl. Please explain me...

If you look at http://www.torque.net/sg/sg3_utils.html
and fetch a tarball (e.g. sg3_utils-1.23.tgz) then have
a look at the examples/sg_simple1.c file.

Doug Gilbert


Re: How to send inquiry command to thorugh sd path (i.e. /dev/sda) by using SG_IO ioctl

2007-03-12 Thread Douglas Gilbert

dudekula mastan wrote:
 Hi Gilbert,

   Thanks for quick reply.

   The example program (sg_Simple --- not only this all examples) is taking 
 /dev/sg  path as input  but I want /dev/sd path as input.

In the lk 2.6 series, it will also work for sd devices
(and hd devices if they happen to be cd/dvd drives).

   Please explain me with an example, which takes /dev/sd path as input.

You have one already. Actually you have lots of examples there.

 Does SG_IO support the sd driver?

yes, in the lk 2.6 series.

 I am not sure.. I think it will work only for sg driver. Am I correct ?.

That was correct several years ago, not now.
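
For the archive, a minimal sketch of a standard INQUIRY sent with the
sg version 3 SG_IO ioctl. In the lk 2.6 series the same code should
work whether dev_path is /dev/sdX or /dev/sgN. The function name
sg_io_inquiry() is made up for illustration, and the 20 second timeout
and 32 byte sense buffer are arbitrary choices; sg_simple1.c in
sg3_utils does more careful error decoding.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

#define INQ_OPCODE  0x12
#define INQ_CDB_LEN 6

/*
 * Hypothetical helper: send a standard INQUIRY to dev_path and place up
 * to resp_len bytes of the response in resp.
 * Returns 0 if the command completed cleanly, otherwise -1.
 */
static int sg_io_inquiry(const char *dev_path, unsigned char *resp,
                         unsigned char resp_len)
{
    unsigned char cdb[INQ_CDB_LEN] = { INQ_OPCODE, 0, 0, 0, 0, 0 };
    unsigned char sense[32];
    struct sg_io_hdr hdr;
    int fd, res;

    assert(dev_path != NULL && resp != NULL);
    cdb[4] = resp_len;                  /* allocation length */

    fd = open(dev_path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';             /* sg version 3 interface */
    hdr.cmdp = cdb;
    hdr.cmd_len = sizeof(cdb);
    hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    hdr.dxferp = resp;
    hdr.dxfer_len = resp_len;
    hdr.sbp = sense;
    hdr.mx_sb_len = sizeof(sense);
    hdr.timeout = 20000;                /* milliseconds */

    res = ioctl(fd, SG_IO, &hdr);
    close(fd);
    if (res < 0)
        return -1;
    /* any SCSI status, host or driver error is flagged in info */
    if ((hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK)
        return -1;
    return 0;
}
```

Opening the device node normally needs root (or suitable
permissions on the node).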

Doug Gilbert

Re: impact of 4k sector size on the IO FS stack

2007-03-12 Thread Douglas Gilbert

Bryan Henderson wrote:
 DOS partitions start partitions on odd-numbered sectors
 
 I don't get this.  If you mean partitions defined by the classic DOS 
 partition table format, then AFAICS, such a partition can start in any 
 sector.

Bryan,
Typically the first partition on a DOS partitioned disk
starts at the next available sector after the mbr
which, for some bizarre reason, is 63 sectors long.
Hence:

# fdisk -lu /dev/hda

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders, total 156301488 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63    18314099     9157018+   c  W95 FAT32 (LBA)
/dev/hda2        18314100    19551104      618502+  82  Linux swap / Solaris
/dev/hda4        19551105   156296384    68372640   83  Linux


 
 so presuming you have odd-aligned disks, life is good.
 
 What is an odd-aligned disk?

s/disk/partition/ ?
Perhaps hda1 and hda4 above are examples.
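
To tie this back to the 4k sector question, here is a small sketch
that checks whether a partition's starting LBA (counted in 512 byte
logical blocks, as fdisk -u reports it) falls on a 4096 byte
boundary. The function name is made up for illustration. For the
table above none of the three partitions start on such a boundary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the byte offset of start_lba is a
 * multiple of physical_bs, otherwise 0. Both block sizes are in bytes.
 */
static int starts_on_boundary(uint64_t start_lba, uint32_t logical_bs,
                              uint32_t physical_bs)
{
    assert(logical_bs != 0 && physical_bs != 0);
    return (start_lba * logical_bs) % physical_bs == 0;
}
```

For example starts_on_boundary(63, 512, 4096) is 0 while
starts_on_boundary(64, 512, 4096) is 1.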

Doug Gilbert


Re: Transport ID in Persistent Reservation

2007-03-08 Thread Douglas Gilbert

renuka apte wrote:
 The 'Specify Initiator Ports' support in persistent reservation allows
 the application client to send a bunch of transport IDs which identify
 initiator ports. I would like to know the suggested format for these
 transport IDs.

http://www.t10.org/ftp/t10/drafts/spc4/spc4r09.pdf
section 7.5.4  TransportID identifiers

 I am assuming that it must have something to do with the WWN of the
 initiator ports.

and that is transport dependent.

 I tried using sg_utils to issue a persistent reservation IN command
 with READ FULL STATUS to a virtual SCSI disk which is returning the
 response in the format specified in SPC-4. However the sg_utils seems
 to be decoding the transport ID in some way that I can't find in the
 standard.

Have another look :-)

Doug Gilbert




Re: bug or typo in scsi_debug.c

2007-03-06 Thread Douglas Gilbert

Mark Harvey wrote:
 Looking thru this driver, I saw what looks to be a bug/typo if an error
 occurs when
 calling driver_register() from scsi_debug_init()
 
 Cheers
 Mark
 
 --- scsi_debug-orig.c   2007-03-03 19:38:23.0 +1100
 +++ scsi_debug.c2007-03-03 19:39:51.0 +1100
 @@ -2841,7 +2841,7 @@
 if (ret < 0) {
 printk(KERN_WARNING "scsi_debug: driver_register error: %d\n",
 ret);
 -   goto bus_unreg;
 +   goto driver_unreg;
 }
 ret = do_create_driverfs_files();
 if (ret < 0) {
 @@ -2873,6 +2873,7 @@
 
  del_files:
 do_remove_driverfs_files();
 +driver_unreg:
 driver_unregister(&sdebug_driverfs_driver);
  bus_unreg:
 bus_unregister(&pseudo_lld_bus);


Mark,
Um, I know my name is on that driver (with Eric's)
but I didn't write the code in that function.

I don't understand why your patch wants to call
driver_unregister() after driver_register() has failed.
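
The usual unwinding convention can be sketched as follows: when a
step fails, jump to the label that undoes the previous successful
step, not the failed one. This is a toy model, not scsi_debug code:
the step names, do_register(), do_unregister() and the fail_step
knob are all invented for illustration.

```c
#include <assert.h>

enum { BUS = 1, DRIVER = 2, FILES = 4 };   /* hypothetical init steps */

static int registered;   /* bitmask of steps currently registered */
static int fail_step;    /* step that is made to fail, 0 for none */

static int do_register(int step)
{
    if (step == fail_step)
        return -1;           /* a failed step registers nothing */
    registered |= step;
    return 0;
}

static void do_unregister(int step)
{
    /* undoing a step that never succeeded would be a bug */
    assert(registered & step);
    registered &= ~step;
}

static int init_sketch(void)
{
    int ret;

    ret = do_register(BUS);
    if (ret < 0)
        return ret;
    ret = do_register(DRIVER);
    if (ret < 0)
        goto bus_unreg;      /* driver never registered: skip its undo */
    ret = do_register(FILES);
    if (ret < 0)
        goto driver_unreg;
    return 0;

driver_unreg:
    do_unregister(DRIVER);
bus_unreg:
    do_unregister(BUS);
    return ret;
}
```

In this model, jumping to driver_unreg after a failed DRIVER step
would trip the assert in do_unregister().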

Doug Gilbert

Re: improve sg_luns output for iscsi

2007-03-06 Thread Douglas Gilbert

Olaf Hering wrote:
 Upcoming IBM pSeries firmware can boot from iscsi. To configure the
 openfirmware boot-device string, we need to construct a correct
 devicepath. This path includes the lun. It's currently not 100% clear
 how exactly this lun value has to look.
 
 sg_luns may be the tool to get the value. But its current output is not
 parseable by scripts. It even gives the same output for two different
 scsi devices:
 
 girgendwas:~ # lsscsi
 [0:0:0:0]    disk    DGC      RAID 5           0219  /dev/sda
 [0:0:0:1]    disk    DGC      RAID 5           0219  /dev/sdb
 [0:0:0:2]    disk    DGC      RAID 5           0219  /dev/sdc
 [0:0:0:3]    disk    DGC      RAID 5           0219  /dev/sdd
 girgendwas:~ # sg_luns -V
 sg_luns: version: 1.05 20060127
 girgendwas:~ # sg_luns /dev/sdd
 Lun list length = 32 which imples 4 lun entries
 Report luns [select_report=0]:
 
 0001
 0002
 0003
 girgendwas:~ # sg_luns /dev/sdc
 Lun list length = 32 which imples 4 lun entries
 Report luns [select_report=0]:
 
 0001
 0002
 0003
 
 Is it possible to print the lun only for the requested scsi device?

Olaf,
sg_luns is an application client driving the SCSI
REPORT LUNS command. It is a tricky SCSI command
since even though it addresses a logical unit, it
is really the target that replies (as it is the target
that knows about the sibling logical units) ***.
The REPORT LUNS response gives no indication which
(if any) 64 bit lun was addressed.
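
For scripts that end up parsing the raw response instead, here is a
minimal sketch of decoding it: a 4 byte big endian LUN list length,
4 reserved bytes, then one 8 byte entry per logical unit. None of
the entries is marked as "the one you asked through", which is the
point made above. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RL_HDR_LEN   8   /* list length (4 bytes) + reserved (4 bytes) */
#define RL_ENTRY_LEN 8   /* each LUN is 64 bits */

/* Read nbytes (at most 8) as a big endian unsigned integer. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    int k;

    for (k = 0; k < nbytes; ++k)
        v = (v << 8) | p[k];
    return v;
}

/* Number of LUN entries the response announces, or -1 if malformed. */
static int report_luns_count(const uint8_t *resp, size_t resp_len)
{
    uint64_t list_len;

    assert(resp != NULL);
    if (resp_len < RL_HDR_LEN)
        return -1;
    list_len = get_be(resp, 4);
    if (list_len % RL_ENTRY_LEN)
        return -1;
    return (int)(list_len / RL_ENTRY_LEN);
}

/* Fetch entry idx into *lun; 0 on success, -1 if it is not in the buffer. */
static int report_luns_entry(const uint8_t *resp, size_t resp_len, int idx,
                             uint64_t *lun)
{
    int n = report_luns_count(resp, resp_len);
    size_t off;

    if (n < 0 || idx < 0 || idx >= n)
        return -1;
    off = RL_HDR_LEN + (size_t)idx * RL_ENTRY_LEN;
    if (off + RL_ENTRY_LEN > resp_len)
        return -1;           /* response was truncated by the caller */
    *lun = get_be(resp + off, RL_ENTRY_LEN);
    return 0;
}
```

A list length of 32 gives 4 entries, matching the sg_luns output
quoted above.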

Now I would not want to break the link between sg_luns
and the SCSI REPORT LUNS command. Adding an extra
parameter to try and find the lun associated with the
file descriptor has a few problems (from my point
of view):
   - it would be OS specific (sg_luns isn't currently)
   - within Linux there are different mechanisms in
 the 2.4 and 2.6 series kernels.

In your example above a combination of lsscsi and sg_luns
gives the answer (0003) but lsscsi is
linux 2.6 series specific. sg_scan would probably work
as a replacement for lsscsi (and sg_scan also works in the
lk 2.4 series (and Windows)).


To address the parsability of sg_luns output, I recently
added a '--quiet' option to suppress the extraneous
output.


In summary sg_luns is probably not what you want!
What about the lu _name_? For iSCSI the lu should
yield a world wide unique SCSI name designator
in the device identification VPD page (see SPC-4
and SAM-4 Annex A; the iSCSI standard waffles in
this area).


*** a better way to get a target to report its active
luns is to use the REPORT LUNS well known logical
unit but hardly anyone implements that.

Doug Gilbert



Re: [PATCH 3/3] tgt: fix scsi command leak

2007-03-05 Thread Douglas Gilbert

FUJITA Tomonori wrote:
 From: Douglas Gilbert [EMAIL PROTECTED]
 Subject: Re: [PATCH 3/3] tgt: fix scsi command leak
 Date: Sat, 03 Mar 2007 11:58:19 -0500

 FUJITA Tomonori wrote:
 The failure to map user-space pages leads to a scsi command leak. It can
 happen mostly because of user-space daemon bugs (or OOM). This patch
 makes tgt just notify a LLD of the failure with sense when
 blk_rq_map_user() fails.

 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 Signed-off-by: Mike Christie [EMAIL PROTECTED]
 ---
  drivers/scsi/scsi_tgt_lib.c |   23 ---
  1 files changed, 20 insertions(+), 3 deletions(-)

 diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
 index dc8781a..c05dff9 100644
 --- a/drivers/scsi/scsi_tgt_lib.c
 +++ b/drivers/scsi/scsi_tgt_lib.c
 @@ -459,6 +459,16 @@ static struct request *tgt_cmd_hash_look
 return rq;
  }

 +static void scsi_tgt_build_sense(unsigned char *sense_buffer, unsigned char key,
 +unsigned char asc, unsigned char asq)
 +{
 +   sense_buffer[0] = 0x70;
 +   sense_buffer[2] = key;
 +   sense_buffer[7] = 0xa;
 +   sense_buffer[12] = asc;
 +   sense_buffer[13] = asq;
 +}
 +
 Tomo,
 Perhaps you could add a memset(sense_buffer, 0, 18) before
 those assignments and state that this is fixed sense
 buffer format.

 I think that it isn't necessary because when a target mode driver
 allocates scsi_cmnd, scsi_host_get_command() does that.

 What about an option for descriptor sense format? With SAT now
 a standard, we now have one more reason to support
 descriptor format when required. The ATA PASS-THROUGH SCSI
 commands in SAT use descriptor sense format to return
 ATA registers.

 tgt's kernel-space code doesn't know anything about SCSI devices that
 initiators talks to. So it's difficult to send proper sense buffer.
 Normally, we don't have this problem because tgt user-space code builds
 sense buffer.

 The bug that we are trying to fix is that the scsi command leak due to
 the user-space's bugs. So we can't rely on the user-space for this.

 Note that, like open-iscsi, the user-space bugs are pretty critical for
 tgt as the kernel-space bugs. We don't think target mode drivers can
 continue to work. However, tgt should tell target mode drivers that
 unrecoverable problems happen and we should cleanly unload the kernel
 modules.

Tomo,
If I understand correctly, there is a target SCSI command
interpreter in a user space daemon (plus lu support) and the
target transport end point in kernel space (roughly speaking).
So if there is some problem in the kernel module, or
the user space daemon goes away (or won't respond) then what
you have is a transport error at the target end.

The error should be lower level than SCSI commands (i.e.
transport level). The kernel module doesn't know the state
of target SCSI command interpreter (by design). For example
the application client may have set the D_SENSE bit in the
control mode page prior to the failure that your code is
addressing. So the application client won't be expecting
fixed sense data format thereafter.

So what the code is doing is definitely better than nothing,
but IMO it isn't quite right either.
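
To show what honouring D_SENSE would involve, here is a minimal
sketch of a builder that can emit either format. The function name
and signature are mine; the byte positions follow SPC (fixed format:
response code 0x70, key in byte 2, asc/ascq in bytes 12 and 13;
descriptor format: response code 0x72, key in byte 1, asc/ascq in
bytes 2 and 3). How the kernel module would learn the current
D_SENSE setting is left open, as it is in the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build current-error sense data in sb.
 * descriptor != 0 selects descriptor format, 0 selects fixed format.
 * sb_len must be at least 18 bytes; the whole buffer is zeroed first.
 */
static void build_sense(uint8_t *sb, size_t sb_len, int descriptor,
                        uint8_t key, uint8_t asc, uint8_t ascq)
{
    assert(sb != NULL && sb_len >= 18);
    memset(sb, 0, sb_len);
    if (descriptor) {
        sb[0] = 0x72;        /* descriptor format, current error */
        sb[1] = key & 0x0f;
        sb[2] = asc;
        sb[3] = ascq;
        sb[7] = 0;           /* no sense data descriptors follow */
    } else {
        sb[0] = 0x70;        /* fixed format, current error */
        sb[2] = key & 0x0f;
        sb[7] = 0x0a;        /* additional sense length: 18 - 8 */
        sb[12] = asc;
        sb[13] = ascq;
    }
}
```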

Doug Gilbert

Re: convert sg to block layer helpers - v5

2007-03-04 Thread Douglas Gilbert

[EMAIL PROTECTED] wrote:
 There are no big changes between v4 and v5. I was able to fix
 things in scsi tgt, so I could remove the weird arguments
 the block helpers were taking for it. I also tried to break
 up the patchset for easier viewing. The final patch also
 takes care of the access_ok regression.
 
 These patches were made against linus's tree since Tomo needed
 me to break part of it out for his scsi tgt bug fix patches.
 
 0001-rm-bio-hacks-in-scsi-tgt.txt - Drop scsi tgt's bio_map_user
 usage and convert it to blk_rq_map_user. Tomo is also sending
 this patch in his patchset since he needs it for his bug fixes.
 
 0002-rm-block-device-arg-from-bio-map-user.txt - The block_device
 argument is never used in the bio map user functions, so this
 patch drops it.
 
 0003-Support-large-sg-io-segments.txt - Modify the bio functions
 to allocate multiple pages at once instead of a single page.
 
 0004-Add-reserve-buffer-for-sg-io.txt - Add reserve buffer support
 to the block layer for sg and st indirect IO use.
 
 0005-Add-sg-io-mmap-helper.txt - Add some block layer helpers for
 sg mmap support.
 
 0006-Convert-sg-to-block-layer-helpers.txt - Convert sg to block
 layer helpers.
 
 0007-mv-user-buffer-copy-access_ok-test-to-block-helper.txt - 
 Move user data buffer access_ok tests to block layer helpers.
 
 The goal of this patchset is to remove scsi_execute_async and
 reduce code duplication.
 
 People want to discuss further merging sg and bsg/scsi_ioctl
 functionality, but I did not handle any of that in this
 patchset since people still disagree on what should be supported
 with future interfaces.
 
 My only TODO is maybe make the bio reserve buffer mempoolable
 (make it work as mempool alloc and free functions). Since
 sg only supported one reserve buffer per fd I have not worked
 on it and it did not seem worth it if there are no users.
***

Mike,
I see you are removing the scatter_elem_sz parameter.
What decides the scatter gather element size? Can it
be greater than PAGE_SIZE?


*** Generalizing the idea of a mmap-ed reserve buffer to
something the user had more control over could be very
powerful.
For example allowing two file descriptors (to different
devices) in the same process to share the same mmap-ed
area. This would allow a device to device copy to DMA into
and out of the same memory, potentially with large per command
transfers and with no per command scatter gather build and
tear down. Basically a zero copy copy with minimal CPU
overhead.

Doug Gilbert



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: convert sg to block layer helpers - v5

2007-03-04 Thread Douglas Gilbert

Mike Christie wrote:
 Douglas Gilbert wrote:
 Mike,
 I see you are removing the scatter_elem_sz parameter.
 What decides the scatter gather element size? Can it
 be greater than PAGE_SIZE?
 
 Oh yeah, sorry I should have documented that.
 
 I just made the code try to allocate as large an element as possible.
 So the code looks at q->max_segment_size and tries to allocate segments
 that large initially. If that is too large then it will drop down by
 half like what sg.c used to do when it could not allocate large segments.
 
 I will add the param back if you want. I had thought it was a workaround
 due to the segment size of a device not being exported.
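
The fallback described above (start at the queue's maximum segment size, halve on each failure) can be sketched in user-space C. This only illustrates the strategy and is not the patch's code: alloc_segment, seg_alloc_fn and capped_malloc are made-up names, and malloc() stands in for the kernel page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the sketch can run against a fake
 * allocator; kernel code would call a page allocator here. */
typedef void *(*seg_alloc_fn)(size_t bytes);

/* Stand-in allocator for illustration: refuses requests above 64 KiB. */
void *capped_malloc(size_t bytes)
{
    return bytes > 64 * 1024 ? NULL : malloc(bytes);
}

/*
 * Try to allocate one segment, starting at max_seg bytes and halving
 * after each failure, never going below min_seg. On success *got holds
 * the size actually obtained; on failure NULL is returned and *got is 0.
 */
void *alloc_segment(size_t max_seg, size_t min_seg,
                    seg_alloc_fn alloc, size_t *got)
{
    size_t sz;

    assert(alloc != NULL && got != NULL);
    for (sz = max_seg; sz > 0 && sz >= min_seg; sz /= 2) {
        void *p = alloc(sz);

        if (p != NULL) {
            *got = sz;
            return p;
        }
    }
    *got = 0;
    return NULL;
}
```

With the fake allocator, a request that starts at 256 KiB fails twice and then succeeds at 64 KiB, which is the "drop down by half" behaviour in miniature.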
 

 *** Generalizing the idea of a mmap-ed reserve buffer to
 something the user had more control over could be very
 powerful.
 For example allowing two file descriptors (to different
 devices) in the same process to share the same mmap-ed
 area. This would allow a device to device copy to DMA into
 and out of the same memory, potentially with large per command
 transfers and with no per command scatter gather build and
 tear down. Basically a zero copy copy with minimal CPU
 overhead.

 
 I was thinking of something similar but not based on mmap. I have been
 trying to figure out a way to do sg io splice. I do not care what
 interface or method is used, I think it would be useful.
 
 I know we talked about the mmap approach a little, but I do not remember
 if we talked about how to tell both fds that they are going to use the
 same buffer. Would we need a modification to the sg header or would we
 need to add in a new IOCTL which would tell sg.c to share the buffer
 between two fds?

Mike,
Currently there is a flag in sgv3:
#define SG_FLAG_MMAP_IO 4
and when it is active the dxferp field is ignored
as it is assumed the user previously did a mmap()
call to get the reserved buffer.

We could add a:
#define SG_FLAG_MMAP_IO_SHARED 8
and then the pointer in dxferp could be taken as
the already mmap-ed buffer from another device.
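
As a rough sketch of the two cases, the helpers below fill in a sg v3 header for mmap-ed IO. The first uses the existing SG_FLAG_MMAP_IO flag; the second uses SG_FLAG_MMAP_IO_SHARED, which is only the proposal from this message and is not defined by any kernel or libc header. The function names and the 20 second timeout are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <scsi/sg.h>

#ifndef SG_FLAG_MMAP_IO
#define SG_FLAG_MMAP_IO 4           /* value quoted in the message above */
#endif
/* Proposal only: no kernel implements this flag. */
#define SG_FLAG_MMAP_IO_SHARED 8

/* Header setup common to both variants below. */
static void prep_common(struct sg_io_hdr *h, unsigned char *cdb,
                        unsigned char cdb_len, unsigned int xfer_len,
                        unsigned char *sense, unsigned char sense_len)
{
    assert(h != NULL && cdb != NULL && sense != NULL);
    memset(h, 0, sizeof(*h));
    h->interface_id = 'S';
    h->cmdp = cdb;
    h->cmd_len = cdb_len;
    h->dxfer_len = xfer_len;
    h->sbp = sense;
    h->mx_sb_len = sense_len;
    h->timeout = 20000;             /* milliseconds */
}

/*
 * Existing behaviour: the data lands in the reserved buffer that the
 * caller mmap()-ed earlier on this same sg file descriptor, so dxferp
 * is ignored and left NULL.
 */
void prep_mmap_read(struct sg_io_hdr *h, unsigned char *cdb,
                    unsigned char cdb_len, unsigned int xfer_len,
                    unsigned char *sense, unsigned char sense_len)
{
    prep_common(h, cdb, cdb_len, xfer_len, sense, sense_len);
    h->dxfer_direction = SG_DXFER_FROM_DEV;
    h->dxferp = NULL;
    h->flags = SG_FLAG_MMAP_IO;
}

/*
 * Proposed behaviour: dxferp names a region that was mmap()-ed from a
 * different sg file descriptor, so a write can send what a read on the
 * other device placed there.
 */
void prep_shared_mmap_write(struct sg_io_hdr *h, unsigned char *cdb,
                            unsigned char cdb_len, unsigned int xfer_len,
                            unsigned char *sense, unsigned char sense_len,
                            void *shared_buf)
{
    prep_common(h, cdb, cdb_len, xfer_len, sense, sense_len);
    h->dxfer_direction = SG_DXFER_TO_DEV;
    h->dxferp = shared_buf;
    h->flags = SG_FLAG_MMAP_IO_SHARED;
}
```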


Having more than one mmap-ed IO buffer per file
descriptor would be nice but opening multiple
file descriptors to the same device can give
the same effect (with perhaps a POSIX thread per
file descriptor).


Doug Gilbert


Re: [PATCH 0/4] SCSI: Printing cleanups

2007-02-27 Thread Douglas Gilbert

Martin K. Petersen wrote:
 This patch series is the first batch of cleanups in an attempt to make
 the SCSI printing more consistent and suitable for human consumption.
 
 Previously a typical error looked like this:
 
 sd 0:0:0:0: SCSI error: return code = 0x08000002
 sda: Current: sense key: Aborted Command
 Additional sense: Logical block reference tag check failed
 
 You had to have the magic return value decoder ring handy to figure
 out what had really happened.  And you had to do the mapping between
 sd 0:0:0:0 and sda yourself.
 
 
 The following patches clean up various bits so that the same
 information can be presented in a more readable form:
 
 sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
 sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] 
 sd 0:0:0:0: [sda] Add. Sense: Logical block reference tag check failed
 
 All printk's from sd.c now have the same prefix.  If logging is turned
 on, for instance, we also get:
 
 sd 0:0:0:0: [sda] Send: 0x0fb89180 
 sd 0:0:0:0: [sda] CDB: Read(16): 88 20 00 00 00 00 00 00 00 20 00 00 00 08 00 00
 sd 0:0:0:0: [sda] Done: 0x0fb89180 SUCCESS
 
 The patches need to be applied in order.
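
The "decoder ring" step can be sketched as follows, taking the old-style return code as 0x08000002. The byte layout matches the kernel's host_byte() and driver_byte() helpers of that era; the struct and function names are invented for this illustration and the output is numeric rather than the symbolic DID_*/DRIVER_* names the patches print.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The four bytes packed into a SCSI result word. */
struct scsi_result {
    uint8_t status;   /* bits 0-7: SCSI status byte */
    uint8_t msg;      /* bits 8-15: message byte */
    uint8_t host;     /* bits 16-23: DID_* code from the LLD */
    uint8_t driver;   /* bits 24-31: DRIVER_* code, SUGGEST_* in the high nibble */
};

/* Split a 32-bit result word into its component bytes. */
struct scsi_result split_result(uint32_t result)
{
    struct scsi_result r;

    r.status = (uint8_t)(result & 0xff);
    r.msg    = (uint8_t)((result >> 8) & 0xff);
    r.host   = (uint8_t)((result >> 16) & 0xff);
    r.driver = (uint8_t)((result >> 24) & 0xff);
    return r;
}

/* Write a one-line summary into buf; returns the snprintf() result. */
int format_result(uint32_t result, char *buf, size_t len)
{
    struct scsi_result r = split_result(result);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "hostbyte=0x%02x driverbyte=0x%02x status=0x%02x",
                    (unsigned)r.host, (unsigned)r.driver,
                    (unsigned)r.status);
}
```

For 0x08000002 this gives host byte 0x00 (DID_OK), driver byte 0x08 (DRIVER_SENSE) and status 0x02 (CHECK CONDITION), which agrees with the cleaned-up message shown above.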

Martin,
Looks good.

If you need to revise anything, perhaps you could add a
comment with this url near the list of additional sense
codes:
  http://www.t10.org/lists/asc-num.txt

That is the official list of SCSI additional sense codes.

Based on the date of my last additional sense code update
only this one is missing:
  2Fh/02h  DTLPWROMAEBKVF  COMMANDS CLEARED BY DEVICE SERVER


Doug Gilbert





Re: end to end error recovery musings

2007-02-25 Thread Douglas Gilbert

H. Peter Anvin wrote:
 Ric Wheeler wrote:

 We still have the following challenges:

(1) read-ahead often means that we will retry every bad sector at
 least twice from the file system level. The first time, the fs read
 ahead request triggers a speculative read that includes the bad sector
 (triggering the error handling mechanisms) right before the real
 application read does the same thing.  Not sure what the
 answer is here since read-ahead is obviously a huge win in the normal
 case.

 
 Probably the only sane thing to do is to remember the bad sectors and
 avoid attempting to read them; that would mean marking automatic
 versus explicitly requested requests to determine whether or not to
 filter them against a list of discovered bad blocks.

Some disks are doing their own read-ahead in the form
of a background media scan. Scans are done on request or
periodically (e.g. once per day or once per week) and we
have tools that can fetch the scan results from a disk
(e.g. a list of unreadable sectors). What we don't have
is any way to feed such information to a file system
that may be impacted.

Doug Gilbert



Re: [PATCH] bsg: return SAM device status code

2007-02-23 Thread Douglas Gilbert

Pete Wyckoff wrote:
 Use the status codes from the standard, not the shifted-by-one codes
 that are marked deprecated in scsi.h.  This makes bsg v4 status
 report the same value as sg v3 status too.

Pete,
Good pick up. We certainly don't want to re-introduce
the SCSI status byte shift from the old days.
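
For readers who have not met the old shift, here is a small sketch of the difference. legacy_status() mirrors what the deprecated status_byte() macro computes; the function names themselves are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Deprecated view: shift right by one, as status_byte() does. */
unsigned int legacy_status(uint32_t result)
{
    return (result >> 1) & 0x7f;
}

/* SAM view: the status byte exactly as the device returned it. */
unsigned int sam_status(uint32_t result)
{
    return result & 0xff;
}

/* Convert an old shifted code back to the value in the standard. */
unsigned int sam_from_legacy(unsigned int legacy)
{
    assert(legacy <= 0x7f);
    return legacy << 1;
}
```

A CHECK CONDITION (0x02 in the standard) shows up as 0x01 through the shifted macro, and TASK SET FULL (0x28) as 0x14, which is why mixing the two conventions in one interface is error prone.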

Doug Gilbert


 Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
 ---
  block/bsg.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/block/bsg.c b/block/bsg.c
 index c85d961..e39a321 100644
 --- a/block/bsg.c
 +++ b/block/bsg.c
 @@ -438,7 +438,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq,
 struct sg_io_v4 *hdr,
   /*
    * fill in all the output members
    */
 - hdr->device_status = status_byte(rq->errors);
 + hdr->device_status = rq->errors & 0xff;
   hdr->transport_status = host_byte(rq->errors);
   hdr->driver_status = driver_byte(rq->errors);
   hdr->info = 0;


Re: program inquiry is using a deprecated scsi_ioctl , please convert it to SG_IO

2007-02-22 Thread Douglas Gilbert

James Bottomley wrote:
 On Thu, 2007-02-22 at 11:59 +0530, MASTHAN DUDEKULA wrote:
 Hi JAMES,
  
  
 The following code is SG_IO equivalent of scsi ioctls
 SCSI_TEST_UNIT_READY
  
 unsigned char sense_b[32];
 unsigned char turCmbBlk[] = {0x00, 0, 0, 0, 0, 0};
 struct sg_io_hdr io_hdr;

 memset(&io_hdr, 0, sizeof(struct sg_io_hdr));
 io_hdr.interface_id = 'S';
 io_hdr.cmd_len = sizeof(turCmbBlk);
 io_hdr.mx_sb_len = sizeof(sense_b);
 io_hdr.dxfer_direction = SG_DXFER_NONE;
 io_hdr.cmdp = turCmbBlk;
 io_hdr.sbp = sense_b;
 io_hdr.timeout = DEF_TIMEOUT;

 if (ioctl(fd, SG_IO, &io_hdr) < 0) {
  
 Like this What is the SG_IO equivalent for SCSI_IOCTL_SCSI_COMMAND ?

Judging from the above you have found some sg3_utils
code. In a recent version, if you go to the examples
subdirectory, you will find the scsi_inquiry.c and
sg_simple1.c files.
The former shows the usage of the older, deprecated
SCSI_IOCTL_SEND_COMMAND ioctl while the latter does something
very similar but uses the SG_IO ioctl interface.

The equivalence is that both programs send a SCSI
INQUIRY cdb to a device and print out the response.
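
For orientation, a minimal sketch of the SG_IO side follows. It is not the sg_simple1.c source; the function names, the 96 byte reply length and the 20 second timeout are choices made for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define INQ_REPLY_LEN 96
#define INQ_CMD_LEN 6

/* Build a standard INQUIRY cdb and the sg v3 header that carries it.
 * reply must have room for INQ_REPLY_LEN bytes. */
void build_inquiry(struct sg_io_hdr *h, unsigned char cdb[INQ_CMD_LEN],
                   unsigned char *reply, unsigned char *sense,
                   unsigned char sense_len)
{
    assert(h != NULL && cdb != NULL && reply != NULL && sense != NULL);
    memset(cdb, 0, INQ_CMD_LEN);
    cdb[0] = 0x12;                  /* INQUIRY opcode */
    cdb[4] = INQ_REPLY_LEN;         /* allocation length */

    memset(h, 0, sizeof(*h));
    h->interface_id = 'S';
    h->cmd_len = INQ_CMD_LEN;
    h->mx_sb_len = sense_len;
    h->dxfer_direction = SG_DXFER_FROM_DEV;
    h->dxfer_len = INQ_REPLY_LEN;
    h->dxferp = reply;
    h->cmdp = cdb;
    h->sbp = sense;
    h->timeout = 20000;             /* milliseconds */
}

/* Returns 0 if the ioctl itself succeeded, -1 otherwise. The caller
 * still has to inspect status, host_status and driver_status. */
int send_sg_io(int fd, struct sg_io_hdr *h)
{
    return ioctl(fd, SG_IO, h) < 0 ? -1 : 0;
}
```

A caller would open the device node, call build_inquiry() and send_sg_io(), and then read the vendor and product strings from bytes 8 to 31 of the reply.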

Doug Gilbert

 I don't understand your question ... SCSI_IOCTL_SEND_COMMAND sends a
 SCSI command to the device.  Your example of test unit ready above does
 just that ... it sends a Test Unit Ready command to the device using
 SG_IO ... exactly what do you not understand about using SG_IO to send
 commands to the device?
 
 James


Re: [Bug 7994] New: sleeping function called from invalid context at mm/slab.c:3034

2007-02-17 Thread Douglas Gilbert

Andrew Morton wrote:
 On Fri, 16 Feb 2007 22:59:31 -0500 Douglas Gilbert [EMAIL PROTECTED] wrote:
 
 The patch that I sent, shown at the end of this post,
 is incomplete as it doesn't check the return value
 from kzalloc(..., GFP_ATOMIC).
 
 The diff which is in mainline now looks to be OK?

Yes.

Doug Gilbert


 --- linux-2.6.20/drivers/scsi/scsi_debug.c2006-11-29 19:14:18.0 
 -0800
 +++ devel/drivers/scsi/scsi_debug.c   2007-02-16 21:21:08.0 -0800
 @@ -28,7 +28,6 @@
  #include <linux/module.h>
  
  #include <linux/kernel.h>
 -#include <linux/sched.h>
  #include <linux/errno.h>
  #include <linux/timer.h>
  #include <linux/types.h>
 @@ -51,10 +50,10 @@
  #include "scsi_logging.h"
  #include "scsi_debug.h"
  
 -#define SCSI_DEBUG_VERSION "1.80"
 -static const char * scsi_debug_version_date = "20061018";
 +#define SCSI_DEBUG_VERSION "1.81"
 +static const char * scsi_debug_version_date = "20070104";
  
 -/* Additional Sense Code (ASC) used */
 +/* Additional Sense Code (ASC) */
  #define NO_ADDITIONAL_SENSE 0x0
  #define LOGICAL_UNIT_NOT_READY 0x4
  #define UNRECOVERED_READ_ERR 0x11
 @@ -65,9 +64,13 @@ static const char * scsi_debug_version_d
  #define INVALID_FIELD_IN_PARAM_LIST 0x26
  #define POWERON_RESET 0x29
  #define SAVING_PARAMS_UNSUP 0x39
 +#define TRANSPORT_PROBLEM 0x4b
  #define THRESHOLD_EXCEEDED 0x5d
  #define LOW_POWER_COND_ON 0x5e
  
 +/* Additional Sense Code Qualifier (ASCQ) */
 +#define ACK_NAK_TO 0x3
 +
  #define SDEBUG_TAGGED_QUEUING 0 /* 0 | MSG_SIMPLE_TAG | MSG_ORDERED_TAG */
  
  /* Default values for driver parameters */
 @@ -95,15 +98,20 @@ static const char * scsi_debug_version_d
  #define SCSI_DEBUG_OPT_MEDIUM_ERR   2
  #define SCSI_DEBUG_OPT_TIMEOUT   4
  #define SCSI_DEBUG_OPT_RECOVERED_ERR   8
 +#define SCSI_DEBUG_OPT_TRANSPORT_ERR   16
  /* When "every_nth" > 0 then modulo "every_nth" commands:
   *   - a no response is simulated if SCSI_DEBUG_OPT_TIMEOUT is set
   *   - a RECOVERED_ERROR is simulated on successful read and write
   * commands if SCSI_DEBUG_OPT_RECOVERED_ERR is set.
 + *   - a TRANSPORT_ERROR is simulated on successful read and write
 + * commands if SCSI_DEBUG_OPT_TRANSPORT_ERR is set.
   *
   * When "every_nth" < 0 then after "- every_nth" commands:
   *   - a no response is simulated if SCSI_DEBUG_OPT_TIMEOUT is set
   *   - a RECOVERED_ERROR is simulated on successful read and write
   * commands if SCSI_DEBUG_OPT_RECOVERED_ERR is set.
 + *   - a TRANSPORT_ERROR is simulated on successful read and write
 + * commands if SCSI_DEBUG_OPT_TRANSPORT_ERR is set.
   * This will continue until some other action occurs (e.g. the user
   * writing a new value (other than -1 or 1) to every_nth via sysfs).
   */
 @@ -315,6 +323,7 @@ int scsi_debug_queuecommand(struct scsi_
   int target = SCpnt->device->id;
   struct sdebug_dev_info * devip = NULL;
   int inj_recovered = 0;
 + int inj_transport = 0;
   int delay_override = 0;
  
   if (done == NULL)
 @@ -352,6 +361,8 @@ int scsi_debug_queuecommand(struct scsi_
   return 0; /* ignore command causing timeout */
   else if (SCSI_DEBUG_OPT_RECOVERED_ERR & scsi_debug_opts)
   inj_recovered = 1; /* to reads and writes below */
 + else if (SCSI_DEBUG_OPT_TRANSPORT_ERR & scsi_debug_opts)
 + inj_transport = 1; /* to reads and writes below */
  }
  
   if (devip->wlun) {
 @@ -468,7 +479,11 @@ int scsi_debug_queuecommand(struct scsi_
   mk_sense_buffer(devip, RECOVERED_ERROR,
   THRESHOLD_EXCEEDED, 0);
   errsts = check_condition_result;
 - }
 + } else if (inj_transport && (0 == errsts)) {
 +mk_sense_buffer(devip, ABORTED_COMMAND,
 +TRANSPORT_PROBLEM, ACK_NAK_TO);
 +errsts = check_condition_result;
 +}
   break;
   case REPORT_LUNS:   /* mandatory, ignore unit attention */
   delay_override = 1;
 @@ -531,6 +546,9 @@ int scsi_debug_queuecommand(struct scsi_
   delay_override = 1;
   errsts = check_readiness(SCpnt, 0, devip);
   break;
 + case WRITE_BUFFER:
 + errsts = check_readiness(SCpnt, 1, devip);
 + break;
   default:
   if (SCSI_DEBUG_OPT_NOISE & scsi_debug_opts)
   printk(KERN_INFO "scsi_debug: Opcode: 0x%x not "
 @@ -954,7 +972,9 @@ static int resp_inquiry(struct scsi_cmnd
   int alloc_len, n, ret;
  
   alloc_len = (cmd[3] << 8) + cmd[4];
 - arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_KERNEL);
 + arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_ATOMIC);
 + if (! arr)
 + return DID_REQUEUE << 16;
   if (devip->wlun)
   pq_pdt = 0x1e;  /* present, wlun */
   else if (scsi_debug_no_lun_0 && (0 == devip->lun))
 @@ -1217,7 +1237,9 @@ static int

Re: [RFC] How to implement linux_block commands in scsi midlayer

2007-02-17 Thread Douglas Gilbert

Elias,
If you want to define a SCSI operation code for
internal use within the kernel, please make sure
that the byte isn't in the range 0 to 255 (inclusive).
Those ones are either t10 defined, reserved or vendor
specific for logical_unit or target use.

IOW don't do it!

Better would be to flag the request for internal use.

If you really want to tweak SCSI cdb's, try the last
byte (a.k.a. the control byte). Also consider that
we are broadening the application of the pass-through
code and other packet based protocols could be present.
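
To make "the last byte" concrete, here is a small sketch that locates the control byte of a cdb. The helper name is made up. The special case for opcode 0x7f reflects my understanding that variable length cdbs carry the control byte in byte 1 rather than at the end, and the 0xc0 mask reflects my understanding that the top two bits of the control byte are the vendor specific ones; neither detail comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define VARIABLE_LENGTH_CMD 0x7f
#define CONTROL_VENDOR_MASK 0xc0    /* bits 7-6: vendor specific */

/*
 * Return a pointer to the control byte of a cdb, or NULL if the cdb is
 * shorter than the smallest (6 byte) SCSI command.
 */
unsigned char *cdb_control_byte(unsigned char *cdb, size_t len)
{
    assert(cdb != NULL);
    if (len < 6)
        return NULL;
    if (cdb[0] == VARIABLE_LENGTH_CMD)
        return &cdb[1];         /* variable length cdb: byte 1 */
    return &cdb[len - 1];       /* fixed length cdb: last byte */
}
```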

Doug Gilbert


Elias Oltmanns wrote:
 Hi there,
 
 in 2.6.19 the request type REQ_TYPE_LINUX_BLOCK has been introduced.
 This is meant for generic block layer commands to the lower level
 drivers. I'd like to use this mechanism for a generic queue freezing
 and disk parking facility. The idea is to issue a command like
 REQ_LB_OP_PROTECT to the device driver associated to the queue so it
 can do about it what ever it sees fit. On command completion, the
 block layer then stops the queue until the unfreeze command is passed
 in. The IDLE IMMEDIATE command in recent ATA specs provides an unload
 disk heads feature which I'd like to use when the generic block layer
 command is issued to an ATA device.
 
 Since ATA is implemented as a subsystem of the scsi subsystem, I
 thought it would be best to add an scsi_cmnd opcode LINUX_BLOCK_CMD to
 include/scsi/scsi.h and deal with commands of this type very much like
 block_pc commands. The difference between these two types is that when
 LINUX_BLOCK_CMD commands are taken off the queue, it is dealt with by
 a special function of the midlayer to see if there is something to be
 done about it regardless of the lld associated with the device in
 question, and then the very same command is passed on to the low level
 driver to give it a chance to do the more specific stuff.
 
 In my particular case of a generic disk protect command, the midlayer
 would be responsible for setting sdev_state to SDEV_BLOCK and the ATA
 subsystem would issue the actual park command.
 
 The patch attached is a first attempt of a generic implementation of
 LINUX_BLOCK commands into the scsi midlayer. It probably doesn't apply
 cleanly to 2.6.19 as I've just extracted it from my disk parking
 branch, so it mainly serves as an example to comment on. Please let me
 know what you think about this approach and whether I should post a
 separate patch for official integration into main line or whether it
 would be sufficient to leave it a part of the disk parking patch to be
 submitted later on.
 
 Regards,
 
 Elias
 ---
  drivers/ata/libata-scsi.c |   39 ++-
  drivers/scsi/scsi_lib.c   |   50 
 +
  include/linux/blkdev.h|1 +
  include/scsi/scsi.h   |1 +
  4 files changed, 86 insertions(+), 5 deletions(-)
  drivers/scsi/scsi.c   |3 ++-
 
 diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
 index a8acf71..6f1c351 100644
 --- a/drivers/ata/libata-scsi.c
 +++ b/drivers/ata/libata-scsi.c
 @@ -2558,6 +2558,41 @@ static struct ata_device * ata_find_dev(
   return NULL;
  }
  
 +/**
 + *   ata_scsi_linux_block - handling of generic block layer commands
 + *   @dev: ATA device to which the command is addressed
 + *   @cmd: SCSI command to execute
 + *   @done: SCSI command completion function
 + *
 + *   This function checks to see if we recognise the generic block layer
 + *   command and should do anything about it. If we don't know the command,
 + *   we indicate this in a sense response. However, we should fail
 + *   gracefully since the midlayer might handle this command appropriately
 + *   anyway, even without low level intervention.
 + *
 + *   LOCKING:
 + *   spin_lock_irqsave(host lock)
 + *
 + *   RETURNS:
 + *   Zero on success, non-zero on failure.
 + */
 +
 +static int ata_scsi_linux_block(struct ata_device *dev, struct scsi_cmnd 
 *cmd,
 + void (*done)(struct scsi_cmnd *))
 +{
 + struct request *req = cmd->request;
 + int ret = 0;
 +
 + switch (req->cmd[0]) {
 + default:
 + ata_scsi_set_sense(cmd, ILLEGAL_REQUEST, 0x20, 0x0);
 + /* Invalid command operation code */
 + done(cmd);
 + break;
 + }
 + return ret;
 +}
 +
  static struct ata_device * __ata_scsi_find_dev(struct ata_port *ap,
   const struct scsi_device *scsidev)
  {
 @@ -2856,7 +2891,9 @@ static inline int __ata_scsi_queuecmd(st
  {
   int rc = 0;
  
 - if (dev->class == ATA_DEV_ATA) {
 + if (cmd->cmnd[0] == LINUX_BLOCK_CMD)
 + rc = ata_scsi_linux_block(dev, cmd, done);
 + else if (dev->class == ATA_DEV_ATA) {
   ata_xlat_func_t xlat_func = ata_get_xlat_func(dev,
 cmd->cmnd[0]);
  
 diff --git a/drivers/scsi/scsi_lib.c

Re: [Bug 7994] New: sleeping function called from invalid context at mm/slab.c:3034

2007-02-16 Thread Douglas Gilbert

Andrew,
The patch that I sent, shown at the end of this post,
is incomplete as it doesn't check the return value
from kzalloc(..., GFP_ATOMIC).

As I suspected this bug has been exposed before: Jens
reported this problem in early January.

A more complete patch, with some other changes, was
posted 6 weeks ago:
http://marc.theaimsgroup.com/?l=linux-scsi&m=116797354920256&w=2

I'm not sure if this patch is in the works or
not.

Doug Gilbert


Douglas Gilbert wrote:
 James Bottomley wrote:
 On Mon, 2007-02-12 at 20:06 -0800, Andrew Morton wrote:
 This is fixed in mainline and I expect that the fix is also lined up
 for
 2.6.20.1. (?)
 It's definitely in mainline.  I've cc'd Doug Gilbert, the scsi_debug
 maintainer to assess what should be done for 2.6.20.1
 
 James,
 I thought this had been addressed but I can't find a
 trail on my laptop. A minimal patch is attached.
 
 
 ChangeLog:
- Use GFP_ATOMIC for allocations that can be called
  from the queuecommand() entry point
 
 Signed-off-by: Douglas Gilbert [EMAIL PROTECTED]
 
 Doug Gilbert
 
 
 
 
 
 
 
 --- linux/drivers/scsi/scsi_debug.c   2006-11-30 07:00:01.0 -0800
 +++ linux/drivers/scsi/scsi_debug.c2620atom   2007-02-13 06:43:28.0 
 -0800
 @@ -954,7 +954,7 @@
   int alloc_len, n, ret;
  
   alloc_len = (cmd[3] << 8) + cmd[4];
 - arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_KERNEL);
 + arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_ATOMIC);
   if (devip->wlun)
   pq_pdt = 0x1e;  /* present, wlun */
   else if (scsi_debug_no_lun_0 && (0 == devip->lun))
 @@ -1217,7 +1217,7 @@
   alen = ((cmd[6] << 24) + (cmd[7] << 16) + (cmd[8] << 8)
   + cmd[9]);
  
 - arr = kzalloc(SDEBUG_MAX_TGTPGS_ARR_SZ, GFP_KERNEL);
 + arr = kzalloc(SDEBUG_MAX_TGTPGS_ARR_SZ, GFP_ATOMIC);
   /*
* EVPD page 0x88 states we have two ports, one
* real and a fake port with no device connected.
 @@ -2044,7 +2044,7 @@
   }
   }
   if (NULL == open_devip) { /* try and make a new one */
 - open_devip = kzalloc(sizeof(*open_devip),GFP_KERNEL);
 + open_devip = kzalloc(sizeof(*open_devip),GFP_ATOMIC);
   if (NULL == open_devip) {
   printk(KERN_ERR "%s: out of memory at line %d\n",
   __FUNCTION__, __LINE__);


Re: sg version 4 tools

2007-02-15 Thread Douglas Gilbert

FUJITA Tomonori wrote:
 I created a git tree for makeshift sg version 4 tools:
 
 http://www.kernel.org/git/?p=linux/kernel/git/tomo/sgv4-tools.git;a=summary
 
 # not synchronized yet.
 
 The interface has changed continuously (and will do). After mainline
 inclusion, Doug's sg tools support sg v4, I think. Until then, I put
 tools that I use for sg v4 development.
 
 Currently, there is only one tool, sgv4_dd, which can read/write
 from/to a device via the bsg interface (both ioctl and the read/write
 interfaces are supported).
 
 Here are some examples:
 
 # ./sgv4_dd -i /dev/sdb -o /dev/null --count 2
 succeeded (read/write interface)
 
 # ./sgv4_dd -i /dev/sdb -o /dev/null --count 2 --sgio
 succeeded (SG_IO)
 
 # ./sgv4_dd -i /dev/zero -o /dev/sdb --count 3 --sgio
 succeeded (SG_IO)
 
 # ./sgv4_dd -i /dev/zero -o /dev/sdb --count 3
 succeeded (read/write interface)

Tomo,
Just a few points.

While the sgv4_dd command line interface (cli) looks
sensible, it diverges from the dd command (which is
non-Unix-like but reasonably fit for service for the
function that dd performs). So even though the Unix dd
command syntax takes a while to get used to, other testers
will most likely be comfortable with the existing dd syntax.


Of the 41 utilities in (the main directory of) sg3_utils,
29 are ported to FreeBSD and Windows. This is done by
putting a generic pass-through layer between those 29
utilities and the OS specific pass-throughs ***. The
remaining 12 utilities are either:
  a) linux specific (e.g. sg_reset and sg_map26)
  b) or a bit too complicated due to other system
 calls (e.g. sg_dd) to convert
  c) both a) and b) (e.g. sgm_dd)
The generic pass-through layer is defined with bi-directional
data transfers in mind.

It also should be relatively easy to allow for two linux
specific pass-throughs (i.e. sgv3 and sgv4) so that
the common 29 utilities just work on either pass-through
(by compile or run time switch).
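
The layering can be pictured with a small function-pointer table. This is only a sketch of the idea, not the sg3_utils pass-through interface: every name here is hypothetical, and a fake backend stands in for the real sg v3 and sg v4 code so that the example runs without a device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* OS-independent description of one SCSI command. */
struct pt_cmd {
    const unsigned char *cdb;
    size_t cdb_len;
    unsigned char *data_in;
    size_t data_in_len;
    int scsi_status;                /* set by the backend */
};

/* What every OS-specific pass-through has to provide. */
struct pt_backend {
    const char *name;
    int (*execute)(int fd, struct pt_cmd *cmd);
};

/* Placeholder backend: pretends the device answered GOOD with zeros. */
static int fake_execute(int fd, struct pt_cmd *cmd)
{
    (void)fd;
    if (cmd->data_in != NULL)
        memset(cmd->data_in, 0, cmd->data_in_len);
    cmd->scsi_status = 0x00;
    return 0;
}

/* A real table would point at sg v3, sg v4, FreeBSD, ... code. */
static const struct pt_backend backends[] = {
    { "sgv3", fake_execute },
    { "sgv4", fake_execute },
};

/* Run time switch: pick a backend by name, NULL if unknown. */
const struct pt_backend *pt_select(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    return NULL;
}

/* Utilities call only this and never see the OS-specific details. */
int pt_run(const struct pt_backend *be, int fd, struct pt_cmd *cmd)
{
    assert(be != NULL && cmd != NULL);
    return be->execute(fd, cmd);
}
```

A utility written against pt_run() does not change when a new backend is added to the table, which is the property that lets the same utilities work over either Linux pass-through.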


In summary, I don't think that there needs to be a
sg4_utils. As you suggest, sgv4_dd can be incorporated
into the existing sg3_utils at a convenient time.
sg v4 represents an alternate interface for a linux
pass-through and the bulk of sg3_utils already supports
4 pass-throughs via a common code base. [The four are
linux (sg v3), FreeBSD, Tru64, Windows (from NT forward).]


*** smartmontools takes the same approach and it supports
several pass-throughs for Windows.

Doug Gilbert




Re: [PATCH] bind bsg to request_queue instead of gendisk

2007-02-14 Thread Douglas Gilbert

Jeff Garzik wrote:
 On Wed, Feb 14, 2007 at 02:53:31AM +0900, FUJITA Tomonori wrote:
 It seems that it would be better to bind bsg devices to request_queue
 instead of gendisk. This enables any objects to define own
 request_handler and create own bsg device (under sysfs).

 Possible enhancements:

 - I removed gendisk but it would be better for objects having gendisk
 to keep it for nice features like disk stats.

 - Objects that wants to use bsg need to setup a request_queue. Maybe
 wrapper functions to setup a request_queue for them would be useful.

 This patch was tested only with disk drivers.


 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 ---
  block/bsg.c|   37 +
 
 What is this patch against?  scsi-misc?
 
 I certainly like the bsg solution, but block/bsg.c does not exist in my
 vanilla linux-2.6.git tree :)

www.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git

branch: bsg


Re: [Bug 7994] New: sleeping function called from invalid context at mm/slab.c:3034

2007-02-13 Thread Douglas Gilbert

James Bottomley wrote:
 On Mon, 2007-02-12 at 20:06 -0800, Andrew Morton wrote:
 This is fixed in mainline and I expect that the fix is also lined up
 for
 2.6.20.1. (?)
 
 It's definitely in mainline.  I've cc'd Doug Gilbert, the scsi_debug
 maintainer to assess what should be done for 2.6.20.1

James,
I thought this had been addressed but I can't find a
trail on my laptop. A minimal patch is attached.


ChangeLog:
   - Use GFP_ATOMIC for allocations that can be called
 from the queuecommand() entry point

Signed-off-by: Douglas Gilbert [EMAIL PROTECTED]

Doug Gilbert



--- linux/drivers/scsi/scsi_debug.c	2006-11-30 07:00:01.0 -0800
+++ linux/drivers/scsi/scsi_debug.c2620atom	2007-02-13 06:43:28.0 -0800
@@ -954,7 +954,7 @@
 	int alloc_len, n, ret;
 
 	alloc_len = (cmd[3] << 8) + cmd[4];
-	arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_KERNEL);
+	arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_ATOMIC);
 	if (devip->wlun)
 		pq_pdt = 0x1e;	/* present, wlun */
 	else if (scsi_debug_no_lun_0 && (0 == devip->lun))
@@ -1217,7 +1217,7 @@
 	alen = ((cmd[6] << 24) + (cmd[7] << 16) + (cmd[8] << 8)
 		+ cmd[9]);
 
-	arr = kzalloc(SDEBUG_MAX_TGTPGS_ARR_SZ, GFP_KERNEL);
+	arr = kzalloc(SDEBUG_MAX_TGTPGS_ARR_SZ, GFP_ATOMIC);
 	/*
 	 * EVPD page 0x88 states we have two ports, one
 	 * real and a fake port with no device connected.
@@ -2044,7 +2044,7 @@
 		}
 	}
 	if (NULL == open_devip) { /* try and make a new one */
-		open_devip = kzalloc(sizeof(*open_devip),GFP_KERNEL);
+		open_devip = kzalloc(sizeof(*open_devip),GFP_ATOMIC);
 		if (NULL == open_devip) {
 			printk(KERN_ERR "%s: out of memory at line %d\n",
 __FUNCTION__, __LINE__);

Re: Random scsi disk disappearing

2007-02-07 Thread Douglas Gilbert

Randy Dunlap wrote:
 [lkml dropped]  [old thread]
 
 On Fri, 18 Aug 2006 11:11:39 +0200 Andreas Herrmann wrote:
 
 On 18.08.2006 00:33 Stefan Richter [EMAIL PROTECTED] wrote:
 Andreas Herrmann wrote:
 Anyone interested in a script to conveniently interpret or change the
 SCSI logging level? Such a script (scsi_logging_level) exists in the
 s390-tools package (version 1.5.3).
 That would be very welcome.
 
 Hi Doug,
 
 Did you give any thought to adding this script (or a current version
 of it, from
 http://www-128.ibm.com/developerworks/linux/linux390/s390-tools-1.5.4.html)
 to sg3-utils?  or would you give it some thought?
 
 I think that's a better solution than adding to the kernel tree
 (and better than getting it from developerworks :).
 
 Thanks.

Randy,
The recently released sg3_utils version 1.23 contains
a scripts directory. The files in there are:
  README
  sas_disk_blink
  scsi_logging_level
  scsi_mandat
  scsi_readcap
  scsi_ready
  scsi_satl
  scsi_start
  scsi_stop
  scsi_temperature

Does one look familiar?
If there is a later version, I can put it in sg3_utils-1.24 .

Doug Gilbert

Re: Writing performance problem with SAS1068

2007-02-06 Thread Douglas Gilbert

Bernardo Innocenti wrote:
 Hello,
 
 I've stumbled onto a strange performance problem on a new server:
 reading from disks is fast (70-80MB/s), but writing is extremely
 slow (13-15MB/s).  I've measured it like this:
 
 dd if=/dev/zero of=/dev/sdd bs=4096 count=65536 conv=fdatasync
 65536+0 records in
 65536+0 records out
 268435456 bytes (268 MB) copied, 17.7004 seconds, 15.2 MB/s

# dd if=/dev/zero of=/dev/sdj bs=4096 count=65536 conv=fdatasync
65536+0 records in
65536+0 records out
268435456 bytes (268 MB) copied, 2.24953 seconds, 119 MB/s

# dd if=/dev/zero of=/dev/sdd bs=4096 count=65536 conv=fdatasync
65536+0 records in
65536+0 records out
268435456 bytes (268 MB) copied, 2.3246 seconds, 115 MB/s

Both /dev/sdj and /dev/sdd connect via an expander to the same
SAS disk. /dev/sdj is via the LT aic94xx driver and a PCI-X HBA.
/dev/sdd is via the mptsas driver and a SAS1068 (PCIe) based HBA.
The kernel version is 2.6.20-rc5.
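
As a quick arithmetic check on the figures above: dd reports decimal megabytes (10^6 bytes), so the rate is simply bytes divided by seconds divided by 10^6. The helper name below is made up for the illustration.

```c
#include <assert.h>

/* Throughput in dd's units: decimal megabytes per second. */
double mb_per_sec(unsigned long long bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}
```

For 268435456 bytes this gives roughly 119.3 MB/s at 2.24953 seconds and roughly 15.2 MB/s at 17.7004 seconds, matching what dd printed in the two reports.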

Looks good to me.

You may like to check that Write Cache Enable is on with:
'sdparm --get=WCE /dev/sdd'.

Doug Gilbert

 *but*: if I rebuild the kernel and change CONFIG_FUSION_MAX_SGE
 from 40 (Fedora's default) to 128 (maximum value), it suddenly
 gets much faster: 31MB/s!
 
 Looks very much like an interrupt problem to me.  Maybe
 increasing the scatter gather mitigates the problem of
 missing completion notifications.
 
 Evidence:
 
 Exhibit A: custom kernel config for 2.6.18-1.2257.fc5.bernie
 http://www.codewiz.org/helium_logs/config
 
 Exhibit B: dmesg output from said kernel
 http://www.codewiz.org/helium_logs/dmesg
 
 Exhibit C: misc proc files, and all that
 http://www.codewiz.org/helium_logs/
 
 Exhibit D: motherboard and chipset specification
 http://www.supermicro.com/products/motherboard/Xeon3000/3010/PDSME+.cfm
 
 
 Circumstantial evidence:
 
 - Seems to affect just the LSI SAS1068 PCI-X controller.
   The on-board AHCI controller writes very fast (60MB/s)
 
 - I've seen a very similar writing bottleneck with a
   Promise TX4 SATA controller (not PCI-X) on a server with
   a similar motherboard (Supermicro with Mukilteo 3000).
 
 - Passing mpt_msi_enable=1 doesn't change anything
 
 - FreeBSD 6.2 is even slower: writes at 7MB/s
 
 - OpenSolaris is much, much slower... less than 1MB/s.
 
 - Windows Vista (rc something) writes at 90MB/s.  Too
   fast to believe, maybe dd from Cygwin is misbehaving.
 


Re: SG_IO weirdness

2007-02-05 Thread Douglas Gilbert

Cameron, Steve wrote:
 I noticed that when I do two SG_IO ioctls to a target
 device (say, tape drive, disk drive, whatever) in which
 the first request is well formed (e.g. an inquiry) and the 
 second one has a malformed CDB, such that it gets check condition 
 with sense key == 5 (ILLEGAL REQUEST), the data buffer returned for 
 the second malformed SG_IO request is filled out with the same 
 data as was returned for the first successful command (e.g. the 
 same inquiry data again.)  I'm using separate data buffers for
 the two commands, and memsetting them to zero before calling
 ioctl().  I don't think this data is coming from the device,
 as it happens with every device I've tried.
 
 Is that normal?  Seems like for a malformed request, the 
 data buffer should not be transferred at all, much less
 transferred with contents of a prior request's data buffer.
 
 Kernel is 2.6.18 from kernel.org.

Steve,
Even though the SCSI status is CHECK CONDITION, the data-in
buffer may still be transferred. One obvious example
is a READ command when the sense key is RECOVERED ERROR.

The sg driver and I suspect the block layer SG_IO do
not check the SCSI status to determine whether or not
to transfer the data-in buffer (or where it would have
been DMA-ed to if the command worked) back to user space.
If it was _direct_ IO then the block layer SG_IO and the
sg driver would have no control over the data-in transfer
(apart from setting it up).

Both the sg driver and the block layer SG_IO could check
the resid field which a LLD should set after a DMA
(especially inbound). However LLDs are not compelled to
set resid properly.
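
As a rough illustration of the resid check, the number of data-in bytes worth trusting can be derived from the requested length and the reported residual. This is a minimal sketch, not sg driver code: the function name valid_data_in_len is made up here, dxfer_len and resid simply mirror the sg_io_hdr fields of the same names, and the clamping is an assumption added to cope with LLDs that report resid incorrectly.

```python
def valid_data_in_len(dxfer_len, resid):
    """Bytes of the data-in buffer the LLD claims were actually transferred.

    dxfer_len and resid mirror the sg_io_hdr fields of the same names;
    the function itself is hypothetical.
    """
    if dxfer_len < 0:
        raise ValueError("dxfer_len must be non-negative")
    transferred = dxfer_len - resid
    # Clamp: a misbehaving LLD could report a resid that is negative
    # or larger than the request.
    return max(0, min(dxfer_len, transferred))
```

A caller would then interpret only the first valid_data_in_len(dxfer_len, resid) bytes of its buffer and treat the rest as untouched.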

So a few questions:
 - block layer SG_IO, the sg driver or both?
 - indirect IO (i.e. O_DIRECT not set)?
 - did the offending process have superuser permissions?
 - did the resid field indicate a short data-in transfer?

 The two requests were done from the same process, I haven't
 tried two separate processes to see if one process could
 by this method access another process's data.  I did try
 using two devices, so the first well formed command went 
 to one device, and the 2nd, malformed command went to another
 device.  In that case, I didn't get the same buffer back again,
 but garbage. (some recognizable strings, en_US was in there...)
 
 Is this a problem, or is this a matter of "just don't do that"?

As long as the SCSI status and sense buffer are conveyed
back properly _and_ this is only observed when the
process has superuser permissions, then I wouldn't
regard it as serious. Others may disagree.

Doug Gilbert

Re: SG_IO weirdness

2007-02-05 Thread Douglas Gilbert

Cameron, Steve wrote:
 Steve,
 Even though the SCSI status is CHECK CONDITION, the data-in
 buffer may still be transferred. One obvious example
 is a READ command when the sense key is RECOVERED ERROR.
 
 Yep, of course.
 
 The sg driver and I suspect the block layer SG_IO do
 not check the SCSI status to determine whether or not
 to transfer the data-in buffer (or where it would have
 been DMA-ed to if the command worked) back to user space.
 If it was _direct_ IO then the block layer SG_IO and the
 sg driver would have no control over the data-in transfer
 (apart from setting it up).

 Both the sg driver and the block layer SG_IO could check
 the resid field which a LLD should set after a DMA
 (especially inbound). However LLDs are not compelled to
 set resid properly.

 So a few questions:
 - block layer SG_IO, the sg driver or both?
 
 sg driver.
 
 - indirect IO (i.e. O_DIRECT not set)?
 
 indirect IO, O_DIRECT was not set.
 
 - did the offending process have superuser permissions?
 Yes.
 
 - did the resid field indicate a short data-in transfer?
 
 resid == 64, the requested buffer was 1088 bytes.
 (If I interpret that right, it means that all but 64 
 bytes were transferred, that is, 1024 bytes were 
 transferred?  Odd, considering the CDB was nonsense.)

Steve,
From memory, between SPC-2 and SPC-3 the INQUIRY allocation
length field went from 8 bits to 16 bits. If you do the
above calculation modulo 256 it comes out correct :-)

The moral here is don't set INQUIRY lengths > 252
unless the target can handle it. There is no point
anyway for a standard INQUIRY (EVPD=0, CmdDt is
obsolete). With VPD pages you can do a double fetch,
the first one 4 bytes long to pick up page length
field.
But then again you said the cdb was nonsense.
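
The modulo 256 arithmetic and the double fetch can be sketched as follows. This is an illustration only: the helper names are invented, the 6 byte INQUIRY layout is the SPC-3 one (16 bit allocation length in bytes 3 and 4), and an older target is assumed to look at byte 4 alone.

```python
def inquiry_cdb(alloc_len, evpd=False, page=0):
    """Build a 6 byte INQUIRY cdb using the SPC-3 layout."""
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in 16 bits")
    return bytes([0x12, 0x01 if evpd else 0x00, page & 0xFF,
                  (alloc_len >> 8) & 0xFF, alloc_len & 0xFF, 0x00])


def alloc_len_seen_by_8bit_target(cdb):
    """Length a target would use if it only honours byte 4 (assumption)."""
    return cdb[4]


def vpd_total_len(header):
    """Full VPD page size computed from the first 4 bytes of a response.

    Bytes 2 and 3 are read as a 16 bit page length; pages that only use
    byte 3 leave byte 2 as zero, so the result is the same.
    """
    if len(header) < 4:
        raise ValueError("need at least the 4 byte VPD header")
    page_len = (header[2] << 8) | header[3]
    return page_len + 4
```

With a 1088 byte request the low byte of the allocation length is 64, which is what an 8 bit target would act on. For a VPD page the first fetch uses inquiry_cdb(4, evpd=True, page=...) and the second uses the length returned by vpd_total_len().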


Now it is still a bit fuzzy because there is the
allocation length field in some cdbs and the dxfer_len
given to sg_io_hdr. I would think that the LLD
should concentrate on the latter and set resid
accordingly. That makes me wonder about the LLD
involved (the sg driver just passes resid through).

 As long as the SCSI status and sense buffer are conveyed
 back properly _and_ this is only observed when the
 process has superuser permissions, then I wouldn't
 regard it as serious. Others may disagree.
 
 I haven't tried it as non-superuser.  (And couldn't,
 unless I chmod'ed /dev/sg* )

The sg driver zeros out the scatter gather elements
for non-superusers.
chmod'ing is not always needed, for example a non-superuser
may well have permissions on a USB cd/dvd drive (including
the sg device node) in some distributions.

Doug Gilbert


Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Douglas Gilbert

Alan wrote:
 The interesting point of this question is about the typical pattern of 
 IO errors. On a read, it is safe to assume that you will have issues 
 with some bounded numbers of adjacent sectors.
 
 Which in theory you can get by asking the drive for the real sector size
 from the ATA7 info. (We ought to dig this out more as it's relevant for
 partition layout too).
 
 I really like the idea of being able to set this kind of policy on a per 
 drive instance since what you want here will change depending on what 
 your system requirements are, what the system is trying to do (i.e., 
 when trying to recover a failing but not dead yet disk, IO errors should 
 be as quick as possible and we should choose an IO scheduler that does 
 not combine IO's).
 
 That seems to be arguing for a bounded live time including retry run
 time for a command. That's also more intuitive for real time work and for
 end user setup. Either work or fail within n seconds

Which is more or less the streaming feature set in recent
ATA standards. [Alas, streaming and NCQ/TCQ can't be done
with the same access.] SCSI has its Read Write Error Recovery
mode page which doesn't have timeouts but does have Read
and Write Retry Counts amongst other fields that control
the amount (and indirectly the time) of attempted error
recovery.
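
For readers who want to look at those fields, here is a small sketch that picks the retry related values out of a Read Write Error Recovery mode page. It assumes the caller has already stripped the mode parameter header and any block descriptors from the MODE SENSE response; the function name and the returned dictionary are made up for illustration, and the byte offsets follow the SBC layout of page 0x01.

```python
def decode_rw_err_recovery(page):
    """Decode selected fields of mode page 0x01 (header already removed)."""
    if len(page) < 12:
        raise ValueError("page too short")
    if page[0] & 0x3F != 0x01:
        raise ValueError("not the Read Write Error Recovery mode page")
    flags = page[2]
    return {
        "AWRE": bool(flags & 0x80),   # automatic write reallocation enabled
        "ARRE": bool(flags & 0x40),   # automatic read reallocation enabled
        "PER": bool(flags & 0x04),    # post error
        "DCR": bool(flags & 0x01),    # disable correction
        "read_retry_count": page[3],
        "write_retry_count": page[8],
    }
```

A tool like sdparm presents the same fields by name; the sketch only shows where they sit in the raw page.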

Doug Gilbert



[Announce] sg3_utils-1.23 available

2007-02-01 Thread Douglas Gilbert

sg3_utils is a package of command line utilities for sending
SCSI (and some ATA) commands to devices. This package targets
the linux kernel (lk) 2.6 and lk 2.4 series. In the lk 2.6
series these utilities (except sgp_dd) can be used with any
devices that support the SG_IO ioctl. Ported to FreeBSD,
Tru64 and Windows (cygwin and mingw).

This version adds sg_read_buffer and sg_write_buffer utilities.
Cleans up command line interface of older utilities and all
man pages have been reworked. Package synchronized with SPC-4
revision 8 and SBC-3 revision 8. Copy of ChangeLog below.

For an overview of sg3_utils and downloads see this page:
http://www.torque.net/sg/sg3_utils.html
The sg_dd utility has its own page at:
http://www.torque.net/sg/sg_dd.html
The SG_IO ioctl is discussed at:
http://www.torque.net/sg/sg_io.html
A full changelog can be found at:
http://www.torque.net/sg/p/sg3_utils.CHANGELOG

A release announcement has been sent to freshmeat.net .

Top of Changelog:
Changelog for sg3_utils-1.23 [20070131]
  - sg_read_buffer: new utility
  - sg_write_buffer: new utility
  - sg_opcodes, sg_senddiag, sg_logs, sg_modes, sg_start, sg_inq,
    sg_turs, sg_readcap, sg_rbuf: add getopt_long() based cli;
    old and new cli selectable, new getopt_long cli is default
  - scripts: new subdirectory containing some bash scripts
    - add scripts/README file
  - sg_reassign: add '--hex' option for grown and primary lists
  - sg_rtpg: add '--raw' option
  - sg_lib.h, sg_cmds_basic.h + sg_cmds_extra.h: add C++
    'extern "C"' wrappers
    - cleanup C code so it will compile as C++
  - sg_lib: sync with spc4r08
    - include inttypes.h, use PRId64 instead of %lld form
    - fix sg_get_sense_str() when empty sense buffer
  - win32 port: add Makefile.mingw + related support for MinGW
  - sg_cmds_extra: add sg_ll_read_buffer() and sg_ll_write_buffer()
  - sg_dd, sgp_dd, sgm_dd, sg_read: use lseek64() instead of llseek.c
  - sgm_dd: accept coe=n for interworking with sg_dd
  - sg_rdac: fix on non-linux ports
  - sg_ses: fix spurious warning in additional element status page
    - '-rr' option outputs a diagnostic page in binary to stdout
  - sg_opcodes: add command timeout descriptor support (spc4r08)
    - change linux specific pass through to generic pass through
  - sg_logs: add 'name=value' decoding for SAS specific lpage
  - examples+utils subdirectories: remove symlinks
  - synchronize man pages with usage messages
  - sg3_utils.spec: rework

Doug Gilbert

Re: [PATCH] RESEND: SCSI, libata: add support for ATA_16 commands to libata ATAPI devices

2007-02-01 Thread Douglas Gilbert

James Bottomley wrote:
 On Thu, 2007-02-01 at 04:54 -0500, Jeff Garzik wrote:
 Agreed... but that doesn't make it the /right/ thing to do ;-)

 The logic behind the current code, which limits to the maximum size 
 allowed by an attached device on the port, is mainly to leverage the 
 SCSI layer as a filter for bad CDB lengths.

 IOW, it's called being lazy ;-)
 
 But you're requesting code changes in the SCSI layer because of this
 incorrect usage.  max_cdb is supposed to be the *host* limit.  The mid
 layer finds out and respects device limits separately from this.

To be more pedantic:
  actual_max_cdb = min(MAX_COMMAND_SIZE, host_limit)

Since the host is a bridge, that could be a limit on
near side (i.e. PCI (unlikely)) or the outer side (i.e.
transport initiator (port)). In modern HBAs the
host_limit is likely to be greater than 16, to allow
for advanced SBC and OSD commands. However currently
MAX_COMMAND_SIZE (defined in scsi/scsi_cmnd.h) is 16.

It is the ATAPI _transport_ that has the 12 byte cdb
limit *** (at least according to MMC-5 rev Annex A;
is S-ATAPI any better?).
Other MMC transports referred to in MMC-5 are
SPI, SBP(IEEE 1394) and USB mass storage; and no mention
is made of cdb length limits for them. Since ATAPI is
the dominant transport for cd/dvd drives, MMC doesn't
define any commands over 12 bytes in length, but both
SPC (which MMC should honour) and SSC-3 (think tape
drives, ATAPI connected) do.

My point is that the linux block layer and scsi mid
level should get out of the business of putting hard
limits in place. Why?
Since kernel limits are at best necessary but not
sufficient, the upper layers still need to be able to
cope with errors associated with that limit.
So why have the limit?
Does the kernel do analysis to find out whether a USB
connected DVD drive has a USB to ATAPI bridge externally?
I don't think so. There is a role to fetch information
that may act as a guide when a ULD has a choice of commands
to build (e.g. sd deciding between READ(10) and READ(16)).
Let the cdb size bottleneck (or whatever) report an error
and upper layers that are impacted, including user space
programs, can act accordingly.
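
The idea of letting the bottleneck report the error, rather than filtering up front, might look like the sketch below. Everything here is hypothetical: build_read stands in for a ULD choosing between READ(10) and READ(16), submit stands in for whichever layer hits the cdb size limit, and CdbTooLong is an invented error type.

```python
import struct


class CdbTooLong(Exception):
    """Raised by the (hypothetical) transport when a cdb exceeds its limit."""


def build_read(lba, num_blocks):
    """Prefer READ(10); use READ(16) only when the fields do not fit."""
    if lba <= 0xFFFFFFFF and num_blocks <= 0xFFFF:
        # opcode, flags, 32 bit LBA, group, 16 bit transfer length, control
        return struct.pack(">BBIBHB", 0x28, 0, lba, 0, num_blocks, 0)
    # opcode, flags, 64 bit LBA, 32 bit transfer length, group, control
    return struct.pack(">BBQIBB", 0x88, 0, lba, num_blocks, 0, 0)


def submit(cdb, transport_max_cdb):
    """Stand-in for the bottleneck: report an error, do not pre-filter."""
    if len(cdb) > transport_max_cdb:
        raise CdbTooLong(
            f"{len(cdb)} byte cdb, transport allows {transport_max_cdb}")
    return "submitted"
```

An upper layer that catches CdbTooLong on a 12 byte transport knows that the part of the disk beyond the 32 bit LBA range is unreachable through that path and can act on it, which is the kind of coping the text argues for.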

If the kernel really wants to offload complexity to the
user space, the kernel needs to get out of the business
of trying to foresee errors. It needs to get better at
coping with errors and if possible adapting its behaviour.


*** not the host nor the device

Doug Gilbert

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Douglas Gilbert

Ric Wheeler wrote:
 
 
 Jeff Garzik wrote:
 Mark Lord wrote:
 Eric D. Mudama wrote:

 Actually, it's possibly worse, since each failure in libata will
 generate 3-4 retries.  With existing ATA error recovery in the
 drives, that's about 3 seconds per retry on average, or 12 seconds
 per failure.  Multiply that by the number of blocks past the error
 to complete the request..

 It really beats the alternative of a forced reboot
 due to, say, superblock I/O failing because it happened
 to get merged with an unrelated I/O which then failed..
 Etc..


 FWIW -- speaking generally -- I think there are inevitable areas where
 libata error handling combined with SCSI error handling results in
 suboptimal error handling.

 Just creating a list of "this behavior should be handled this way,
 but in reality is handled in this silly way" would be very helpful.
 
 I agree - Tejun has done a great job at giving us a great base. Next
 step is to get clarity on what the types of errors are and how to
 differentiate between them (and maybe how that would change by class of
 device?).
 

 Error handling is tough to get right, because the code is exercised so
 infrequently.  Tejun has actually done an above-average job here, by
 making device probe, hotplug and other exceptions go through the
 libata EH code, thereby exercising the EH code more than one might
 normally assume.

 Some errors in libata probably should not be retried more than once,
 when we have a definitive diagnosis.  Suggestions for improvements are
 welcome.

 Jeff
 
 One thing that we find really useful is to inject real errors into
 devices. Mark has some patches that let us inject media errors, we also
 bring back failed drives and run them through testing and occasionally
 get to use analyzers, etc to inject odd ball errors.

Ric,
Both ATA (ATA8-ACS) and SCSI (SBC-3) have recently added
command support to flag a block as uncorrectable. There
is no need to send bad long data to it and suppress the
disk's automatic re-allocation logic.

In the case of ATA it is the WRITE UNCORRECTABLE command.
In the case of SCSI it is the WR_UNCOR bit in the WRITE
LONG command.
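
As a sketch of the SCSI side, a WRITE LONG(10) cdb with WR_UNCOR set could be built like this. The function name is made up, and the bit positions (WR_UNCOR as bit 6 and PBLOCK as bit 5 of byte 1) are taken from SBC-3 drafts, so they are worth checking against the revision being targeted. No data-out buffer accompanies the command in this mode, hence the zero byte transfer length.

```python
import struct


def write_long10_uncor(lba, pblock=False):
    """WRITE LONG(10) cdb that marks a block as uncorrectable."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("WRITE LONG(10) carries a 32 bit LBA")
    byte1 = 0x40              # WR_UNCOR
    if pblock:
        byte1 |= 0x20         # PBLOCK: apply to the whole physical block
    # opcode, flags, 32 bit LBA, reserved, byte transfer length, control
    return struct.pack(">BBIBHB", 0x3F, byte1, lba, 0, 0, 0)
```

A later read of that block should then fail with a medium error until the block is rewritten, which is what makes it useful for error injection.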

It seems that due to SAT any useful capability in the ATA
command set will soon appear in the corresponding SCSI
command set, if it is not already there.

Doug Gilbert


SAS illegal topologies [was Re: [PATCH 1/4 v2] libsas: Don't BUG when connecting two expanders via wide port]

2007-01-30 Thread Douglas Gilbert

Darrick J. Wong wrote:
 libsas: Don't BUG when connecting two expanders via wide port
 
 When a device is connected to an expander, the discovery process goes through
 sas_ex_discover_dev to figure out what's attached to the phy.  If it is the
 case that the phy being discovered happens to be the second phy of a wide link
 to an expander, that discover_dev function will incorrectly call
 sas_ex_discover_expander, which creates another sas_port and tries to attach
 the other sas_phys to the new port, thus triggering a BUG.  The correct thing
 to do is to check the other ex_phys of the expander to see if there's a
 sas_port for this sas_phy, and attach the sas_phy to the existing sas_port.
 
 This is easily triggered if one enables the phys of a wide port between
 expanders one by one.
 
 This second version of the patch fixes a small regression in the case where
 all the phys show up at once and we accidentally try to attach to a port
 that hasn't been created yet.

Darrick,
Okay.

Now I'm wondering what the discovery algorithm in libsas
does if it finds truly illegal connections between expanders.
The spec defines what is illegal but says it is vendor specific
what will be done.

One approach is to use the SMP PHY CONTROL function to disable
the phy (or the phys at both ends of the illegal link). The
next trick is how to tell the user who just connected a cable
between expanders that "you can't do that!". Tools like my
smp_discover could alert a user to a disabled phy but
without turning it back on (and causing the libsas discovery
algorithm another headache) my SMP utilities don't know what
it is connected to.

Another question is which link to disable. Imagine three
expanders interconnected with 3 links which is illegal.
Breaking any one link makes it legal, but which one
to break? Last seen, or perhaps the link which has
the largest SAS address sum ...
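
To make the second suggestion concrete, here is a toy sketch of picking the link with the largest SAS address sum. It is not libsas code and nothing in the spec mandates it; the function name and the representation of a link as a pair of SAS addresses are assumptions for illustration.

```python
def pick_link_to_disable(links):
    """Choose which link of an illegal loop to break.

    links: iterable of (sas_addr_a, sas_addr_b) pairs, one per link.
    """
    links = list(links)
    if not links:
        raise ValueError("no links given")
    # Sort each pair so that (a, b) and (b, a) name the same link, and
    # break ties on the sorted pair to keep the choice deterministic.
    return max((tuple(sorted(link)) for link in links),
               key=lambda link: (link[0] + link[1], link))
```

The attraction of a rule like this over "last seen" is that every entity doing discovery would arrive at the same answer regardless of the order in which phys came up.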

Doug Gilbert

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Douglas Gilbert

Ric Wheeler wrote:
 
 
 Mark Lord wrote:
 
 Eric D. Mudama wrote:


 Actually, it's possibly worse, since each failure in libata will
 generate 3-4 retries.  With existing ATA error recovery in the
 drives, that's about 3 seconds per retry on average, or 12 seconds
 per failure.  Multiply that by the number of blocks past the error to
 complete the request..


 It really beats the alternative of a forced reboot
 due to, say, superblock I/O failing because it happened
 to get merged with an unrelated I/O which then failed..
 Etc..

 Definitely an improvement.

 The number of retries is an entirely separate issue.
 If we really care about it, then we should fix SD_MAX_RETRIES.

 The current value of 5 is *way* too high.  It should be zero or one.

 Cheers

 I think that drives retry enough, we should leave retry at zero for
 normal (non-removable) drives. Should this  be a policy we can set like
 we do with NCQ queue depth via /sys ?

The transport might also want a say. I see ABORTED COMMAND
errors often enough with SAS (e.g. due to expander congestion)
to warrant at least one retry (which works in my testing).
SATA disks behind SAS infrastructure would also be
susceptible to the same random failures.

Transport Layer Retries (TLR) in SAS should remove this class
of transport errors but only SAS tape drives support TLR as
far as I know.
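
One way to picture a transport having a say is a retry decision that treats ABORTED COMMAND separately from other sense keys. This is only a sketch of a possible policy: the function and its parameters are invented here and do not correspond to anything in the sd driver.

```python
ABORTED_COMMAND = 0x0B  # sense key


def should_retry(sense_key, retries_done, max_retries=0, transport_retries=1):
    """Decide whether to reissue a failed command.

    max_retries is the general (for example per disk) budget, while
    transport_retries is an extra allowance for transport induced
    ABORTED COMMAND failures.
    """
    if sense_key == ABORTED_COMMAND:
        return retries_done < max(max_retries, transport_retries)
    return retries_done < max_retries
```

With the defaults a MEDIUM ERROR is reported immediately while an ABORTED COMMAND gets exactly one more attempt.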

Doug Gilbert

 We need to be able to layer things like MD on top of normal drive errors
 in a way that will produce a system that provides reasonable response
 time despite any possible IO error on a single component.  Another case
 that we end up doing on a regular basis is drive recovery. Errors need
 to be limited in scope to just the impacted area and dispatched up to
 the application layer as quickly as we can so that you don't spend days
 watching a copy of a huge drive (think 750GB or more) ;-)
 
 ric



Re: [PATCH 0/7] Roll-up of libsas and aic94xx patches

2007-01-26 Thread Douglas Gilbert

Darrick J. Wong wrote:
 Hi all,
 
 This is a roll-up of all of my uncommitted patches against libsas
 and aic94xx to date.  The first patch features an important fix for an
 incorrect port deformation after a phy reset event.  The next two
 patches in this set complete the reorganization of the
 sas_rphy_{delete,free} calls after errors during discovery.  The next
 two patches amend the SAS error handler to be able to handle scsi_cmnds
 that have completed successfully but with a failure code, first by
 trying START UNIT if the disk is not spinning, second by trying to
 reset the device, and finally offlining the device if nothing works.

Darrick,
The "reset the device" is a bit vague. Would that
be a LU reset (task management function) or a
hard reset? If the latter then it will cause
collateral damage if the target contains multiple
logical units (i.e. it will reset all of them,
not just the one failing to spin up).

Doug Gilbert

lsscsi-0.19 released

2007-01-25 Thread Douglas Gilbert

lsscsi is a utility that uses sysfs in linux 2.6 series kernels
to list information about SCSI devices and SCSI hosts. Both a
compact format (default) which is one line per device and a
classic format (like the output of 'cat /proc/scsi/scsi') are
supported.

Version 0.19 is available at
http://www.torque.net/scsi/lsscsi.html
More information can be found on that page including examples
and a Download section for tarballs, rpm and deb packages.
This version adds transport specific information.

ChangeLog:
Version 0.19 2007/1/25
  - add transport information (target + initiator)
    - start with FC, SAS, SPI, iSCSI and SBP
    - alter ISCSI for 2.6.20 changes
  - SAS fix for lk 2.6.20 (SYSFS_DEPRECATED=n)
  - enhance host name search when proc_name is NULL
  - implement filter option for '--hosts'
    - accept 'hostn' as first item in filter to mean host n
  - output more host attributes when '-Hll' given
  - add '--list' (or '-L') option output attribute=value
    entries, one per line


Doug Gilbert

Re: Linux Virtual SCSI HBAs and Virtual disks

2007-01-23 Thread Douglas Gilbert

Aboo Valappil wrote:
 Hi Stefan Richter,
 
 Thanks everyone for their advice on this. As per your advice, I did the
 following when the last user space target serving the scsi_host quits,
 the queue command will do the following on the new commands coming through.
 
    sc->result = DID_NO_CONNECT << 16;
    sc->resid = sc->request_bufflen;
    /* sets the sense buffer to Device Not Ready /
       Logical Unit Communication Failure */
    set_sensedata_commfailure(sc);
    done(sc);
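
For reference, the result word used in that snippet packs four bytes, and shifting DID_NO_CONNECT left by 16 places it in the host byte. The helper below is a sketch for picking such a value apart when reading logs; split_result is an invented name, and the layout is the one used by the status_byte(), msg_byte(), host_byte() and driver_byte() macros in 2.6 kernels.

```python
DID_NO_CONNECT = 0x01  # host byte value from the kernel's scsi.h


def split_result(result):
    """Split a scsi_cmnd result word into its four bytes."""
    return {
        "status": result & 0xFF,          # SCSI status, 0x02 = CHECK CONDITION
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,    # DID_* codes
        "driver": (result >> 24) & 0xFF,
    }
```

So DID_NO_CONNECT << 16 decodes to host == 1 with a zero SCSI status, while a bare 0x02 is a CHECK CONDITION with no host error.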
 
 The scsi_host will remain in the kernel. Let the EH thread handle the
 queued commands (If any). If the user target wants to reconnects to the
 same scsi_host, it can do so (Just re-run the user space target again
 with same command line parameters).  This connection from newly started
 target will make the HBA healthy again and start serving IO.
 
 I implemented a new IOCTL to remove  this  scsi_host  if the user
 process really needs to.  This removal  will first  finish all the SCSI
 commands (With the above status results) queued on the scsi_host  (If at
 all) and then remove the scsi_host.  Also the module unload will delete
 all the scsi_hosts created after finishing all the commands queued with
 the above status and sense information.
 
 I also implemented passing of sense code information from user space to
 sense_buffer. A little more work needs to be done on this.
 Also, I need to make sure that all the locking used inside is correctly
 implemented to prevent dead locks and improve efficiency.
 
 The new version is available http://vscsihba.aboo.org/vscsihbav204.gz

A few observations from testing this version:

# ./start_target.sh id=3 -files ../../zz_lun0 -v
# lsscsi
[0:0:0:0]    disk    Linux    scsi_debug       0004  /dev/sda
[1:0:0:0]    disk    VirtualH VHD              0     /dev/sdb

So id=3 doesn't look like the target identifier. If not, what
is it?

Here is an attempt to fetch the Read Write Error Recovery
mode page:
# sdparm -p rw -vv /dev/sg1
inquiry cdb: 12 00 00 00 24 00
/dev/sg1: VirtualH  VHD   0
mode sense (10) cdb: 5a 00 01 00 00 00 00 00 08 00
mode sense (10): Probably uninitialized data.
  Try to view as SCSI-1 non-extended sense:
  AdValid=0  Error class=0  Error code=0

 Read write error recovery mode page [0x1] failed


That implies a sense buffer full of zeroes. The debug
output from start_target.sh associated with that attempt:

SCSI cmd Lun=00 id=2D CDB=12 00 00 00 24 00 00 00 08 00 00 00 00 00 00 00
SCSI cmd Lun=00 id=2D completed, status=0
SCSI cmd Lun=00 id=2E CDB=5A 00 01 00 00 00 00 00 08 00 00 00 00 00 00 00
SCSI cmd Lun=00 id=2E completed, status=2
SCSI cmd Lun=00 id=2F CDB=03 00 00 00 FC 00 00 00 08 00 00 00 00 00 00 00
SCSI cmd Lun=00 id=2F completed, status=0
SCSI cmd Lun=00 id=30 CDB=00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00
SCSI cmd Lun=00 id=30 completed, status=0

So that is an INQUIRY [expected], MODE SENSE(10) [expected],
REQUEST SENSE [what, no autosense??] and TEST UNIT READY
[ah oh, error recovery??] sequence.

Perhaps you could examine the way scsi_debug (or most
other LLDs) does autosense. This modern technique (used
for about the last 12 years) relieves the scsi midlevel
of having to send a follow up REQUEST SENSE.
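
With autosense the LLD hands back the sense data together with the CHECK CONDITION status. The sketch below shows what a minimal fixed format sense buffer looks like and how an all zero buffer, as apparently seen above, can be recognised. It is an illustration in user space terms, not scsi_debug code, and both function names are made up.

```python
def fixed_sense(sense_key, asc, ascq=0):
    """Build 18 bytes of fixed format sense data for a current error."""
    sense = bytearray(18)
    sense[0] = 0x70              # response code: current error, fixed format
    sense[2] = sense_key & 0x0F
    sense[7] = 10                # additional sense length (bytes 8 to 17)
    sense[12] = asc
    sense[13] = ascq
    return bytes(sense)


def looks_uninitialized(sense):
    """True when the sense buffer is empty or contains only zeroes."""
    return not any(sense)
```

For an unsupported field in a MODE SENSE cdb a target would typically return something like fixed_sense(0x05, 0x24), that is ILLEGAL REQUEST with INVALID FIELD IN CDB.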

It would be easier to read those SCSI commands in the
debug output if they were trimmed to their actual lengths
(e.g. the INQUIRY is 12 00 00 00 24 00).
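
The trimming could be done from the opcode alone, since the top three bits of a SCSI opcode select a group with a fixed cdb length for most groups. The following is a sketch with invented names; groups whose length is reserved, variable or vendor specific are left untrimmed.

```python
# cdb length by opcode group (top three bits of the opcode); None marks
# groups whose length is reserved, variable or vendor specific.
GROUP_CDB_LEN = {0: 6, 1: 10, 2: 10, 3: None, 4: 16, 5: 12, 6: None, 7: None}


def trimmed_cdb_hex(raw):
    """Format a padded cdb as hex, cut to the length its opcode implies."""
    if not raw:
        raise ValueError("empty cdb")
    n = GROUP_CDB_LEN[raw[0] >> 5]
    if n is None:
        n = len(raw)             # unknown length: print everything
    return " ".join(f"{b:02X}" for b in raw[:n])
```

Applied to the 16 byte buffers in the debug output, the INQUIRY prints as 6 bytes and the MODE SENSE(10) as 10 bytes.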

Doug Gilbert

Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-22 Thread Douglas Gilbert

Benny Halevy wrote:
 Douglas Gilbert wrote:
 Boaz Harrosh wrote:
 - Introduce a new enum dma_data_direction data_dir member in struct request.
   and remove the RW bit from request->cmd_flag
 - Add new API to query request direction.
 - Adjust existing API and implementation.
 - Cleanup wrong use of DMA_BIDIRECTIONAL

Perhaps the right use of DMA_BIDIRECTIONAL needs to be
defined.

Could it be used with a XDWRITE(10) SCSI command
defined in sbc3r07.pdf at http://www.t10.org ? I suspect
using two scatter gather lists would be a better approach.

 - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc
   requests were used and bzero'ed.
 With a bi-directional transfer is it always unambiguous
 which transfer occurs first (or could they occur at
 the same time)?
 
 The bidi transfers can occur in any order and in parallel.

Then it is not sufficient for modern SCSI transports in which
certain bidirectional commands (probably most) have a well
defined order.

So DMA_BIDIRECTIONAL looks PCI specific and it may have
been a mistake to replace other subsystem's direction flags
with it. RDMA might be an interesting case.

Doug Gilbert



Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-21 Thread Douglas Gilbert

Boaz Harrosh wrote:
 - Introduce a new enum dma_data_direction data_dir member in struct request.
   and remove the RW bit from request->cmd_flag
 - Add new API to query request direction.
 - Adjust existing API and implementation.
 - Cleanup wrong use of DMA_BIDIRECTIONAL
 - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc
   requests were used and bzero'ed.

With a bi-directional transfer is it always unambiguous
which transfer occurs first (or could they occur at
the same time)?

Doug Gilbert

Re: no utility / method to show association between HBA non-sg BLOCK (scsi) devices - register_blkdev()

2007-01-17 Thread Douglas Gilbert

Thayne Harmon wrote:
 On Thu, Jan 11, 2007 at  1:15 PM, in message [EMAIL PROTECTED],
 Douglas Gilbert [EMAIL PROTECTED] wrote: 
 Thayne Harmon wrote:
 Gentlemen,

 hwinfo, lshal, sysfs do not show the relationship for non-sg BLOCK devices
 with their associated Host Bus Adapter.
 All devices (i.e. logical units) have a 4 element tuple
 associated with them and the first element is the host
 number. A HBA contains one or more hosts. Then you can
 datamine in /sys/class/scsi_host/hostn for whatever
 information you want.
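
As a small illustration of going from the tuple to the host directory, the sketch below parses a host:channel:target:lun string such as lsscsi prints and derives the sysfs path. The function names are made up, and the path pattern is simply the /sys/class/scsi_host/hostn form mentioned above.

```python
import re


def parse_hctl(text):
    """Parse '[h:c:t:l]' (brackets optional) into a tuple of four ints."""
    m = re.fullmatch(r"\[?(\d+):(\d+):(\d+):(\d+)\]?", text.strip())
    if m is None:
        raise ValueError(f"not a host:channel:target:lun tuple: {text!r}")
    return tuple(int(g) for g in m.groups())


def scsi_host_dir(text):
    """sysfs directory of the host that owns the given device tuple."""
    host = parse_hctl(text)[0]
    return f"/sys/class/scsi_host/host{host}"
```

Which attributes are present under that directory depends on the LLD, so what can be mined there varies from host to host.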

 Do you know of a utility or method that can show this?
 May I suggest lsscsi. That won't help you in the lk 2.4
 series and earlier though. There are other methods by
 which the sg device corresponding to a non-sg block
 device (e.g. /dev/sdc) can be found.
 
 [context - Linux testserver 2.6.16.21-0.8-smp i586]
 
 There is no corresponding sg device. The device file is
 /dev/cciss/c0d1.

Ok, I'm not familiar with the cciss driver. It looks like
it lives outside the linux scsi subsystem but according
to Documentation/cciss.txt it can subsequently engage
the scsi subsystem??

If it is outside the scsi subsystem then it doesn't
get corresponding sg devices. However as part of the
block subsystem it might accept the SG_IO ioctl (if
it accepts SCSI commands and it is implemented).

 I tried lsscsi, however it would not print out the non-sg block devices.
 
 I have attached the output of tree /sys and the output of lsscsi and uname.
 One can search for cciss to find the devices and the driver.
 I still cannot see a relationship.

<snip sysfs dump>

 [0:0:0:0]    storage COMPAQ   MSA1000          4.32  -
 [0:0:0:3]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sda
 [0:0:0:4]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdb
 [0:0:0:5]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdc
 [0:0:0:6]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdd
 [0:0:0:7]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sde
 [1:0:0:0]    storage COMPAQ   MSA1000          4.32  -
 [1:0:0:3]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdf
 [1:0:0:4]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdg
 [1:0:0:5]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdh
 [1:0:0:6]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdi
 [1:0:0:7]    disk    COMPAQ   MSA1000 VOLUME   4.32  /dev/sdj

Well this looks like output from lsscsi. And those devices look
like they could be associated with cciss, especially the
compaq storage devices. These devices should have corresponding
sg device nodes. Try lsscsi -g.

Still a bit unclear as hosts 0 and 1 are Fibre Channel
judging from the sysfs output for them.

Doug Gilbert



Re: Linux Virtual SCSI HBAs and Virtual disks

2007-01-17 Thread Douglas Gilbert

Aboo Valappil wrote:
 Hi All,
 
 Thanks everyone to have a look at this.
 
 I think i modified to have the latest kernel support. Unfortunately I
 could not test it with 2.6.20 kernel due to some issues in my laptop and
 2.6.20 kernel. But it should work with 2.6.20 with this modification.
 
 The modified version is available through
 http://vscsihba.aboo.org/vscsihbav202.tgz.
 
 1. I fixed the kmem_cache issue for sure.
 2. I think i got around with INIT_WORK ... Made the following
 modifications ...

Perhaps you could get some of my scsi tools (e.g.
sdparm and sg3_utils) and make sure that vscsihba
can handle everything they can throw at it.
If the user space doesn't support a SCSI command then
your driver should fail gracefully (i.e. CHECK CONDITION,
etc).

Here is a worrying example: sdparm sends an INQUIRY
and a couple of MODE SENSE(10) commands to a device.
/dev/sda was created by your script:
$ ./start_target.sh id=3 -files zz_lun0

$ sdparm /dev/sda
/dev/sda: VirtualH  VHD   0
<long wait>
$


However dmesg showed this:

vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
sd 0:0:0:0: SCSI error: return code = 0x0002
end_request: I/O error, dev sda, sector 10240
Buffer I/O error on device sda, logical block 10240
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
sd 0:0:0:0: SCSI error: return code = 0x0002
end_request: I/O error, dev sda, sector 10240
Buffer I/O error on device sda, logical block 10240
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
sd 0:0:0:0: SCSI error: return code = 0x0002
end_request: I/O error, dev sda, sector 10240
Buffer I/O error on device sda, logical block 10240
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
sd 0:0:0:0: SCSI error: return code = 0x0002
end_request: I/O error, dev sda, sector 10240
Buffer I/O error on device sda, logical block 10240
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
sd 0:0:0:0: SCSI error: return code = 0x0002
end_request: I/O error, dev sda, sector 10240
Buffer I/O error on device sda, logical block 10240
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
vscsihba:3: In Reset Device
sd 0:0:0:0: SCSI error: return code = 0x0002
end_request: I/O error, dev sda, sector 10240
Buffer I/O error on device sda, logical block 10240
BUG: at kernel/sched.c:3388 sub_preempt_count()
 [e1bf029c] scsitap_eh_abort+0x1c/0x90 [vscsihba]
 [c024fe22] scsi_error_handler+0x3e2/0xbe0
 [c02d74f1] __sched_text_start+0x2f1/0x660
 [c024fa40] scsi_error_handler+0x0/0xbe0
 [c0131679] kthread+0xa9/0xe0
 [c01315d0] kthread+0x0/0xe0
 [c0103d0f] kernel_thread_helper+0x7/0x18
 ===
vscsihba:3: Abortng command serial number : 94
BUG: scheduling while atomic: scsi_eh_0/0x0001/4749
 [c02d7684] __sched_text_start+0x484/0x660
 [c013183b] autoremove_wake_function+0x1b/0x50
 [c01264a8] lock_timer_base+0x28/0x70
 [c01265f2] __mod_timer+0x92/0xd0
 [c02d826b] schedule_timeout+0x4b/0xd0
 [c01269c0] process_timeout+0x0/0x10
 [c02d7bbc] wait_for_completion_timeout+0x9c/0x130
 [c0119ee0] default_wake_function+0x0/0x10
 [c024f3c9] scsi_send_eh_cmnd+0x1b9/0x390
 [c011df3e] vprintk+0x1fe/0x3a0
 [c024f805] scsi_delete_timer+0x15/0x60
 [c024f624] scsi_eh_tur+0x34/0xa0
 [c024fe69] scsi_error_handler+0x429/0xbe0
 [c02d74f1] __sched_text_start+0x2f1/0x660
 [c024fa40] scsi_error_handler+0x0/0xbe0
 [c0131679] kthread+0xa9/0xe0
 [c01315d0] kthread+0x0/0xe0
 [c0103d0f] kernel_thread_helper+0x7/0x18
 ===
vscsihba:3: Abortng command serial number : 95
vscsihba:3: In Reset Device
BUG: scheduling while atomic: scsi_eh_0/0x0001/4749
 [c02d7684] __sched_text_start+0x484/0x660
 [c011df3e] vprintk+0x1fe/0x3a0
 [c01264a8] lock_timer_base+0x28/0x70
 [c01265f2] __mod_timer+0x92/0xd0
 [c02d826b] schedule_timeout+0x4b/0xd0
 [c01269c0] process_timeout+0x0/0x10
 [c02d7bbc] wait_for_completion_timeout+0x9c/0x130
 [c0119ee0] default_wake_function+0x0/0x10
 [c024f3c9] scsi_send_eh_cmnd+0x1b9/0x390
 [c024f805] scsi_delete_timer+0x15/0x60
 [c024f624] scsi_eh_tur+0x34/0xa0
 [e1bf00cd] scsitap_eh_device_reset+0x1d/0x30 [vscsihba]
 [c02503a8] scsi_error_handler+0x968/0xbe0
 [c02d74f1] __sched_text_start+0x2f1/0x660
 [c024fa40] scsi_error_handler+0x0/0xbe0
 [c0131679] kthread+0xa9/0xe0
 [c01315d0] kthread+0x0/0xe0
 [c0103d0f] kernel_thread_helper+0x7/0x18

Re: Linux Virtual SCSI HBAs and Virtual disks

2007-01-16 Thread Douglas Gilbert

Aboo Valappil wrote:
 Hi All,
 
 I have tried this before but I guess I was unsuccessful in presenting it
 properly in the mailing list. I think it is really useful especially for
 prototyping and also for people who wants to develop their own scsi
 targets and transports.
 There are few people told me about the SCSI target and initiator
 implementation of XEN. But I do not think it is this simple and might
 take a while to port it to normal linux kernel.  At the moment, there is
 nothing like this available in a simplest form.
 
 Please visit this site http://vscsihba.aboo.org. I put a complete
 description of the project and the source code. I appreciate if you
 could go through it and put your thoughts  This is my final attempt
 in this mailing list before I throw away whole of my work.

Throwing it away sounds a bit drastic. It took me a
while to find the tarball on your site. Perhaps you
could put it in a table under a Downloads section.
The table would be for different versions as it looks
like you may need a new one for bleeding edge kernels.

I didn't get far trying to build the kernel module
against lk 2.6.20-rc5:

# make
make -C /lib/modules/2.6.20-rc5/build 
M=/home/upgrades/apps/vscsihba1/vscsihba1/kernel modules
make[1]: Entering directory `/usr/src/linux-2.6.19'
  CC [M]  /home/upgrades/apps/vscsihba1/vscsihba1/kernel/hba.o
/home/upgrades/apps/vscsihba1/vscsihba1/kernel/hba.c:26: warning: 
‘kmem_cache_t’ is deprecated
  CC [M]  /home/upgrades/apps/vscsihba1/vscsihba1/kernel/device.o
/home/upgrades/apps/vscsihba1/vscsihba1/kernel/device.c:263:51: error: macro 
INIT_WORK passed 3 arguments, but takes just 2
/home/upgrades/apps/vscsihba1/vscsihba1/kernel/device.c: In function 
‘scsitap_ctl_ioctl’:
/home/upgrades/apps/vscsihba1/vscsihba1/kernel/device.c:263: error: ‘INIT_WORK’ 
undeclared (first use in this function)
/home/upgrades/apps/vscsihba1/vscsihba1/kernel/device.c:263: error: (Each 
undeclared identifier is reported only once
/home/upgrades/apps/vscsihba1/vscsihba1/kernel/device.c:263: error: for each 
function it appears in.)
make[2]: *** [/home/upgrades/apps/vscsihba1/vscsihba1/kernel/device.o] Error 1
make[1]: *** [_module_/home/upgrades/apps/vscsihba1/vscsihba1/kernel] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.19'
make: *** [modules] Error 2

Doug Gilbert


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ieee1394: sbp2: remove bogus emulated host flag

2007-01-15 Thread Douglas Gilbert

Kristian Høgsberg wrote:
 On 1/14/07, Stefan Richter [EMAIL PROTECTED] wrote:
 There is no emulation going on here.
 ...
 
 -   .emulated= 1,
 
 Not sure what this flag does, but I copied it over to fw-sbp2.c.  If
 it's bogus, I guess we should drop it from fw-sbp2.c too.

Kristian,
The 'emulated' flag dates from the original ide-scsi
driver (lk 2.0 series or earlier) when some app wanted
to know if there was a real SCSI cd drive attached
or a fudged one (i.e. ATAPI) via the ide-scsi bridging
driver. So it is unclear to me why the sbp driver
(and USB mass storage) sets emulated.

Hopefully if any app cares these days there are much
better ways to find out what the transport is. Also
now we have the transport the LLD can see
and the transport the device (i.e. logical unit) can
see; and they aren't necessarily the same. In the
case of a CD/DVD drive there is the GET CONFIGURATION
command for finding out what the lu can see:
$ sg_get_config /dev/hdc
  HL-DT-ST  RW/DVD GCC-4242N  0201
  Peripheral device type: cd/dvd
No current profile
Features:
  Profile list feature
version=0, persist=1, current=1 [0x0]
available profiles [more recent typically higher in list]:
  profile: DVD-ROM , currentP=0
  profile: CD-ROM , currentP=0
  profile: CD-R , currentP=0
  profile: CD-RW , currentP=0
  Core feature
version=0, persist=1, current=1 [0x1]
  Physical interface standard: ATAPI
..

So IMO 'emulated' is best retired.
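As a rough illustration of where the "Physical interface standard: ATAPI" line above comes from, here is a minimal sketch that decodes a Core feature descriptor as returned by GET CONFIGURATION. The example bytes are made up to match the output shown (they were not captured from the drive), and the code table is recalled from the MMC drafts, so check it against the specification before relying on it.

```python
# Physical Interface Standard codes, as recalled from the MMC drafts
# (assumption: verify against the specification you are working from).
PHYS_IF = {
    0: "Unspecified",
    1: "SCSI family",
    2: "ATAPI",
    3: "IEEE 1394-1995",
    4: "IEEE 1394A",
    5: "Fibre Channel",
    6: "IEEE 1394B",
    7: "Serial ATAPI",
    8: "USB",
}


def decode_core_feature(desc: bytes) -> dict:
    """Decode the first 8 bytes of a Core feature (0x0001) descriptor."""
    if len(desc) < 8:
        raise ValueError("Core feature descriptor needs at least 8 bytes")
    code = int.from_bytes(desc[0:2], "big")
    if code != 0x0001:
        raise ValueError("not a Core feature descriptor: 0x%04x" % code)
    flags = desc[2]
    phys = int.from_bytes(desc[4:8], "big")
    return {
        "version": (flags >> 2) & 0x0F,   # bits 5..2
        "persist": (flags >> 1) & 0x01,   # bit 1
        "current": flags & 0x01,          # bit 0
        "physical_interface": PHYS_IF.get(phys, "unknown (0x%x)" % phys),
    }


# Hypothetical descriptor chosen to reproduce the sg_get_config output above.
example = bytes([0x00, 0x01, 0x03, 0x04, 0x00, 0x00, 0x00, 0x02])
info = decode_core_feature(example)
print(info)
```

The point is that the logical unit itself reports its transport, so an application has no need for a driver-side 'emulated' hint.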

Doug Gilbert


Re: Adaptect 9405w: What is the best solution?

2007-01-11 Thread Douglas Gilbert

Tarjei Huse wrote:
 Hi, I'm working on getting Linux to use my SATA drives on an IBM x306
 running a HostRaid controller that uses the adaptech 9405w chipset.
 
 I found this thread on the list:
 http://thread.gmane.org/gmane.linux.scsi/29040/focus=29040

Basically Luben Tuikov developed the original aic94xx
SAS driver for Linux when he was working for Adaptec.
Soon after he made it available, the Linux SCSI
community decided to fork the development for various
reasons. The Linux SCSI community version seems to
have passed through various hands; currently it seems
to be Darrick Wong's headache. In the meantime Mr Tuikov
has continued developing his version which I can
report is now very stable and feature rich on my hardware
(an Adaptec 48300 HBA with an aic-9410w chip in it).

Some people involved are still quite upset about
what happened and it shows in the email exchange
you referred to. One unfortunate aspect of the GPL
and what the community did is that the source
code still has the copyright notices of Adaptec
and Luben Tuikov. That may lead an observer to think
that either or both still have an interest or
control over that driver. As far as I can see that
is not the case and a note should be added to the
source code to that end.

So given the above, Linux distribution vendors have
a problem when they try to certify hardware containing
Adaptec SAS aic94xx series chips.

 What I'm wondering about is:
 a) Where can I get the patches that mr Tuikov maintains for this
 chipset?

You need to contact Luben Tuikov  [EMAIL PROTECTED]  directly.

 b) Are they maintained wrt to different kernel versions, i.e.
 do they apply cleanly to a 2.6.20 or 2.6.16 kernel?

You need different driver versions for different kernels.

 c) In the thread Darrick Wong refers to another branch[1] that contains 
 (according to him) a lot of fixes for this chipset. Is that branch
 confirmed to work with my chipset?

I haven't checked recently but Adaptec used to have their own
linux driver, named the adp94xx driver. It would seem that
Adaptec has lost interest in Linux.

 My main goal is to get this box up and running using debian or ubuntu. I
 have managed to get it to run the debian 2.6.9 kernel as outlined in [2]
 using the adp94xx driver, but I cannot get that driver to compile on
 newer kernels.

See my previous note.

 I have also tried to compile the latest rc of the 2.6.20 kernel and
 loaded up the aic94xx driver + firmware. I then ended up getting the
 same errors that started the above mentioned thread on this list.

Here are my thoughts on the 48300 HBA and the available drivers.
I bought the device about 12 months ago so I assumed it at least
had production firmware on it. It required two firmware upgrades
before it was stable in POST+scan in non-trivial SAS topologies
(by which I mean connected to SAS expanders). [Since I also had
a LSI MPT Fusion SAS controller, it was relatively simple for me
to determine the source of my problems.] All drivers I tried
(although I could never get the adp94xx driver working because
it was too old) showed the same problems (tmf timeouts). Then
Luben Tuikov sent me a version of his driver with a use_msi
related fix in it. My 48300 has been rock solid since. The last
time I tried Darrick's driver (about a week ago) it failed in
the fashion unto which I have become accustomed.

I have encouraged people to talk amongst themselves about the
use_msi patch, but I don't believe that I should be reverse
engineering that patch. There are other issues. As you may
understand from the above, I am walking a bit of a tightrope
here.

I can also report that the same hardware works fine in
Windows 2000, Vista RC1 and that FreeBSD doesn't have
an aic94xx driver yet.

 So what is the best solution here? Has anyone managed to get a newer
 kernel to run on this chipset?
 
 1.
 http://www.kernel.org/git/?p=linux/kernel/git/jejb/aic94xx-sas-2.6.git;a=summary
 
 2. http://www.jimmy.co.at/weblog/?p=71

This last thread may help explain IBM's interest in getting the
aic94xx working reliably.


Is there a best solution?

Doug Gilbert




Re: no utility / method to show association between host bus adapter and non-sg BLOCK devices

2007-01-11 Thread Douglas Gilbert

Thayne Harmon wrote:
 Gentlemen,
 
 hwinfo, lshal, sysfs do not show the relationship for non-sg BLOCK devices 
 with there 
 associated Host Bus Adapter.

All devices (i.e. logical units) have a 4 element tuple
associated with them and the first element is the host
number. A HBA contains one or more hosts. Then you can
datamine in /sys/class/scsi_host/hostn for whatever
information you want.
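A minimal sketch of the idea, assuming the usual host:channel:target:lun string form of the tuple. The helper names are made up for illustration. The /sys/block/NAME/device symlink lookup only applies to block devices that are registered through the SCSI subsystem, so it is unlikely to help with a pure block driver.

```python
import os
from typing import NamedTuple


class ScsiAddress(NamedTuple):
    host: int
    channel: int
    target: int
    lun: int


def parse_hctl(text: str) -> ScsiAddress:
    """Parse '4:0:5:0' or '[4:0:5:0]' into its four elements."""
    parts = text.strip().strip("[]").split(":")
    if len(parts) != 4:
        raise ValueError("expected host:channel:target:lun, got %r" % text)
    return ScsiAddress(*(int(p) for p in parts))


def host_sysfs_dir(addr: ScsiAddress) -> str:
    """Directory holding the attributes of the host this lu hangs off."""
    return "/sys/class/scsi_host/host%d" % addr.host


def hctl_of_block_device(name: str) -> ScsiAddress:
    """Follow /sys/block/NAME/device; its last component is the tuple.

    Assumption: NAME is a SCSI disk such as 'sdc' on a 2.6 or later kernel.
    """
    target = os.path.realpath("/sys/block/%s/device" % name)
    return parse_hctl(os.path.basename(target))


addr = parse_hctl("[4:0:5:0]")
print(addr, host_sysfs_dir(addr))
```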

 Do you know of a utility or method that can show this?

May I suggest lsscsi. That won't help you in the lk 2.4
series and earlier though. There are other methods by
which the sg device corresponding to a non-sg block
device (e.g. /dev/sdc) can be found.

 Example is the HP/Compaq CCISS block driver.
 
 The HBA and devices are listed, but no association is given or can be 
 determine, 
 only by the user knowing which is which.
 
 The kernel certainly knows, surely the above apps could be made to 
 determine this or some utility exits that will show this?

See http://www.torque.net/scsi/lsscsi.html

Doug Gilbert

Re: IO transfer limits

2007-01-11 Thread Douglas Gilbert

Stefan Richter wrote:
 Douglas Gilbert wrote:
 john clyne wrote:
 What do the different hostX in /sys/class/scsi_host corespond to? There are
 seven hostX directories, 5 with sg_tablesize set to 128 and two set to 255.
 A Linux host is a SCSI initiator port (e.g. FC) or a
 SCSI initiator device (e.g. SAS). Another way of looking
 at a host is as a bridge between a computer bus (e.g. PCI)
 and a storage transport. There is usually one (low level)
 driver (LLD) controlling all hosts associated with a
 specific class of hardware.

 If you fetch the lsscsi utility and load it then you can
 try 'lsscsi --hosts' to list the active hosts on a
 system (numbered on the left) and see the names of the
 various LLDs associated with them. Here is an example:

 # lsscsi --hosts
 [0]    sata_nv
 [1]    sata_nv
 [2]    sata_nv
 [3]    sata_nv
 [4]    mptsas
 [5]    aic94xx
 [6]    sbp2

 The first four are SATA ports (connectors) on the motherboard,
 all controlled by the same driver. Then there is a LSI SAS
 HBA (whose driver is mptsas), an Adaptec SAS HBA (48300) and
 finally an Adaptec IEEE 1394 controller.
 
 A side note:
 I don't think a Scsi_Host has a well-defined meaning beyond the
 kernel-internal resource which LLDs use to connect to the Linux SCSI mid
 layer. It may have further meaning for many LLDs, but not for all.
 Specifically, the host6 in your example is in the current implementation
  indeed nothing more than an internal resource. lsscsi is nevertheless
 able to determine the actual initiator port by means of knowledge of the
 implementation.

There are two sides to a host: a kernel side (e.g. a
PCI device or virtual) and a storage transport side.
A host can be seen as a bridge between the two sides.

SCSI command sets need the concept of an initiator.
For queuing, mode page policy and reservations (i.e.
in multi-initiator environments) those initiators
(actually initiator ports) need domain unique identifiers,
preferably world wide unique. The identifier is an
attribute of the external side (i.e. the storage
transport side) of a linux host.

So even if you consider the kernel side of a host
is a kludge, there is still the storage transport
side to consider.

In the case of sbp, the initiator (device and port) has
an EUI-64 WWN. SBP, USB mass storage, and iSCSI all set up
SCSI hosts in Linux on a session basis (just-in-time if
you like). As long as the initiator port identifiers are
stable (predictable from one session to the next) it seems
to me little different to SAS, SPI and FC which maintain
their hosts for as long as their HBAs are present.


There are cheap external boxes out there that have 1394,
USB and (e)SATA interfaces. I wonder what would happen
if one tried to use two interfaces connected to different
machines at the same time :-)

Doug Gilbert



Re: IO transfer limits

2007-01-10 Thread Douglas Gilbert

john clyne wrote:
 Can anyone give me some guidance on where in the IO stack I might be running
 into a 512KB limit on IO transfer sizes to an external FC device? I've
 checked IO scheduler parameter
 (/sys/block/dev/queue/{max_sectors_kb,max_hw_sectors_kb}. Both are set to
 32767. I'm using Qlogic HBAs (qla2312), but I don't see any relevent
 parameters. I'm running RHEL 4.0 with a 2.6.9-34 kernel. Any pointers would
 be greatly appreciated.

John,
I discuss the subject in this page:
   http://www.torque.net/sg/sg_io.html
in the section titled:
Maximum transfer size per command

Mike C. has given you the answer for the block device
interface (e.g. via /dev/sda); you should be able
to do about 8 times better via the scsi generic
interface (e.g. /dev/sg0).

Doug Gilbert



Re: IO transfer limits

2007-01-10 Thread Douglas Gilbert

john clyne wrote:
 What do the different hostX in /sys/class/scsi_host corespond to? There are
 seven hostX directories, 5 with sg_tablesize set to 128 and two set to 255.

A Linux host is a SCSI initiator port (e.g. FC) or a
SCSI initiator device (e.g. SAS). Another way of looking
at a host is as a bridge between a computer bus (e.g. PCI)
and a storage transport. There is usually one (low level)
driver (LLD) controlling all hosts associated with a
specific class of hardware.

If you fetch the lsscsi utility and load it then you can
try 'lsscsi --hosts' to list the active hosts on a
system (numbered on the left) and see the names of the
various LLDs associated with them. Here is an example:

# lsscsi --hosts
[0]    sata_nv
[1]    sata_nv
[2]    sata_nv
[3]    sata_nv
[4]    mptsas
[5]    aic94xx
[6]    sbp2

The first four are SATA ports (connectors) on the motherboard,
all controlled by the same driver. Then there is a LSI SAS
HBA (whose driver is mptsas), an Adaptec SAS HBA (48300) and
finally an Adaptec IEEE 1394 controller.

 Is the implication that the hard limit is 255 * page_size, or is page_size
 simply the default?

There are big pages (around 1 MB in size) but I'm
unaware that anything in the SCSI subsystem uses them.
Otherwise the kernel page size is typically 4 KB.
When the scsi generic driver builds its scatter gather
lists, then it attempts to place 8 contiguous pages in
each scatter gather element.
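To put numbers on this, here is a back-of-the-envelope sketch. It assumes the figures quoted in this thread (128 scatter gather segments, 4 KB pages, 8 contiguous pages per sg driver element) and, as a further assumption, that the same segment count applies to both paths. Real limits depend on the LLD and on memory fragmentation.

```python
PAGE_SIZE = 4 * 1024  # typical kernel page size, in bytes


def max_transfer_bytes(segments: int, pages_per_segment: int = 1,
                       page_size: int = PAGE_SIZE) -> int:
    """Upper bound on one command's data transfer for a scatter gather list."""
    return segments * pages_per_segment * page_size


# Block layer worst case: no pages can be clustered, one page per segment.
block_limit = max_transfer_bytes(128)

# sg driver: tries to place 8 contiguous pages in each element.
sg_limit = max_transfer_bytes(128, pages_per_segment=8)

print("block device path: %d KB" % (block_limit // 1024))
print("sg driver path:    %d KB" % (sg_limit // 1024))
```

With these inputs the block path comes out at the 512 KB ceiling reported in the original question, and the sg path at eight times that.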

Arm waving was a term used when I tried to explain
to several kernel people that there were users out there
that needed larger IO transfer limits. So I suggest that
you talk to the management and tell them why you
need higher limits. Linux lags behind in this area.

Doug Gilbert

 Mike Christie wrote:
 john clyne wrote:
 Can anyone give me some guidance on where in the IO stack I might be
 running
 into a 512KB limit on IO transfer sizes to an external FC device? I've
 checked IO scheduler parameter
 (/sys/block/dev/queue/{max_sectors_kb,max_hw_sectors_kb}. Both are set
 to
 32767. I'm using Qlogic HBAs (qla2312), but I don't see any relevent
 parameters. I'm running RHEL 4.0 with a 2.6.9-34 kernel. Any pointers
 would
 be greatly appreciated.

 There are also scatterlist limits.

 /sys/class/scsi_host/hostX/sg_tablesize is a limit for the number of
 scatter list entries. For qla2xxx it is 255.

 The scsi layer sets the queue's max_phys_segments to 128 by default. I
 thought there was ia scsi compile time option to increase this, but
 maybe you have to just modify the SCSI_MAX_PHYS_SEGMENTS define by hand.

 So with the default value and with 4 K pages if you end up getting pages
 that cannot be clustered you will end up with 4K * 128.


 


Re: lsscsi version 0.19 beta

2007-01-08 Thread Douglas Gilbert

Further to the announcement 1 month ago (shown below),
there is another lsscsi-0.19 beta at:
http://www.torque.net/sg [News section].
It provides target (and sometimes host) transport
information for:
  - IEEE1394 (sbp)
  - FC
  - iSCSI
  - SAS (two representations)
  - SPI

It has been tested with lk 2.6.20-rc3 . Unfortunately fetching
information out of sysfs could become a maze of kernel
version dependencies as various maintainers change
representations. This beta was tested with the intriguingly
named SYSFS_DEPRECATED config option deselected. Sysfs
is not deprecated yet (sigh) but deselecting SYSFS_DEPRECATED
removes various symlinks which breaks earlier lsscsi
betas.

Thanks to the maintainers of various SCSI transports
for helping me find what information is available in
sysfs and testing code for me. They are named in the
CREDITS file.

Doug Gilbert


Douglas Gilbert wrote [2006/12/7]:
 The last announcement I made to this list about lsscsi
 was back in March and that was a beta for lsscsi version
 0.18 . The change proposed by James Bottomley that prompted
 the beta has not materialized. So I decided to release
 version 0.18 without fanfare a week ago and start adding
 transport information to lsscsi, dubbing it version 0.19
 beta. See http://www.torque.net/scsi/lsscsi.html for
 downloads.
 
 The mushrooming of information (and different representations)
 in /sys has made it possible for lsscsi to provide a lot
 more information than it has previously. Ironically what
 storage device identification really needs is not available,
 namely the logical unit _name_ which, for SCSI devices, is
 obtained from the device identification VPD page (0x83).
 As a consolation there is lots of transport information.
 
 So this beta adds transport information, both target
 and initiator (host) for these transports:
   - FC
   - SAS
 
 I hope to add iSCSI if I can find a way through its maze.
 Perhaps USB and 1394 are candidates as well, even SPI.
 In the case of SAS, both the SAS transport layer and the
 SAS class (i.e. Luben Tuikov's design) representations
 are supported.
 
 The new options are '--transport' (or '-t') and '--list'
 (or '-L').
 
 Here is an example where disk strings are insufficient:
 # lsscsi
 [4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sda
 [4:0:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdb
 [4:0:2:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdc
 [4:0:3:0]    disk    ATA      ST380013AS       3.18  /dev/sdd
 [4:0:4:0]    disk    SEAGATE  ST336754SS       0003  /dev/sde
 [4:0:5:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdf
 [5:0:0:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdg
 [5:0:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdh
 [5:1:0:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdi
 [5:1:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdj
 
 How many disks are there? Looking at the transport information:
 # lsscsi -t
 [4:0:0:0]    disk    sas:0x0b1d2c035c7e5d4c  /dev/sda
 [4:0:1:0]    disk    sas:0x5000c55208ed      /dev/sdb
 [4:0:2:0]    disk    sas:0x5000c5520a29      /dev/sdc
 [4:0:3:0]    disk    sas:0x500605b033e1      /dev/sdd
 [4:0:4:0]    disk    sas:0x5000c55208ee      /dev/sde
 [4:0:5:0]    disk    sas:0x5000c5520a2a      /dev/sdf
 [5:0:0:0]    disk    sas:5000c55208ed        /dev/sdg
 [5:0:1:0]    disk    sas:5000c5520a29        /dev/sdh
 [5:1:0:0]    disk    sas:5000c55208ee        /dev/sdi
 [5:1:1:0]    disk    sas:5000c5520a2a        /dev/sdj
 
 So everything is SAS attached, including two SATA disks.
 Something strange is happening with 4:0:0:0 which is
 directly attached to the host4. From the target SAS
 addresses it can be seen that /dev/sdc and /dev/sdh
 are the same port (and because the lun is 0 in both
 cases, it must be the same lu). There are three other
 pairs there, reducing what looks like 10 disks to
 six. The adjacent SAS addresses are dual ports on the
 same disk, so the actual number of disks is 4.
 Why are some SAS addresses prefixed with 0x and other
 not? lsscsi simply prints out what is in /sys !
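The counting above can be sketched in a few lines. The addresses are copied from the 'lsscsi -t' listing exactly as they appear there. Treating numerically adjacent SAS addresses as two ports of one disk is only the heuristic used in this post, not a general rule.

```python
# (H:C:T:L, target SAS address as printed, device node) from the listing.
listing = [
    ("4:0:0:0", "0x0b1d2c035c7e5d4c", "/dev/sda"),
    ("4:0:1:0", "0x5000c55208ed", "/dev/sdb"),
    ("4:0:2:0", "0x5000c5520a29", "/dev/sdc"),
    ("4:0:3:0", "0x500605b033e1", "/dev/sdd"),
    ("4:0:4:0", "0x5000c55208ee", "/dev/sde"),
    ("4:0:5:0", "0x5000c5520a2a", "/dev/sdf"),
    ("5:0:0:0", "5000c55208ed", "/dev/sdg"),
    ("5:0:1:0", "5000c5520a29", "/dev/sdh"),
    ("5:1:0:0", "5000c55208ee", "/dev/sdi"),
    ("5:1:1:0", "5000c5520a2a", "/dev/sdj"),
]


def sas_address(text: str) -> int:
    """Normalise an address; int() accepts hex with or without '0x'."""
    return int(text, 16)


def group_by_port(rows):
    """Map each target port address to the device nodes that reach it."""
    ports = {}
    for _hctl, addr, node in rows:
        ports.setdefault(sas_address(addr), []).append(node)
    return ports


def count_disks(port_addresses) -> int:
    """Count disks, merging ports whose addresses differ by exactly one."""
    disks = 0
    previous = None
    for addr in sorted(port_addresses):
        if previous is None or addr != previous + 1:
            disks += 1
        previous = addr
    return disks


ports = group_by_port(listing)
print("device nodes:", len(listing))
print("distinct target ports:", len(ports))
print("disks (adjacent ports merged):", count_disks(ports))
```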
 
 To fetch further information about the target that contains
 /dev/sdf using a filter to reduce clutter:
 # lsscsi --transport --list 4:0:5:0
 [4:0:5:0]    disk    sas:0x5000c5520a2a      /dev/sdf
   transport=sas
   initiator_port_protocols=none
   initiator_response_timeout=0
   I_T_nexus_loss_timeout=1744
   phy_identifier=11
   ready_led_meaning=0
   sas_address=0x5000c5520a2a
   target_port_protocols=ssp
 
 A similar check on the target containing /dev/sdj
 # lsscsi -t -L 5:1:1
 [5:1:1:0]    disk    sas:5000c5520a2a        /dev/sdj
   transport=sas
   sub_transport=sas_class
   device_name=
   dev_type=end device
   iproto=
   iresp_timeout=0x
   linkrate=3,0 Gbps
   max_linkrate=3,0 Gbps
   max_pathways=1
   min_linkrate=3,0 Gbps
   pathways=1
   ready_led_meaning=0

Re: [PATCH] SCSI core: better initialization for sdev->scsi_level

2007-01-08 Thread Douglas Gilbert

Alan Stern wrote:
 Both scsi_device and scsi_target include a scsi_level field, and the
 SCSI core makes a half-hearted effort to keep the values equal.
 Ultimately this effort may be doomed, since as far as I know there is
 no reason why all LUNs in a target must report the same ANSI-approved
 version number.  But for the most part it should work okay.
 
 This patch (as834) changes the SCSI core so that after the first LUN
 has been probed and the target's scsi_level is known, further LUNs
 default to the target's scsi_level and not to SCSI_2.

Alan,
Umm, scsi_level is a mangled value derived from the
version field in an INQUIRY standard response. As
such it is per logical unit ***. There is nothing to stop
a single target (especially if it is a bridge that
maps targets at the remote end to luns) having a wide
variety of lus with different version values (and
different peripheral device types).

IMO scsi_level should only be per lu which means it
should only exist in the scsi_device structure.
If the scsi mid level was really advanced it could
track the version value in the INQUIRY response to
well known logical units (see spc4r08.pdf section 8)
because these really are per target. I don't expect
that to happen any time soon (and there wouldn't be
much benefit).

So the existing code seems broken but I'm not sure
your patch advances things.


*** this statement assumes the peripheral qualifier
field is 0, otherwise there is no real lu at the
given lun
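For reference, a minimal sketch of pulling the relevant fields out of a standard INQUIRY response. The field positions are those of the SPC standard INQUIRY data format. The two sample responses are invented for illustration, and the further mangling the mid level applies to get scsi_level is deliberately left out.

```python
def decode_inquiry_header(resp: bytes):
    """Return (peripheral qualifier, peripheral device type, version)."""
    if len(resp) < 3:
        raise ValueError("need at least 3 bytes of INQUIRY data")
    qualifier = (resp[0] >> 5) & 0x07   # byte 0, bits 7..5
    device_type = resp[0] & 0x1F        # byte 0, bits 4..0
    version = resp[2]                   # byte 2
    return qualifier, device_type, version


def has_real_lu(resp: bytes) -> bool:
    """A version value only describes an lu when the qualifier is 0."""
    return decode_inquiry_header(resp)[0] == 0


# Hypothetical responses from two luns behind one bridge-like target.
lun0 = bytes([0x00, 0x00, 0x05])   # disk, version 5
lun1 = bytes([0x05, 0x80, 0x02])   # cd/dvd, version 2
no_lu = bytes([0x7F, 0x00, 0x00])  # qualifier 3, device type 0x1f

for name, resp in (("lun0", lun0), ("lun1", lun1), ("no_lu", no_lu)):
    print(name, decode_inquiry_header(resp), has_real_lu(resp))
```

Two luns on the same target reporting different version values is exactly the case that a per-target scsi_level cannot represent.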

Doug Gilbert


 Signed-off-by: Alan Stern [EMAIL PROTECTED]
 
 ---
 
 This patch will affect the CDB in INQUIRY commands sent to LUNs above 0 
 when LUN-0 reports a scsi_level of 0; the LUN bits will no longer be set 
 in the second byte of the CDB.  This is as it should be.  Nevertheless, 
 it's possible that some wacky device might be adversely affected.  I doubt 
 anyone will complain...
 
 Alan Stern
 
 
 Index: usb-2.6/drivers/scsi/scsi_scan.c
 ===
 --- usb-2.6.orig/drivers/scsi/scsi_scan.c
 +++ usb-2.6/drivers/scsi/scsi_scan.c
 @@ -382,6 +382,7 @@ static struct scsi_target *scsi_alloc_ta
   INIT_LIST_HEAD(&starget->siblings);
   INIT_LIST_HEAD(&starget->devices);
   starget->state = STARGET_RUNNING;
 + starget->scsi_level = SCSI_2;
   retry:
   spin_lock_irqsave(shost->host_lock, flags);
  
 Index: usb-2.6/drivers/scsi/scsi_sysfs.c
 ===
 --- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
 +++ usb-2.6/drivers/scsi/scsi_sysfs.c
 @@ -922,7 +922,7 @@ void scsi_sysfs_device_initialize(struct
   snprintf(sdev->sdev_classdev.class_id, BUS_ID_SIZE,
"%d:%d:%d:%d", sdev->host->host_no,
sdev->channel, sdev->id, sdev->lun);
 - sdev->scsi_level = SCSI_2;
 + sdev->scsi_level = starget->scsi_level;
   transport_setup_device(&sdev->sdev_gendev);
   spin_lock_irqsave(shost->host_lock, flags);
   list_add_tail(&sdev->same_target_siblings, &starget->devices);
 
 


Re: [GIT PATCH] scsi bug fixes for 2.6.20-rc4

2007-01-07 Thread Douglas Gilbert

James Bottomley wrote:
 On Sun, 2007-01-07 at 11:16 -0700, Matthew Wilcox wrote:
 On Sun, Jan 07, 2007 at 10:04:03AM -0600, James Bottomley wrote:
 This is mainly bug fixes, although there are a few harmless updates
 (like email addresses and driver PCI IDs).  The patch is available here:
 Could I ask that you add

 http://marc.theaimsgroup.com/?l=linux-scsi&m=116793460427798&w=2
 
 OK ... how about a title, changelog and signoff for it?

Titles and sign-offs don't necessarily help. For example:

http://marc.theaimsgroup.com/?l=linux-scsi&m=116797255528029&w=2

Followed by this misdirected ** post from Eric Moore
which you and I also received:

On Thursday, January 04, 2007 9:39 PM, Douglas Gilbert wrote:

 
  This patch makes the mptctl pass through available if
  the mptsas driver is selected. Without this patch
  if mptsas is the only fusion driver chosen, then
  the mptctl is not presented as an option.
 
  smp_utils uses the mptctl driver to pass SAS SMP
  functions through a MPT SAS HBA.
 
  I have discussed this patch with Eric and it may
  be in one of his coming patchsets (but I didn't see
  it in today's patches). So this one is for the
  record. The patch is against lk 2.6.20-rc3 .
 
  Signed-off-by: Douglas Gilbert [EMAIL PROTECTED]
 

Sorry I overlooked this, but please apply.

ACK



** [EMAIL PROTECTED] strikes again.

Doug Gilbert

[PATCH] scsi_debug error processing

2007-01-04 Thread Douglas Gilbert

After discussions in the thread titled:
[PATCH] scsi_debug: illegal blocking memory allocation
here is a patch containing the discussed fix and some other
fixes and additions. The patch is against lk 2.6.20-rc3 .
The version is bumped to 1.81 .

ChangeLog:
  - Change several GFP_KERNEL allocations to GFP_ATOMIC
as they can be called from queuecommand() context
  - check above allocation returns and if out of memory
report DID_REQUEUE in two cases, DID_NO_CONNECT in
another, and fail slave configure() in another
  - add support for WRITE BUFFER command
  - add aborted_command error injection support
(opts mask 0x10), similar mechanism to
recovered_error injection.
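As a rough illustration of two items in this ChangeLog, the sketch below shows how a host byte such as DID_REQUEUE ends up in the 32 bit result word, and what a fixed format sense buffer for the injected ABORTED COMMAND looks like. The DID_* values are the ones from the kernel's scsi.h as I remember them, and the sense layout is the standard 18 byte fixed format. This is Python standing in for the driver's C, not the driver code itself.

```python
DID_NO_CONNECT = 0x01
DID_REQUEUE = 0x0D

ABORTED_COMMAND = 0x0B     # sense key
TRANSPORT_PROBLEM = 0x4B   # ASC used by the patch
ACK_NAK_TO = 0x03          # ASCQ used by the patch


def host_byte_result(did: int) -> int:
    """The host byte lives in bits 23..16 of the result word."""
    return did << 16


def host_byte(result: int) -> int:
    return (result >> 16) & 0xFF


def fixed_sense(key: int, asc: int, ascq: int) -> bytes:
    """Build an 18 byte fixed format sense buffer."""
    buf = bytearray(18)
    buf[0] = 0x70          # current error, fixed format
    buf[2] = key & 0x0F    # sense key
    buf[7] = 0x0A          # additional sense length (18 - 8)
    buf[12] = asc
    buf[13] = ascq
    return bytes(buf)


requeue = host_byte_result(DID_REQUEUE)
sense = fixed_sense(ABORTED_COMMAND, TRANSPORT_PROBLEM, ACK_NAK_TO)
print("result = 0x%08x, host byte = 0x%02x" % (requeue, host_byte(requeue)))
print("sense  =", sense.hex())
```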

Signed-off-by: Douglas Gilbert [EMAIL PROTECTED]

Doug Gilbert
--- linux/drivers/scsi/scsi_debug.c 2006-11-30 10:00:01.0 -0500
+++ linux/drivers/scsi/scsi_debug.c2620rc3abo1  2007-01-04 21:49:33.0 -0500
@@ -51,10 +51,10 @@
 #include "scsi_logging.h"
 #include "scsi_debug.h"
 
-#define SCSI_DEBUG_VERSION "1.80"
-static const char * scsi_debug_version_date = "20061018";
+#define SCSI_DEBUG_VERSION "1.81"
+static const char * scsi_debug_version_date = "20070104";
 
-/* Additional Sense Code (ASC) used */
+/* Additional Sense Code (ASC) */
 #define NO_ADDITIONAL_SENSE 0x0
 #define LOGICAL_UNIT_NOT_READY 0x4
 #define UNRECOVERED_READ_ERR 0x11
@@ -65,9 +65,13 @@
 #define INVALID_FIELD_IN_PARAM_LIST 0x26
 #define POWERON_RESET 0x29
 #define SAVING_PARAMS_UNSUP 0x39
+#define TRANSPORT_PROBLEM 0x4b
 #define THRESHOLD_EXCEEDED 0x5d
 #define LOW_POWER_COND_ON 0x5e
 
+/* Additional Sense Code Qualifier (ASCQ) */
+#define ACK_NAK_TO 0x3
+
 #define SDEBUG_TAGGED_QUEUING 0 /* 0 | MSG_SIMPLE_TAG | MSG_ORDERED_TAG */
 
 /* Default values for driver parameters */
@@ -95,15 +99,20 @@
 #define SCSI_DEBUG_OPT_MEDIUM_ERR   2
 #define SCSI_DEBUG_OPT_TIMEOUT   4
 #define SCSI_DEBUG_OPT_RECOVERED_ERR   8
+#define SCSI_DEBUG_OPT_TRANSPORT_ERR   16
 /* When "every_nth" > 0 then modulo "every_nth" commands:
  *   - a no response is simulated if SCSI_DEBUG_OPT_TIMEOUT is set
  *   - a RECOVERED_ERROR is simulated on successful read and write
  * commands if SCSI_DEBUG_OPT_RECOVERED_ERR is set.
+ *   - a TRANSPORT_ERROR is simulated on successful read and write
+ * commands if SCSI_DEBUG_OPT_TRANSPORT_ERR is set.
  *
  * When "every_nth" < 0 then after "- every_nth" commands:
  *   - a no response is simulated if SCSI_DEBUG_OPT_TIMEOUT is set
  *   - a RECOVERED_ERROR is simulated on successful read and write
  * commands if SCSI_DEBUG_OPT_RECOVERED_ERR is set.
+ *   - a TRANSPORT_ERROR is simulated on successful read and write
+ * commands if SCSI_DEBUG_OPT_TRANSPORT_ERR is set.
  * This will continue until some other action occurs (e.g. the user
  * writing a new value (other than -1 or 1) to every_nth via sysfs).
  */
@@ -315,6 +324,7 @@
int target = SCpnt->device->id;
struct sdebug_dev_info * devip = NULL;
int inj_recovered = 0;
+   int inj_transport = 0;
int delay_override = 0;
 
if (done == NULL)
@@ -352,6 +362,8 @@
 			return 0; /* ignore command causing timeout */
 		else if (SCSI_DEBUG_OPT_RECOVERED_ERR & scsi_debug_opts)
 			inj_recovered = 1; /* to reads and writes below */
+		else if (SCSI_DEBUG_OPT_TRANSPORT_ERR & scsi_debug_opts)
+			inj_transport = 1; /* to reads and writes below */
 	}
 
 	if (devip->wlun) {
@@ -468,7 +480,11 @@
 			mk_sense_buffer(devip, RECOVERED_ERROR,
 					THRESHOLD_EXCEEDED, 0);
 			errsts = check_condition_result;
-		}
+		} else if (inj_transport && (0 == errsts)) {
+			mk_sense_buffer(devip, ABORTED_COMMAND,
+					TRANSPORT_PROBLEM, ACK_NAK_TO);
+			errsts = check_condition_result;
+		}
 		break;
 	case REPORT_LUNS:	/* mandatory, ignore unit attention */
 		delay_override = 1;
@@ -531,6 +547,9 @@
 		delay_override = 1;
 		errsts = check_readiness(SCpnt, 0, devip);
 		break;
+	case WRITE_BUFFER:
+		errsts = check_readiness(SCpnt, 1, devip);
+		break;
 	default:
 		if (SCSI_DEBUG_OPT_NOISE & scsi_debug_opts)
 			printk(KERN_INFO "scsi_debug: Opcode: 0x%x not "
@@ -954,7 +973,9 @@
 	int alloc_len, n, ret;
 
 	alloc_len = (cmd[3] << 8) + cmd[4];
-	arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_KERNEL);
+	arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_ATOMIC);
+	if (! arr)
+		return DID_REQUEUE << 16;
 	if (devip->wlun)
 		pq_pdt = 0x1e;	/* present, wlun */
 	else if (scsi_debug_no_lun_0 && (0 == devip->lun))
@@ -1217,7 +1238,9 @@
 	alen = ((cmd[6] << 24) + (cmd[7] << 16) + (cmd[8] << 8

Re: [PATCH] scsi_debug: illegal blocking memory allocation

2007-01-04 Thread Douglas Gilbert

Jens Axboe wrote:
 On Thu, Jan 04 2007, James Bottomley wrote:
 On Thu, 2007-01-04 at 12:21 +0100, Jens Axboe wrote:
 I guess it's fully up to you how you want to solve it. The scheme seems
 a little elaborate, but these error conditions are unlikely to ever be
 seen in the wild, so no objections from me.
 Actually, there's already a DID_ code that does what you want.  Instead
 of DID_ERROR, which will retry immediately, there's DID_REQUEUE which
 will halt the device queue and wait for a returning command to retry.
 
 As long as it keeps firing the queue at some intervals even without any
 commands pending at all, then that'll work just fine. I like that
 approach a lot better than coding the error into some sense value that
 is (at best) some vague approximation of what has happened (calling
 memory shortage a transport error is a bit of a stretch).

True, but both happen. The scsi_debug driver is a
virtual host, virtual target and a lu (ram disk).
The failure that you pointed out stopped a response
being built. In the real world that would be in the
target or lu.

The reason that I mentioned the aborted_command sense key
is that it is also an out-of-resources (bandwidth) error
and it broke sg_dd. I would bet money that it would
also break the block layer/sd. The block layer should
leave it alone as it is simply a matter of sd
retrying a few times. However the st driver could have
a real problem (e.g. did that state changing command
work, fail or partially work??).

Anyway, I have submitted a patch that reports DID_REQUEUE
for allocation failures and adds the ability to inject
aborted_command errors.
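
As a rough illustration of what "DID_REQUEUE << 16" means, here is a minimal Python sketch of how the host byte sits in a SCSI result word. The numeric values are the ones in the kernel's include/scsi/scsi.h; the helper names make_result and get_host_byte are made up for this example and are not part of any driver.

```python
# Host byte codes as defined in the Linux kernel header include/scsi/scsi.h.
DID_OK = 0x00
DID_ERROR = 0x07
DID_REQUEUE = 0x0D


def make_result(host_byte, status_byte=0):
    """Pack a host byte (bits 16-23) and a SCSI status byte (bits 0-7)."""
    return ((host_byte & 0xFF) << 16) | (status_byte & 0xFF)


def get_host_byte(result):
    """Extract the host byte from a packed result word."""
    return (result >> 16) & 0xFF


if __name__ == "__main__":
    result = make_result(DID_REQUEUE)
    print(hex(result), hex(get_host_byte(result)))
```

The mid level decides between an immediate retry and a requeue by looking at this host byte, which is why swapping DID_ERROR for DID_REQUEUE changes the retry behaviour without touching the sense data.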

Doug Gilbert

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] scsi_debug: illegal blocking memory allocation

2007-01-03 Thread Douglas Gilbert

Jens Axboe wrote:
 Hi Doug,
 
 resp_inquiry() does a GFP_KERNEL memory allocation, but it's not allowed
 to from the queuecommand context. There's no good way to return this
 error, so I used DID_ERROR which is used from similar paths. That
 doesn't seem quite right though, it would be better to return an error
 indicating a later retry would be more appropriate.
 
 Signed-off-by: Jens Axboe [EMAIL PROTECTED]

 diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
 index 30ee3d7..0c80ed3 100644
 --- a/drivers/scsi/scsi_debug.c
 +++ b/drivers/scsi/scsi_debug.c
 @@ -954,7 +954,9 @@ static int resp_inquiry(struct scsi_cmnd * scp, int target,
   int alloc_len, n, ret;
  
   alloc_len = (cmd[3] << 8) + cmd[4];
 - arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_KERNEL);
 + arr = kzalloc(SDEBUG_MAX_INQ_ARR_SZ, GFP_ATOMIC);
 + if (!arr)
 + return DID_ERROR << 16;
   if (devip->wlun)
   pq_pdt = 0x1e;  /* present, wlun */
   else if (scsi_debug_no_lun_0 && (0 == devip->lun))
 

Jens,
I had to read that twice. I'm always happy to convert a
GFP_KERNEL to a GFP_ATOMIC (as I'm sure it started as a
GFP_ATOMIC). There are a couple more that may be in
queuecommand context.

Taking up your point about retries and seeing that the
scsi_debug driver has a SAS flavour, I'm inclined towards
an aborted command, initiator response timeout [Bh,4Bh,6]
CHECK CONDITION. There is a group of transport injected
error messages in SAS (see sas2r07.pdf section 10.2.3)
that pop up from time to time. Due to congestion in
connection-switched SAS expanders these error messages
should be interpreted as "try again", depending on the
topology. The patch below adds an aborted_command bit
in opts that will cause every nth command to be aborted
(with an ack/nak timeout).
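
The every_nth/opts behaviour described in the patch comment can be modelled roughly as follows. This is a simplified Python sketch, not the driver code: the function name injected_action is hypothetical, and the handling of negative every_nth is only my reading of the comment in the patch (every command from number -every_nth onward is affected).

```python
# Bit values taken from the SCSI_DEBUG_OPT_* defines in the patch.
OPT_TIMEOUT = 4
OPT_RECOVERED_ERR = 8
OPT_TRANSPORT_ERR = 16


def injected_action(cmd_number, every_nth, opts):
    """Return the simulated action for the cmd_number-th command (1-based).

    Simplified model (an assumption, not the kernel logic verbatim):
      every_nth > 0: each command whose number is a multiple of every_nth;
      every_nth < 0: every command from number -every_nth onward.
    """
    if every_nth > 0:
        hit = cmd_number % every_nth == 0
    elif every_nth < 0:
        hit = cmd_number >= -every_nth
    else:
        hit = False
    if not hit:
        return None
    # Same priority order as the else-if chain in the patch.
    if opts & OPT_TIMEOUT:
        return "timeout"
    if opts & OPT_RECOVERED_ERR:
        return "recovered_error"
    if opts & OPT_TRANSPORT_ERR:
        return "transport_error"
    return None


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, injected_action(n, 3, OPT_TRANSPORT_ERR))
```

Note that because of the else-if chain, setting both the recovered error bit and the transport error bit only ever injects the recovered error.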

Note that SAS has an optional transport layer retries
state machine to lessen the incidence of aborted commands.
Evidently SAS tape drives use the facility.
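
For readers unfamiliar with the [sense key, asc, ascq] notation used above, here is a small Python sketch that builds and decodes a fixed format sense buffer for the aborted command case. The constants mirror the names in the patch; the helpers fixed_sense and decode_sense are hypothetical, and descriptor format sense data is deliberately not handled.

```python
ABORTED_COMMAND = 0x0B    # sense key
TRANSPORT_PROBLEM = 0x4B  # additional sense code, as named in the patch
ACK_NAK_TO = 0x03         # ASCQ: ack/nak timeout
INIT_RESPONSE_TO = 0x06   # ASCQ: initiator response timeout


def fixed_sense(key, asc, ascq, length=18):
    """Build fixed format sense data (response code 0x70)."""
    buf = bytearray(length)
    buf[0] = 0x70            # current error, fixed format
    buf[2] = key & 0x0F      # sense key lives in the low nibble of byte 2
    buf[7] = length - 8      # additional sense length
    buf[12] = asc
    buf[13] = ascq
    return bytes(buf)


def decode_sense(sense):
    """Return (sense key, asc, ascq) from fixed format sense data."""
    return (sense[2] & 0x0F, sense[12], sense[13])


if __name__ == "__main__":
    sense = fixed_sense(ABORTED_COMMAND, TRANSPORT_PROBLEM, ACK_NAK_TO)
    print(["%xh" % v for v in decode_sense(sense)])
```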

Doug Gilbert
--- linux/drivers/scsi/scsi_debug.c 2006-11-30 10:00:01.0 -0500
+++ linux/drivers/scsi/scsi_debug.c2620rc3abo   2007-01-04 00:19:49.0 -0500
@@ -51,10 +51,10 @@
 #include "scsi_logging.h"
 #include "scsi_debug.h"
 
-#define SCSI_DEBUG_VERSION "1.80"
-static const char * scsi_debug_version_date = "20061018";
+#define SCSI_DEBUG_VERSION "1.81"
+static const char * scsi_debug_version_date = "20070104";
 
-/* Additional Sense Code (ASC) used */
+/* Additional Sense Code (ASC) */
 #define NO_ADDITIONAL_SENSE 0x0
 #define LOGICAL_UNIT_NOT_READY 0x4
 #define UNRECOVERED_READ_ERR 0x11
@@ -65,9 +65,14 @@
 #define INVALID_FIELD_IN_PARAM_LIST 0x26
 #define POWERON_RESET 0x29
 #define SAVING_PARAMS_UNSUP 0x39
+#define TRANSPORT_PROBLEM 0x4b
 #define THRESHOLD_EXCEEDED 0x5d
 #define LOW_POWER_COND_ON 0x5e
 
+/* Additional Sense Code Qualifier (ASCQ) */
+#define ACK_NAK_TO 0x3
+#define INIT_RESPONSE_TO 0x6
+
 #define SDEBUG_TAGGED_QUEUING 0 /* 0 | MSG_SIMPLE_TAG | MSG_ORDERED_TAG */
 
 /* Default values for driver parameters */
@@ -95,15 +100,20 @@
 #define SCSI_DEBUG_OPT_MEDIUM_ERR   2
 #define SCSI_DEBUG_OPT_TIMEOUT   4
 #define SCSI_DEBUG_OPT_RECOVERED_ERR   8
+#define SCSI_DEBUG_OPT_TRANSPORT_ERR   16
 /* When "every_nth" > 0 then modulo "every_nth" commands:
  *   - a no response is simulated if SCSI_DEBUG_OPT_TIMEOUT is set
  *   - a RECOVERED_ERROR is simulated on successful read and write
  * commands if SCSI_DEBUG_OPT_RECOVERED_ERR is set.
+ *   - a TRANSPORT_ERROR is simulated on successful read and write
+ * commands if SCSI_DEBUG_OPT_TRANSPORT_ERR is set.
  *
  * When "every_nth" < 0 then after "- every_nth" commands:
  *   - a no response is simulated if SCSI_DEBUG_OPT_TIMEOUT is set
  *   - a RECOVERED_ERROR is simulated on successful read and write
  * commands if SCSI_DEBUG_OPT_RECOVERED_ERR is set.
+ *   - a TRANSPORT_ERROR is simulated on successful read and write
+ * commands if SCSI_DEBUG_OPT_TRANSPORT_ERR is set.
  * This will continue until some other action occurs (e.g. the user
  * writing a new value (other than -1 or 1) to every_nth via sysfs).
  */
@@ -315,6 +325,7 @@
 	int target = SCpnt->device->id;
 	struct sdebug_dev_info * devip = NULL;
 	int inj_recovered = 0;
+	int inj_transport = 0;
 	int delay_override = 0;
 
 	if (done == NULL)
@@ -352,6 +363,8 @@
 			return 0; /* ignore command causing timeout */
 		else if (SCSI_DEBUG_OPT_RECOVERED_ERR & scsi_debug_opts)
 			inj_recovered = 1; /* to reads and writes below */
+		else if (SCSI_DEBUG_OPT_TRANSPORT_ERR & scsi_debug_opts)
+			inj_transport = 1; /* to reads and writes below */
 	}
 
 	if (devip->wlun) {
@@ -468,7 +481,11 @@
 			mk_sense_buffer(devip, RECOVERED_ERROR,
 					THRESHOLD_EXCEEDED, 0);
 			errsts = check_condition_result;
-		}
+		} else if

Re: [Patch] scsi: megaraid_{mm,mbox}: init fix for kdump

2006-12-30 Thread Douglas Gilbert

Randy Dunlap wrote:
 On Fri, 29 Dec 2006 08:02:17 -0800 Sumant Patro wrote:
 
 See Documentation/SubmittingPatches:
 Please include output of diffstat -p1 -w70 so that we can easily see
 the scope of the changes.
 
 and see Documentation/CodingStyle for comments below:
 
 
 diff -uprN linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c
 --- linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-28 09:56:04.0 -0800
 +++ linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-29 05:31:48.0 -0800
 @@ -779,6 +780,22 @@ megaraid_init_mbox(adapter_t *adapter)
  goto out_release_regions;
  }
  
 +// initialize the mutual exclusion lock for the mailbox
 +spin_lock_init(&raid_dev->mailbox_lock);
 
 Linux uses /*...*/ C89-style comments, not // C99 comments.

Randy
It is about time this absurd stipulation was dropped.
Are there any C compilers that can compile the linux
kernel and that don't accept both _standard_ comment styles?

Doug Gilbert

Re: sas_device/end_device-*/phy_identifier flipped

2006-12-30 Thread Douglas Gilbert

Luben Tuikov wrote:
 --- Douglas Gilbert [EMAIL PROTECTED] wrote:
 In lk 2.6.20-rc2 (and probably earlier) the phy_identifier
 attribute in the /sys/class/sas_device/end_device-*
 directory is showing the wrong end of the point to point
 link.

 Phy identifiers on (dual ported) SAS disks are typically
 0 and 1. For SATA disks the phy identifier should be 0.

 # lsscsi
 [4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sda
 [4:0:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdb
 # lsscsi -t
 [4:0:0:0]    disk    sas:0x500605b033e6              /dev/sda
 [4:0:1:0]    disk    sas:0x5000c55208ee              /dev/sdb
 # lsscsi -tL 4:0:1:0
 [4:0:1:0]    disk    sas:0x5000c55208ee              /dev/sdb
   transport=sas
   initiator_port_protocols=none
   initiator_response_timeout=1
   I_T_nexus_loss_timeout=1744
   phy_identifier=7
   ready_led_meaning=1
   sas_address=0x5000c55208ee
   target_port_protocols=ssp

 # smp_discover -mb
 Device 500605b033ef, expander (only connected phys shown):
   phy   5:T:attached:[500605b6f260:03  i(SSP+STP+SMP)]  3 Gbps
   phy   6:T:attached:[500605b033e6:00  t(SATA)]  1.5 Gbps
   phy   7:T:attached:[5000c55208ee:01  t(SSP)]  3 Gbps


 The SATA and SAS disks are connected via an expander which
 lets me look at sysfs for 4:0:1:0 and the expander configuration
 with smp_discover. The port in use on the SAS disk has the
 address 5000c55208ee. The expander says that cable is
 attached to phy 1 which agrees with what I can see. However
 sysfs reports phy_identifier=7 which is wrong (and happens
 to be the attached phy_id seen from the SAS disk).

 Both aic94xx and mptsas drivers do the same thing so it
 looks like a SAS transport problem.
 
 Have you tested this with the SAS Stack as I distribute it?

Luben,
Yes, but it is boring because it just works ***.

With your driver for a different port on the same SAS
disk, lsscsi outputs:

# lsscsi -tL 6:0:0:0
[6:0:0:0]    disk    sas:5000c55208ed                /dev/sdd
  transport=sas
  sub_transport=sas_class
  device_name=
  dev_type=end device
  iproto=
  iresp_timeout=0x2710
  linkrate=3,0 Gbps
  max_linkrate=3,0 Gbps
  max_pathways=1
  min_linkrate=3,0 Gbps
  pathways=1
  ready_led_meaning=1
  rl_wlun=0
  sas_addr=5000c55208ed
  tproto=SSP
  transport_layer_retries=0

lsscsi is data mining this directory:
/sys/class/scsi_device/6:0:0:0/device/sas_device

which contains:
# ls
device_name    itnl_timeout  max_pathways       rl_wlun
dev_type       linkrate      min_linkrate       sas_addr
iproto         LUNS          pathways           tproto
iresp_timeout  max_linkrate  ready_led_meaning  transport_layer_retries

Interestingly there is no phy_id entry (and a single
entry wouldn't be sufficient if the target were a
wide port). I can live without the phy_id there, as
it can be found in other ways: via SMP and the protocol
specific (SAS) log page.

So the bottom line is that the phy_id doesn't need
to be there, but if it is, it should be correct.


*** I plan to write another mail on the aic94xx
driver mess.

Doug Gilbert



Re: [PATCH] ieee1394: sbp2: pass REQUEST_SENSE through to the target

2006-12-28 Thread Douglas Gilbert

Stefan Richter wrote:
 Delete some incorrect code, left over from the initial driver submission
 in March 2001.
 
 SBP-2 targets should provide sense data via the SBP-2 status block
 (autosense).  We have to pass the REQUEST_SENSE command through to
 targets which don't implement autosense, if there are any.

Umm, REQUEST SENSE has several other useful capabilities.
It can convey information about low power conditions,
a progress indicator (e.g. during FORMAT with IMMED=1)
and informational exception warnings. It is also
defined to work any time INQUIRY works (e.g. on lun=0
when there is no lu there but there is a lun0).

smartmontools sets MRIE to 6 in the control mode page so
it can periodically poll a disk with REQUEST SENSE to see
if it has tripped a threshold. It could use other techniques
but they would most likely interfere with normal error
processing on the host OS (and linux is one of about 8).
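
To make the polling idea concrete, here is a hedged Python sketch of the REQUEST SENSE command block and of a check for the "threshold exceeded" additional sense code (5Dh). The helper names are invented for this example, only fixed format sense data is examined, and nothing here is taken from the smartmontools source.

```python
REQUEST_SENSE = 0x03       # 6 byte command
THRESHOLD_EXCEEDED = 0x5D  # ASC: failure prediction threshold exceeded


def request_sense_cdb(alloc_len=252, descriptor_format=False):
    """Build a REQUEST SENSE cdb; byte 4 is the allocation length."""
    if not 0 <= alloc_len <= 255:
        raise ValueError("allocation length must fit in one byte")
    desc = 0x01 if descriptor_format else 0x00  # DESC bit, byte 1 bit 0
    return bytes([REQUEST_SENSE, desc, 0, 0, alloc_len, 0])


def threshold_tripped(sense):
    """True if fixed format sense data reports ASC 5Dh."""
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        return False
    return sense[12] == THRESHOLD_EXCEEDED


if __name__ == "__main__":
    print(request_sense_cdb().hex())
```

An LLD that fakes the REQUEST SENSE response instead of passing the cdb through would make a check like threshold_tripped() useless, which is the point of the paragraph above.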

So, this patch is a step in the right direction.
Hopefully not too many other LLDs are playing
games with REQUEST SENSE.

Doug Gilbert

KERN_NOTICE very big device. try to use READ CAPACITY(16)

2006-12-27 Thread Douglas Gilbert

This message is generated by sd when a disk is larger
than 2 TB. Does it need to be? Could it be a logging
message?

It is also badly worded, as the imperative "try" suggests
that the user needs to find a utility that sends a READ
CAPACITY(16). I was recently contacted by a user with
that in mind.
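
For context, the switch to READ CAPACITY(16) is something the driver can decide by itself from the READ CAPACITY(10) response. A minimal Python sketch of that decision follows; the function names are hypothetical and this is not the sd driver code.

```python
import struct


def decode_read_capacity10(resp):
    """Return (last_lba, block_length) from an 8 byte READ CAPACITY(10) response."""
    last_lba, block_len = struct.unpack(">II", resp[:8])
    return last_lba, block_len


def needs_read_capacity16(resp):
    """A last LBA of ffffffffh means the capacity does not fit in 32 bits."""
    return decode_read_capacity10(resp)[0] == 0xFFFFFFFF


if __name__ == "__main__":
    big = struct.pack(">II", 0xFFFFFFFF, 512)
    print(needs_read_capacity16(big))
```

With 512 byte blocks the 32 bit limit corresponds to the 2 TB figure mentioned above, so nothing is required of the user.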

Doug Gilbert

sas_device/end_device-*/phy_identifier flipped

2006-12-27 Thread Douglas Gilbert

In lk 2.6.20-rc2 (and probably earlier) the phy_identifier
attribute in the /sys/class/sas_device/end_device-*
directory is showing the wrong end of the point to point
link.

Phy identifiers on (dual ported) SAS disks are typically
0 and 1. For SATA disks the phy identifier should be 0.

# lsscsi
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sda
[4:0:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdb
# lsscsi -t
[4:0:0:0]    disk    sas:0x500605b033e6              /dev/sda
[4:0:1:0]    disk    sas:0x5000c55208ee              /dev/sdb
# lsscsi -tL 4:0:1:0
[4:0:1:0]    disk    sas:0x5000c55208ee              /dev/sdb
  transport=sas
  initiator_port_protocols=none
  initiator_response_timeout=1
  I_T_nexus_loss_timeout=1744
  phy_identifier=7
  ready_led_meaning=1
  sas_address=0x5000c55208ee
  target_port_protocols=ssp

# smp_discover -mb
Device 500605b033ef, expander (only connected phys shown):
  phy   5:T:attached:[500605b6f260:03  i(SSP+STP+SMP)]  3 Gbps
  phy   6:T:attached:[500605b033e6:00  t(SATA)]  1.5 Gbps
  phy   7:T:attached:[5000c55208ee:01  t(SSP)]  3 Gbps


The SATA and SAS disks are connected via an expander which
lets me look at sysfs for 4:0:1:0 and the expander configuration
with smp_discover. The port in use on the SAS disk has the
address 5000c55208ee. The expander says that cable is
attached to phy 1 which agrees with what I can see. However
sysfs reports phy_identifier=7 which is wrong (and happens
to be the attached phy_id seen from the SAS disk).

Both aic94xx and mptsas drivers do the same thing so it
looks like a SAS transport problem.
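
To show which number is which, here is a small Python sketch that pulls both phy identifiers out of one smp_discover line from the listing above. The regular expression and the field names are my own reading of that output (expander phy first, then attached SAS address and attached phy id inside the brackets), not a documented format.

```python
import re

# Assumed layout: "phy <n>:<routing>:attached:[<sas address>:<phy id>  ...]"
LINE_RE = re.compile(r"phy\s+(\d+):\w:attached:\[([0-9a-f]+):(\d+)\s")


def parse_discover_line(line):
    """Return the expander phy and the attached device's address and phy id."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "expander_phy": int(m.group(1)),
        "attached_sas_address": m.group(2),
        "attached_phy": int(m.group(3)),
    }


if __name__ == "__main__":
    line = "  phy   7:T:attached:[5000c55208ee:01  t(SSP)]  3 Gbps"
    print(parse_discover_line(line))
```

For the SAS disk in question this gives expander phy 7 and attached phy 1; sysfs reports the 7 where the end device's own phy identifier (1) is expected.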

Doug Gilbert

[Announce] smp_utils-0.92 available

2006-12-09 Thread Douglas Gilbert

smp_utils is a package of command line utilities for invoking
SMP functions to monitor and manage SAS expanders. SMP is the
Serial Attached SCSI (SAS) Management Protocol. A SAS Host
Bus Adapter (HBA) includes an SMP initiator (along with an SSP and
an STP initiator). A SAS expander contains an SMP target. Several
SAS HBAs have an SMP pass through interface that can be used to
send SMP requests and receive the responses. This package targets
the linux kernel (lk) 2.6 and lk 2.4 series.

Two interfaces are available: the mptctl pass through used
by MPT Fusion SAS HBAs and the smp_portal sysfs attribute
pass through used by at least one aic94xx based Linux driver.

For an overview and examples of smp_utils see:
http://www.torque.net/sg/smp_utils.html
A tarball, rpm and deb can be found in table 2.

CHANGELOG (since version 0.91):
  - all: suggest using '-v' if smp_send_req() fails
  - smp_lib: sync function names and results with sas2r07
  - smp_rep_general: sync with sas2r07
  - smp_rep_route_info: add '--multiple' and '--num=
options. Underlying SMP function may be called multiple
times to show one line per phy's route table.
  - smp_lib.h: add C++ 'extern "C"' wrapper
  - smp_discover+smp_discover_list: sync with sas2r07
  - smp_conf_general: add new SMP function
  - smp_utils.8: suggestions for finding expander SAS addresses
and mptsas ioc_num
  - Makefile.freebsd: builds utilities on FreeBSD
  - man pages: cleanup


Doug Gilbert

Re: lsscsi version 0.19 beta

2006-12-07 Thread Douglas Gilbert

James Bottomley wrote:
 On Thu, 2006-12-07 at 01:10 -0500, Douglas Gilbert wrote:
 The change proposed by James Bottomley that prompted
 the beta has not materialized.
 
 You'll have to remind me:  which change was this?

James,
Yes, I'm fuzzy on the details as well. Here is part of
the lsscsi ChangeLog. Do the last two entries ring a bell?

Version 0.19 2006/12/06
  - add transport identifiers (target+initiator, port+node)
  - enhance host name search when proc_name is NULL
  - implement filter option for '--hosts'
- accept 'hostn' as first item in filter to mean host n
  - output more host attributes when '-Hll' given
  - add '--list' (or '-L') option output attribute=value
entries, one per line

Version 0.18 2006/3/24
  - cope with dropping of 'generic' symlink post lk 2.6.16
  - anticipate the future removal of 'tape' symlink

Doug Gilbert


Re: [PATCH v2] libata: Simulate REPORT LUNS for ATAPI devices

2006-12-07 Thread Douglas Gilbert

Jeff Garzik wrote:
 Darrick J. Wong wrote:
 The Quantum GoVault SATAPI removable disk device returns ATA_ERR in
 response to a REPORT LUNS packet.  If this happens to an ATAPI device
 that is attached to a SAS controller (this is the case with sas_ata),
 the device does not load because SCSI won't touch a SCSI device
 that won't report its LUNs.  Since most ATAPI devices don't support
 multiple LUNs anyway, we might as well fake a response like we do for
 ATA devices.

 Signed-off-by: Darrick J. Wong [EMAIL PROTECTED]
 
 Seems sane to me, but I would like additional comment/testing/etc.
 before applying...

A SCSI target contains zero or more logical units. As
in this case, those logical units may use a different
transport. In such cases a SCSI target needs to emulate responses
to some SCSI commands (and modify the responses to others).
Here is a list that is probably not comprehensive:
  - INQUIRY  (peripheral qualifier in standard response)
  - INQUIRY, device identification VPD page (0x83)
   - obviously for the device name+identifier and port
 name+identifier
   - may need to concatenate those with the lu's
 name+identifier
  - INQUIRY, SCSI ports VPD page
  - INQUIRY, ATA Information VPD page (for SAT)
  - REPORT LUNS [mandatory in SPC-3 hence mandatory in SAT]
  - protocol specific port mode page (0x19)
  - protocol specific lu mode page (0x18) [could simulate]
  - PATA control mode page (0xa,0xf1) (for SAT)
  - protocol specific port _log_ page (0x18)

And for SAT you could add the ATA PASS-THROUGH
commands to that list. Those that are really ambitious
could implement well known logical units (wluns) which are
essentially a clean way to talk directly to the target
rather than a logical unit.
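
As an illustration of the simplest emulation on that list, here is a Python sketch of a REPORT LUNS response for a target with a single logical unit at lun 0. The function name is hypothetical and this is not the libata patch; it only shows the response layout (4 byte LUN list length, 4 reserved bytes, then one 8 byte LUN entry) truncated to the allocation length.

```python
import struct


def report_luns_lun0_only(alloc_len=16):
    """Build a REPORT LUNS response containing only lun 0."""
    lun_list = bytes(8)                        # lun 0 is eight zero bytes
    header = struct.pack(">I4x", len(lun_list))  # list length + 4 reserved bytes
    return (header + lun_list)[:alloc_len]     # never return more than asked for


if __name__ == "__main__":
    print(report_luns_lun0_only().hex())
```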


About the multi-lun ATAPI devices comment: how would libata
represent multiple S-ATAPI devices connected to a SATA
port multiplier?

Doug Gilbert



Re: [PATCH v2] libata: Simulate REPORT LUNS for ATAPI devices

2006-12-07 Thread Douglas Gilbert

James Bottomley wrote:
 On Mon, 2006-12-04 at 15:32 -0800, Darrick J. Wong wrote:
 The Quantum GoVault SATAPI removable disk device returns ATA_ERR in
 response to a REPORT LUNS packet.  If this happens to an ATAPI device
 that is attached to a SAS controller (this is the case with sas_ata),
 the device does not load because SCSI won't touch a SCSI device
 that won't report its LUNs.  Since most ATAPI devices don't support
 multiple LUNs anyway, we might as well fake a response like we do for
 ATA devices.
 
 Actually, there may be a standards conflict here.  SPC says that all
 devices reporting compliance with this standard (as the inquiry data for
 this device claims) shall support REPORT LUNS.  On the other hand, MMC
 doesn't list REPORT LUNS in its table of mandatory commands.

MMC-5 rev 4 section 7.1:
Some commands that may be implemented by MM drives are
not described in this standard, but are found in other
SCSI standards. For a complete list of these commands
refer to [SPC-3].

Hmm, may be implemented yet REPORT LUNS is mandatory
in SPC-3 (and SPC-3 is a normative reference for MMC-5).
I guess there is wriggle room there.
In practice, MMC diverges from SPC a lot more than other
SCSI device type command sets (e.g. SBC and SSC).

 I'm starting to think that even if they report a SCSI compliance level
 of 3 or greater, we still shouldn't send REPORT LUNS to devices that
 return MMC type unless we have a white list override.

There is also SAT compliance. For the ATA command set (i.e.
disks) sat-r09 lists REPORT LUNS and refers to SPC-3. For
ATAPI sat-r09 is far less clear. It does recommend, for
example, that the ATA Information VPD page is implemented
in the SATL for ATAPI devices.

Doug Gilbert

Re: [Bug 7026] CD/DVD burning with USB writer doesn't work

2006-12-06 Thread Douglas Gilbert

James Bottomley wrote:
 On Wed, 2006-12-06 at 11:32 -0500, Alan Stern wrote:
 But how did he get the file descriptor?  He opened a device name, which
 could have been used to get the sysfs file.
 The device name was probably something like /dev/sg0.  This doesn't easily
 permit one to find the corresponding sysfs filename for the real
 underlying device, although it can be done with difficulty.  (That's why I
 used the excessively-ornate sysfs pathname in the Bugzilla entry.)  It
 certainly wouldn't be as easy as using an ioctl.

 It wouldn't be as uniform either.  The search through sysfs would have to 
 be different depending on whether the device name was /dev/sr0 or 
 /dev/sg0.
 
 Realistically, no-one makes SCSI CDs or DVDs any more ... I know, I've
 tried to get some for some of my older boxes.  Most of them nowadays are
 IDE attachments, which don't have a /dev/sg node.  So /dev/sg is really
 the legacy mode for burning.  The correct way to do it today is to use
 the actual device name ... then you don't have to worry about what the
 transport is any more.

All CD and DVD drives these days use SCSI. That is,
SCSI command sets: MMC and SPC. Very few use the
SCSI Parallel Interface (SPI). An increasing number
will be using S-ATAPI and they could be seen by
the OS via SCSI transports: FC and SAS (+ SATA).

 Is the patch below acceptable?
 Really, no.  The parameter you're fishing for is a block parameter, not
 a SCSI parameter ... it should really be a block ioctl if we have to
 have an ioctl at all.
 I could easily enough rewrite the patch to put the ioctl somewhere else
 (although I'm not quite sure exactly where would be best).  But do
 non-block devices have request queues?  What about the points that Doug
 raised:
 
 All CD/DVD burners are block devices, which is the problem set under
 discussion.

CD/DVD burners are block devices for read operations
only. When they are burning they are not block
devices in the normal sense.

So if this was classic Unix a block device node would
be used for reading and a raw device node for writing.
Just like  I'm wasting keystrokes.

 On Tue, 5 Dec 2006, Douglas Gilbert wrote:

 Apart from sensibly yielding the max size in bytes, your patch
 has the added benefit of allowing non-block devices (e.g. tape,
 processor and enclosure services) to find out what limit the
 OS/host has placed on each command's maximum transfer size.
 
 They all possess block queues, yes, so we should really allow access to
 the block ioctls for them.
 
 If you manage to get that ioctl in, then ungrateful people
 will ask for the corresponding set operation as well.


 To illustrate the /sys mess look at naming of the sysfs approach
 to this problem. For example:
   /sys/block/sde/queue/max_sectors_kb
 - it is not only a block property
 - sde is an end device and suggests information from that
   device's Block Limits VPD page, actually it is a limit
   imposed by the OS and the host used to access that device
 - what has queue got to do with it?
 - max_sectors_kb should have units of bytes
 In addition to all of these points, there remains the peculiar location of 
 the SG_ ioctls.  They are implemented in two places in the kernel: 
 block/scsi_ioctl.c and drivers/scsi/sg.c.  And the two implementations of 
 e.g. SG_SET_RESERVED_SIZE don't even do the same thing!
 
 I have no idea why the block layer even implements
 SG_SET_RESERVED_SIZE ... I suspect it was some legacy application
 compatibility problem, so it probably can be eliminated.

It was put there to trick cdrecord!

Doug Gilbert

Re: [Bug 7026] CD/DVD burning with USB writer doesn't work

2006-12-06 Thread Douglas Gilbert

James Bottomley wrote:
 On Wed, 2006-12-06 at 18:49 +0100, Joerg Schilling wrote:
 Please keep in mind: all CD/DVD burners are SCSI devices.
 
 This is probably semantics, but nowadays, SCSI means SPI (or parallel
 SCSI).  I think you're trying to say that they're all devices that obey
 the MMC standard?  Which is true, but not really relevant.

James,
SPI is dead. Get used to it. SCSI has not meant SPI for
years. We should be in the business of disabusing people
of that idea, not reinforcing it.

If you went to www.t10.org and looked at draft documents
and the reflector you would be lucky to find any documents
or posts about SPI in the last two years.

Doug Gilbert

Re: [Bug 7026] CD/DVD burning with USB writer doesn't work

2006-12-06 Thread Douglas Gilbert

James Bottomley wrote:
 On Wed, 2006-12-06 at 13:38 -0500, Douglas Gilbert wrote:
 SPI is dead. Get used to it. SCSI has not meant SPI for
 years. We should be in the business of disabusing people
 of that idea, not reinforcing it.
 
 I don't believe I said anything in favour of or against SPI.

James,
My objection, and I believe Joerg's objection, is how
people would interpret this statement by you:
This is probably semantics, but nowadays, SCSI means
SPI (or parallel SCSI).

One could deduce from that statement, falsely, that the
linux SCSI subsystem was the linux SPI subsystem. Hence
we should mark it as legacy (and stop libata and the new
ATA subsystem from using it).

 I think you'll find the whole point of SAM is separating the command set
 from the transport and interconnect.  Saying a device speaks SCSI has
 no real meaning in that context anymore.  It's commonly taken to mean
 SCSI-2 where the whole thing was lumped together and SPI centric.

SCSI is a storage architecture, a group of command sets and a
group of transports. The original SCSI transport, now considered
legacy (a horribly non-technical word) is SPI.

 In the SAM context, a modern IDE CD is MMC over an ATAPI or SATAPI
 transport. An old SCSI CD is MMC over SPI.  The thing Alan's having
 trouble with is MMC over a USB transport.

Agreed. And USB mass storage would probably be the most
used SCSI transport nowadays. Folks can and have written
their own subsystems for handling USB mass storage but
sooner or later they are going to be looking at read
capacity, sense buffers and mode pages. That is why the
SCSI subsystem continues to be relevant.


Doug Gilbert

lsscsi version 0.19 beta

2006-12-06 Thread Douglas Gilbert

The last announcement I made to this list about lsscsi
was back in March and that was a beta for lsscsi version
0.18 . The change proposed by James Bottomley that prompted
the beta has not materialized. So I decided to release
version 0.18 without fanfare a week ago and start adding
transport information to lsscsi, dubbing it version 0.19
beta. See http://www.torque.net/scsi/lsscsi.html for
downloads.

The mushrooming of information (and different representations)
in /sys has made it possible for lsscsi to provide a lot
more information than it has previously. Ironically what
storage device identification really needs is not available,
namely the logical unit _name_ which, for SCSI devices, is
obtained from the device identification VPD page (0x83).
As a consolation there is lots of transport information.

So this beta adds transport information, both target
and initiator (host) for these transports:
  - FC
  - SAS

I hope to add iSCSI if I can find a way through its maze.
Perhaps USB and 1394 are candidates as well, even SPI.
In the case of SAS, both the SAS transport layer and the
SAS class (i.e. Luben Tuikov's design) representations
are supported.

The new options are '--transport' (or '-t') and '--list'
(or '-L').

Here is an example where disk strings are insufficient:
# lsscsi
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sda
[4:0:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdb
[4:0:2:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdc
[4:0:3:0]    disk    ATA      ST380013AS       3.18  /dev/sdd
[4:0:4:0]    disk    SEAGATE  ST336754SS       0003  /dev/sde
[4:0:5:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdf
[5:0:0:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdg
[5:0:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdh
[5:1:0:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdi
[5:1:1:0]    disk    SEAGATE  ST336754SS       0003  /dev/sdj

How many disks are there? Looking at the transport information:
# lsscsi -t
[4:0:0:0]    disk    sas:0x0b1d2c035c7e5d4c          /dev/sda
[4:0:1:0]    disk    sas:0x5000c55208ed              /dev/sdb
[4:0:2:0]    disk    sas:0x5000c5520a29              /dev/sdc
[4:0:3:0]    disk    sas:0x500605b033e1              /dev/sdd
[4:0:4:0]    disk    sas:0x5000c55208ee              /dev/sde
[4:0:5:0]    disk    sas:0x5000c5520a2a              /dev/sdf
[5:0:0:0]    disk    sas:5000c55208ed                /dev/sdg
[5:0:1:0]    disk    sas:5000c5520a29                /dev/sdh
[5:1:0:0]    disk    sas:5000c55208ee                /dev/sdi
[5:1:1:0]    disk    sas:5000c5520a2a                /dev/sdj

So everything is SAS attached, including two SATA disks.
Something strange is happening with 4:0:0:0, which is
directly attached to host4. From the target SAS
addresses it can be seen that /dev/sdc and /dev/sdh
are the same port (and because the lun is 0 in both
cases, it must be the same lu). There are three other
such pairs, reducing what looks like 10 disks to
six. The adjacent SAS addresses are dual ports on the
same disk, so the actual number of disks is 4.
Why are some SAS addresses prefixed with 0x and others
not? lsscsi simply prints out what is in /sys!
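
The counting argument above can be sketched in a few lines. The data is copied from the 'lsscsi -t' listing; the helper names are made up for this illustration, and treating two addresses that differ by exactly one as two ports of the same disk is the heuristic described in the text, not a general rule.

```python
from collections import defaultdict

# (device node, SAS target address as printed by lsscsi -t)
LISTING = [
    ("/dev/sda", "0x0b1d2c035c7e5d4c"),
    ("/dev/sdb", "0x5000c55208ed"),
    ("/dev/sdc", "0x5000c5520a29"),
    ("/dev/sdd", "0x500605b033e1"),
    ("/dev/sde", "0x5000c55208ee"),
    ("/dev/sdf", "0x5000c5520a2a"),
    ("/dev/sdg", "5000c55208ed"),
    ("/dev/sdh", "5000c5520a29"),
    ("/dev/sdi", "5000c55208ee"),
    ("/dev/sdj", "5000c5520a2a"),
]


def normalize(addr):
    """Parse a hex SAS address with or without a leading 0x."""
    return int(addr, 16)


def group_by_port(listing):
    """Map each distinct target port address to the device nodes that reach it."""
    ports = defaultdict(list)
    for node, addr in listing:
        ports[normalize(addr)].append(node)
    return dict(ports)


def count_disks(ports):
    """Count disks, pairing addresses that differ by one (assumed dual ports)."""
    disks = 0
    previous = None
    for addr in sorted(ports):
        if previous is not None and addr == previous + 1:
            previous = None  # second port of a disk that was already counted
            continue
        disks += 1
        previous = addr
    return disks


ports = group_by_port(LISTING)
print(len(LISTING), "entries,", len(ports), "ports,", count_disks(ports), "disks")
```

With the listing above this reports 10 entries, 6 ports and 4 disks, matching the reasoning in the text.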

To fetch further information about the target that contains
/dev/sdf using a filter to reduce clutter:
# lsscsi --transport --list 4:0:5:0
[4:0:5:0]    disk    sas:0x5000c5520a2a      /dev/sdf
  transport=sas
  initiator_port_protocols=none
  initiator_response_timeout=0
  I_T_nexus_loss_timeout=1744
  phy_identifier=11
  ready_led_meaning=0
  sas_address=0x5000c5520a2a
  target_port_protocols=ssp

A similar check on the target containing /dev/sdj
# lsscsi -t -L 5:1:1
[5:1:1:0]    disk    sas:5000c5520a2a        /dev/sdj
  transport=sas
  sub_transport=sas_class
  device_name=
  dev_type=end device
  iproto=
  iresp_timeout=0x
  linkrate=3,0 Gbps
  max_linkrate=3,0 Gbps
  max_pathways=1
  min_linkrate=3,0 Gbps
  pathways=1
  ready_led_meaning=0
  rl_wlun=0
  sas_addr=5000c5520a2a
  tproto=SSP
  transport_layer_retries=0

Finally here is a listing of hosts, then a listing of hosts
with their initiator identifier (if known) and finally a
closer look at host4 (with and without transport specific
information):
# lsscsi --hosts
[0]    sata_nv
[1]    sata_nv
[2]    sata_nv
[3]    sata_nv
[4]    mptsas
[5]    aic94xx

# lsscsi --hosts --transport
[0]    sata_nv
[1]    sata_nv
[2]    sata_nv
[3]    sata_nv
[4]    mptsas        sas:0x500605b6f260
[5]    aic94xx       sas:5d10002dc000

# lsscsi -H -t --list 4
[4]    mptsas        sas:0x500605b6f260
  transport=sas
  device_type=end device
  initiator_port_protocols=smp, stp, ssp
  invalid_dword_count=0
  loss_of_dword_sync_count=0
  maximum_linkrate=3.0 Gbit
  maximum_linkrate_hw=3.0 Gbit
  minimum_linkrate=1.5 Gbit
  minimum_linkrate_hw=1.5 Gbit
  negotiated_linkrate=Unknown
  phy_identifier=0
  phy_reset_problem_count=0
  running_disparity_error_count=0

Re: [Bug 7026] CD/DVD burning with USB writer doesn't work

2006-12-05 Thread Douglas Gilbert


Alan Stern wrote:
 I decided to do this by email instead of bugzilla so that it would be
 visible to everyone on the linux-scsi mailing list.

 Re: http://bugzilla.kernel.org/show_bug.cgi?id=7026

 To recap: Joerg Schilling needs to be able to retrieve the max_sectors
 value for a SCSI device's request queue.  Doing it via sysfs is rather
 clumsy, especially when only a file descriptor is available and not the
 device name.  He has asked for an ioctl interface to provide the
 information.

 Is the patch below acceptable?


Alan,
I just spent an hour thinking about how to data mine through
that dreadful mess that /sys has become as I try to add
transport information to lsscsi.

And then this post made my day. Fancy that, adding a new
ioctl!! I hope the ioctl police aren't watching :-)

Apart from sensibly yielding the max size in bytes, your patch
has the added benefit of allowing non-block devices (e.g. tape,
processor and enclosure services) to find out what limit the
OS/host has placed on each command's maximum transfer size.

If you manage to get that ioctl in, then ungrateful people
will ask for the corresponding set operation as well.


To illustrate the /sys mess, look at the naming in the sysfs
approach to this problem. For example:
 /sys/block/sde/queue/max_sectors_kb
   - it is not only a block property
   - sde is an end device, which suggests information from that
     device's Block Limits VPD page; actually it is a limit
     imposed by the OS and the host used to access that device
   - what has queue got to do with it?
   - max_sectors_kb should have units of bytes
And /sys has the horrible side effect of enshrining a badly
conceived design in a user interface (and SAS comes to mind).
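
For comparison, the sysfs route criticised above looks roughly like this from user space. This is only a sketch: the function name and the sysfs_root parameter are invented for illustration, and it shows the clumsiness Alan mentions, since the caller must already know the disk name (here "sde") rather than hold a file descriptor.

```python
import os


def max_transfer_bytes(disk, sysfs_root="/sys"):
    """Read <sysfs_root>/block/<disk>/queue/max_sectors_kb and return bytes.

    max_transfer_bytes and sysfs_root are hypothetical names; the file
    holds a decimal count of kibibytes, hence the multiplication by 1024.
    """
    path = os.path.join(sysfs_root, "block", disk, "queue", "max_sectors_kb")
    with open(path) as f:
        return int(f.read().strip()) * 1024
```

Calling max_transfer_bytes("sde") on a machine that has that disk returns the per-command limit in bytes. Non-block devices such as tapes and enclosures have no entry under /sys/block, which is the gap the proposed ioctl would close.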


Best of luck
Doug Gilbert


BTW Joerg: SG_SET_RESERVED_SIZE simply makes it extremely
unlikely that the sg driver will not be able to fetch
enough memory from the kernel to move data associated with
a SCSI command. The block layer SG_IO just fudges that.
While a major concern in lk 2.0, memory starvation is typically
not a major concern in lk 2.6 assuming modern hardware.
The sg driver's reserved buffer has other uses as
FUJITA Tomonori pointed out yesterday on the linux-scsi list.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: problems to expect with 2TB volumes

2006-11-29 Thread Douglas Gilbert

Bernd Schubert wrote:
 Hi,
 
 we have not bought the device yet, but presently in the process to do so.
 Before we buy it, I want to know about problems in advance...

None that I'm aware of from the point of view of the
Linux SCSI subsystem (starting about halfway through
the lk 2.4 series, roughly 4 years ago).

 I'm somewhat worried about this problem report
 http://lists.freebsd.org/pipermail/aic7xxx/2006-January/thread.html#4280
 Especially as I don't see a final solution... 
 We also want to buy the very same raid device and also connect it to an
 already existing aic79xx controller.

On reviewing that thread, the original poster was
jumping to premature conclusions. Justin Gibbs told
him there was no such problem (and Justin is well
placed to know). Then the final post shows a trace
with READ(10) commands failing. They are 32-bit LBA
read operations that have been the default for about
10 years in the SCSI subsystem. If those fail on that
transport it is probably a termination problem.
When Justin saw that he probably didn't bother
responding again.
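
As background to the 2TB question, the sketch below shows why READ(10) is tied to that figure: its CDB carries a 32-bit logical block address and a 16-bit transfer length. The helper names are illustrative, and the 512-byte block size is an assumption about the device, not something stated in the thread.

```python
import struct

READ_10_OPCODE = 0x28


def build_read10_cdb(lba, num_blocks):
    """Return a 10-byte READ(10) CDB (hypothetical helper for illustration)."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("READ(10) can only carry a 32-bit LBA")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("READ(10) transfer length is 16 bits")
    # opcode, flags, LBA (4 bytes), group number, transfer length (2 bytes), control
    return struct.pack(">BBIBHB", READ_10_OPCODE, 0, lba, 0, num_blocks, 0)


def max_addressable_bytes(block_size=512):
    """Capacity reachable with 32-bit LBAs at the assumed block size."""
    return (1 << 32) * block_size


print(" ".join(f"{b:02x}" for b in build_read10_cdb(0x12345678, 8)))
print(max_addressable_bytes())
```

With 512-byte blocks the second line prints 2199023255552, that is 2 TiB; blocks beyond that need a command with a wider LBA field such as READ(16).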

Doug Gilbert

Re: aic94xx panic on module load

2006-11-28 Thread Douglas Gilbert

Mark Haverkamp wrote:
 I got this panic when loading the aic94xx module.  The adapter is
 connected to an HP MSA50 SAS enclosure with 3 72GB SAS disks.
 
 Kernel 2.6.19-rc6-scsi-misc on an x86_64
 
 ---
 
 
 aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.2 loaded
 aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device :08:01.0
 aic94xx: BIOS present (1,2), 1673
 aic94xx: ue num:3, ue size:88
 aic94xx: manuf sect SAS_ADDR 5d100045af00
snip

 sas: phy1 matched wide port0
 sas: phy1 added to port0, phy_mask:0x3
 sas: phy2 matched wide port0
 sas: phy2 added to port0, phy_mask:0x7
 sas: phy3 matched wide port0
 sas: phy3 added to port0, phy_mask:0xf
 sas: DOING DISCOVERY on port 0, pid:3524
 sas: ex 500508b300a27a2f phy00:T attached: 500508b300a27a3f
 sas: ex 500508b300a27a2f phy01:T attached: 500508b300a27a3f
 sas: ex 500508b300a27a2f phy02:T attached: 
 sas: ex 500508b300a27a2f phy03:T attached: 
 sas: ex 500508b300a27a2f phy04:S attached: 5d100045af00
 sas: ex 500508b300a27a2f phy05:S attached: 5d100045af00
 sas: ex 500508b300a27a2f phy06:S attached: 5d100045af00
 sas: ex 500508b300a27a2f phy07:S attached: 5d100045af00
 sas: ex 500508b300a27a2f phy08:T attached: 
 sas: ex 500508b300a27a2f phy09:T attached: 
 sas: ex 500508b300a27a2f phy10:T attached: 
 sas: ex 500508b300a27a2f phy11:T attached: 
 sas: ex 500508b300a27a2f phy12:D attached: 500508b300a27a2c
 sas: ex 500508b300a27a3f phy00:D attached: 5000c595f8b5
 sas: ex 500508b300a27a3f phy01:D attached: 
 sas: ex 500508b300a27a3f phy02:D attached: 5000c595d3b5
 sas: ex 500508b300a27a3f phy03:D attached: 
 sas: ex 500508b300a27a3f phy04:D attached: 5000c595c0b9
 sas: ex 500508b300a27a3f phy05:D attached: 
 sas: ex 500508b300a27a3f phy06:D attached: 
 sas: ex 500508b300a27a3f phy07:D attached: 
 sas: ex 500508b300a27a3f phy08:D attached: 
 sas: ex 500508b300a27a3f phy09:D attached: 
 sas: ex 500508b300a27a3f phy10:S attached: 500508b300a27a2f
 sas: ex 500508b300a27a3f phy11:S attached: 500508b300a27a2f
 sas: task finished with resp:0x0, stat:0x89
 sas: sas_discover_sata() for device 500508b300a27a2c at 500508b300a27a2f:0xc 
 returned 0xff06
 kobject_add failed for port-2:0:12 with -EEXIST, don't try to register things 
 with the same name in the same directory.

So this is an interesting expander setup within the enclosure.
There are two expanders (500508b300a27a2f + 500508b300a27a3f)
interconnected via a two wide link (0,1 - 10,11 (T-S)) with
a four wide link back to the 94xx HBA (4,5,6,7 - 0,1,2,3).
My guess is that 500508b300a27a2f:12 is virtual and contains a
SES target. That leaves SAS disks on 500508b300a27a3f:0,
500508b300a27a3f:2 and 500508b300a27a3f:4.
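
The topology reading above can be reproduced mechanically from the "sas: ex" lines. The sketch below uses a subset of the lines from Mark's log; the regular expression and function name are assumptions made for this illustration, not part of the sas transport layer.

```python
import re
from collections import defaultdict

# A subset of the discovery lines quoted above.
LOG = """\
sas: ex 500508b300a27a2f phy00:T attached: 500508b300a27a3f
sas: ex 500508b300a27a2f phy01:T attached: 500508b300a27a3f
sas: ex 500508b300a27a2f phy02:T attached:
sas: ex 500508b300a27a2f phy04:S attached: 5d100045af00
sas: ex 500508b300a27a2f phy05:S attached: 5d100045af00
sas: ex 500508b300a27a2f phy06:S attached: 5d100045af00
sas: ex 500508b300a27a2f phy07:S attached: 5d100045af00
sas: ex 500508b300a27a2f phy12:D attached: 500508b300a27a2c
sas: ex 500508b300a27a3f phy00:D attached: 5000c595f8b5
sas: ex 500508b300a27a3f phy01:D attached:
sas: ex 500508b300a27a3f phy02:D attached: 5000c595d3b5
sas: ex 500508b300a27a3f phy04:D attached: 5000c595c0b9
sas: ex 500508b300a27a3f phy10:S attached: 500508b300a27a2f
sas: ex 500508b300a27a3f phy11:S attached: 500508b300a27a2f
"""

PHY_LINE = re.compile(
    r"sas: ex ([0-9a-f]+) phy(\d+):([A-Z]) attached:\s*([0-9a-f]*)"
)


def expander_links(log_text):
    """Return {expander: {attached address: [phy numbers]}}, skipping empty phys."""
    links = defaultdict(lambda: defaultdict(list))
    for line in log_text.splitlines():
        match = PHY_LINE.search(line)
        if not match:
            continue
        expander, phy, _routing, peer = match.groups()
        if peer:  # an empty address means nothing was reported on that phy
            links[expander][peer].append(int(phy))
    return {exp: dict(peers) for exp, peers in links.items()}


for exp, peers in expander_links(LOG).items():
    for peer, phys in peers.items():
        print(f"{exp} phys {phys} -> {peer}")
```

Phys that share an attached address form a wide link, which is how the two wide and four wide links show up in the output.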

The pain starts immediately after the sas transport layer
tries to process those expander SMP DISCOVER responses.
The trace seems to suggest the device at 500508b300a27a2f:12
is SATA: extremely unlikely.

Mark, do you have an LSI MPT Fusion SAS HBA handy? If
so you might connect the enclosure to it, get smp_utils
and do something like:
 # modprobe mptctl
 # smp_discover -p 12 -s 0x500508b300a27a2f /dev/mptctl

and post the output.


BTW Darrick, SATA disks connected to an expander usually
get SAS addresses like expander_sas_address + n where
n is small. The device attached to 500508b300a27a2f:12
is in that region: 500508b300a27a2c
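
The "in that region" observation is simple arithmetic on the two addresses. A rough sketch follows; the function names and the window of 16 are arbitrary assumptions chosen for illustration, since the text only says the offset is small.

```python
def offset_from_expander(expander_addr, device_addr):
    """Signed difference between a device address and its expander's address."""
    return int(device_addr, 16) - int(expander_addr, 16)


def near_expander_address(expander_addr, device_addr, window=16):
    """True if the device address lies within `window` of the expander address.

    The window size is a guess made for this example.
    """
    offset = offset_from_expander(expander_addr, device_addr)
    return offset != 0 and abs(offset) <= window


print(offset_from_expander("500508b300a27a2f", "500508b300a27a2c"))
print(near_expander_address("500508b300a27a2f", "500508b300a27a2c"))
print(near_expander_address("500508b300a27a3f", "5000c595f8b5"))
```

The first line prints -3, so the address on phy 12 sits right next to the expander's own address, while the SAS disk addresses on the second expander are nowhere near it.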

Doug Gilbert



Re: Disable SCSI-Reservation at the driver level ?

2006-11-28 Thread Douglas Gilbert

James Bottomley wrote:
 On Sun, 2006-11-26 at 17:31 +0100, roland wrote:
 VMWare ESX refuses to create VMFS Filesystem on SATA disk, attached to a 
 onBoard SAS controller (lsi1068).
 When i raid1 two SATA disks, it works, if i use a single SATA disk, the 
 controller seems to expose the disk differently to the operating system 
 and creation of a VMFS fails due to missing ability to issue SCSI 
 reservation command.
 
 There's no SCSI fix for this ... the SAT has no translation for the SCSI
 reservation commands, largely because there's no corresponding ATA
 equivalent and even for SCSI devices they may fail anyway.  The
 application should cope with such a failure, so in this case it's the
 application that needs fixing.

SAT originally did have persistent reservations; they
were dropped and are back on the agenda for SAT-2. A SAT
layer (such as the one found in libata) can do more
than just translate commands; it may also emulate SCSI
commands.

And PERSISTENT RESERVE IN and OUT (and maybe the older
RESERVE and RELEASE) would be very good candidates for
emulation. To do this however libata would need to be
a lot more transport aware than it is now. To do such
an emulation a SAT layer needs to know:
  a) whether it has full control over the SATA device
     (i.e. there is no other path to it) and failing
     that, it has some other mechanism such as
     affiliations in SAS with SMP available to control
     them
  b) the identity of the initiator (port) asking for
     the reservation.

If libata could do this it would add a lot of value
over and above simple command translation.
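
To make the two requirements concrete, here is a toy model of how a SAT layer might emulate the older RESERVE and RELEASE commands. It is emphatically not libata code: the class, its methods and the sole_path flag are all invented for illustration. It only shows why the layer must know (a) that no other path reaches the device and (b) which initiator port is asking.

```python
GOOD = 0x00
RESERVATION_CONFLICT = 0x18  # SCSI status code


class ReservationEmulator:
    """Hypothetical per-device state for emulated RESERVE/RELEASE."""

    def __init__(self, sole_path):
        # Requirement (a): emulation is only sound if this layer sees
        # every command that reaches the SATA device.
        self.sole_path = sole_path
        self.holder = None

    def reserve(self, initiator_port):
        if not self.sole_path:
            raise RuntimeError("cannot emulate: another path to the device exists")
        # Requirement (b): the reservation is tied to the requesting port.
        if self.holder in (None, initiator_port):
            self.holder = initiator_port
            return GOOD
        return RESERVATION_CONFLICT

    def release(self, initiator_port):
        # A release from a port that is not the holder changes nothing.
        if self.holder == initiator_port:
            self.holder = None
        return GOOD

    def check(self, initiator_port):
        """Status for an ordinary command arriving from initiator_port."""
        if self.holder is None or self.holder == initiator_port:
            return GOOD
        return RESERVATION_CONFLICT


dev = ReservationEmulator(sole_path=True)
print(dev.reserve("port-A"), hex(dev.check("port-B")))
```

PERSISTENT RESERVE IN and OUT involve considerably more state (registration keys, reservation types) than this sketch tracks.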

Doug Gilbert



Re: aic94xx panic on module load

2006-11-28 Thread Douglas Gilbert

Mark Haverkamp wrote:
 On Tue, 2006-11-28 at 13:46 -0800, Mark Haverkamp wrote:
 On Tue, 2006-11-28 at 13:44 -0500, Douglas Gilbert wrote:

 [ ... ]

 
 I don't know if this helps, but I found the verbose option.  Here is a
 little debug output.
 
 
 ./smp_discover -v  -p 12 -s 0x500508b300a27a2f /dev/mptctl
 Discover request: 40 10 00 02 00 00 00 00 00 0c 00 00 00 00 00 00
 send_req_mpt: subvalue=0  SAS address=0x500508b300a27a2f
 mptctl two scatter gather list interface
 IOCStatus=0x1
 IOCStatus=0x1 IOCLogInfo=0xA27A2F SASStatus=0x0
 smp_send_req failed, res=-1

Mark,
The iocnum may be greater than 0 (especially if you have
other MPT Fusion HBAs (any kind) in that computer).
Have a look in the log around where the mptsas driver
is registered and look for the string ioc. The number
following ioc is what you need. If you find ioc3 then
try:

 ./smp_discover -p 12 -s 0x500508b300a27a2f /dev/mptctl,3

To verify that expander SAS address, try this:
  find /sys -name 'sas_device:expander*'
cd to any directory found and try cat sas_address.


BTW there is an smp_utils version 0.92 beta at
http://www.torque.net/sg
Its error messages are somewhat clearer.


Doug Gilbert

