Re: [PATCH] scsi: libosd: Remove VLA usage
On 03/05/18 01:55, Kees Cook wrote: > On the quest to remove all VLAs from the kernel[1] this rearranges the > code to avoid a VLA warning under -Wvla (gcc doesn't recognize "const" > variables as not triggering VLA creation). Additionally cleans up variable > naming to avoid 80 character column limit. > > [1] > https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com > ACK-BY: Boaz Harrosh <o...@electrozaur.com> > Signed-off-by: Kees Cook <keesc...@chromium.org> > --- > drivers/scsi/osd/osd_initiator.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/scsi/osd/osd_initiator.c > b/drivers/scsi/osd/osd_initiator.c > index e18877177f1b..917a86a2ae8c 100644 > --- a/drivers/scsi/osd/osd_initiator.c > +++ b/drivers/scsi/osd/osd_initiator.c > @@ -1842,14 +1842,14 @@ int osd_req_decode_sense_full(struct osd_request *or, > case osd_sense_response_integrity_check: > { > struct osd_sense_response_integrity_check_descriptor > - *osricd = cur_descriptor; > - const unsigned len = > - sizeof(osricd->integrity_check_value); > - char key_dump[len*4 + 2]; /* 2nibbles+space+ASCII */ > - > - hex_dump_to_buffer(osricd->integrity_check_value, len, > -32, 1, key_dump, sizeof(key_dump), true); > - OSD_SENSE_PRINT2("response_integrity [%s]\n", key_dump); > + *d = cur_descriptor; > + /* 2nibbles+space+ASCII */ > + char dump[sizeof(d->integrity_check_value) * 4 + 2]; > + > + hex_dump_to_buffer(d->integrity_check_value, > + sizeof(d->integrity_check_value), > + 32, 1, dump, sizeof(dump), true); > + OSD_SENSE_PRINT2("response_integrity [%s]\n", dump); > } > case osd_sense_attribute_identification: > { >
Re: [PATCH, RFC] MAINTAINERS: update OSD entries
On 05/02/2017 02:33 PM, Jeff Layton wrote: > On Tue, 2017-05-02 at 09:57 +0200, Christoph Hellwig wrote: >> The open-osd domain doesn't exist anymore, and mails to the list lead >> to really annoying bounced that repeat every day. >> >> Also the primarydata address for Benny bounces, and while I have a new >> one for him he doesn't seem to be maintaining the OSD code any more. >> >> Which beggs the question: should we really leave the Supported status >> in MAINTAINERS given that the code is barely maintained? >> >> Signed-off-by: Christoph Hellwig <h...@lst.de> >> --- >> MAINTAINERS | 4 >> 1 file changed, 4 deletions(-) >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index 1bb06c5f7716..28dd83a1d9e2 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -9418,10 +9418,6 @@ F:drivers/net/wireless/intersil/orinoco/ >> >> OSD LIBRARY and FILESYSTEM >> M: Boaz Harrosh <o...@electrozaur.com> >> -M: Benny Halevy <bhal...@primarydata.com> >> -L: osd-...@open-osd.org >> -W: http://open-osd.org >> -T: git git://git.open-osd.org/open-osd.git >> S: Maintained >> F: drivers/scsi/osd/ >> F: include/scsi/osd_* > > Hah, you beat me to it! I was going to spin up a patch for this today. > > Acked-by: Jeff Layton <jlay...@redhat.com> Acked-by: Boaz Harrosh <o...@electrozaur.com> >
Re: [PATCH 3/4] nfs: remove the objlayout driver
On 04/21/2017 05:00 PM, Trond Myklebust wrote: > Maintenance is not development. It’s about doing all the followup > _after_ the feature is declared to be developed. That’s been missing > for quite some time in the case of the OSD pNFS code, which is why > I’m not even bothering to consider staging. If you are saying you are > still maintaining exofs, and want to continue doing so, then great, > but note that there is a file called BUGS in that directory that has > been untouched since 2008, and that’s why I thing staging is a good > idea. > No, the BUGS file is just stale. As you said was not ever updated. All the bugs (1) in there do no longer exist for a long long time. I will send a patch to remove the file. Yes I maintain this fs. It has the complete fixture list, of what was first intended. I keep running it every major kernel release to make sure it actually runs, and there are no regressions. If this FS stands in the way of any new development please anyone contact me and I will help in the conversion. Of exofs and the osd-uld. Thanks Boaz
Re: [PATCH 3/4] nfs: remove the objlayout driver
On 04/20/2017 11:18 PM, Trond Myklebust wrote: <> > > OK. I'm applying this patch for the 4.12 merge window. That is understandable this code was not tested for a long while > If, as Boaz > suggests, there is still an interest in exofs, then I suggest we put > that to the test by moving it into the STAGING area, to see if someone > will step up to maintain it. > I do not understand what you mean by "someone will maintain it" What is it that is missing in exofs that you see needs development? As I said I regularly run and test this FS and update as things advance there where no new fixtures for a while, but no regressions either. Please explain what it means "will maintain it" > Cheers > Trond > Thanks Boaz
Re: linux-next: manual merge of the scsi-mkp tree with the char-misc tree
On 04/07/2017 10:50 PM, Bart Van Assche wrote: > On Fri, 2017-04-07 at 13:29 -0600, Logan Gunthorpe wrote: >> On 07/04/17 09:49 AM, Bart Van Assche wrote: >>> Sorry that I had not yet noticed Logan's patch series. Should my two >>> patches that conflict with Logan's patch series be dropped and reworked >>> after Logan's patches are upstream? >> >> Yeah, Greg took my patchset around a few maintainers relatively quickly. >> This is the second conflict, so sorry about that. Looks like the easiest >> thing would be to just base your change off of mine. It doesn't look too >> difficult. If you can do it before my patch hits upstream, I'd >> appreciate some testing and/or review as no one from the scsi side >> responded and that particular patch was a bit more involved than I would >> have liked. > > Boaz, had you noticed Logan's osd patch? If not, can you have a look? > I did look, I even sent an ACK on one of the early versions. The merge breakage is more of a build issue because I never had get_device fail for me in my testing so it is more academic. Yes they both look fine BTW Boaz > Thanks, > Bart. >
Re: [PATCH 1/4] block: remove the osdblk driver
On 04/12/2017 07:01 PM, Christoph Hellwig wrote: > This was just a proof of concept user for the SCSI OSD library, and > never had any real users. > > Signed-off-by: Christoph Hellwig <h...@lst.de> Yes please remove this driver ACK-by Boaz Harrosh <o...@electrozaur.com> > --- > drivers/block/Kconfig | 16 -- > drivers/block/Makefile | 1 - > drivers/block/osdblk.c | 693 > - > 3 files changed, 710 deletions(-) > delete mode 100644 drivers/block/osdblk.c > > diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig > index a1c2e816128f..58e24c354933 100644 > --- a/drivers/block/Kconfig > +++ b/drivers/block/Kconfig > @@ -312,22 +312,6 @@ config BLK_DEV_SKD > > Use device /dev/skd$N amd /dev/skd$Np$M. > > -config BLK_DEV_OSD > - tristate "OSD object-as-blkdev support" > - depends on SCSI_OSD_ULD > - ---help--- > - Saying Y or M here will allow the exporting of a single SCSI > - OSD (object-based storage) object as a Linux block device. > - > - For example, if you create a 2G object on an OSD device, > - you can then use this module to present that 2G object as > - a Linux block device. > - > - To compile this driver as a module, choose M here: the > - module will be called osdblk. > - > - If unsure, say N. > - > config BLK_DEV_SX8 > tristate "Promise SATA SX8 support" > depends on PCI > diff --git a/drivers/block/Makefile b/drivers/block/Makefile > index b12c772bbeb3..42e8cee1cbc7 100644 > --- a/drivers/block/Makefile > +++ b/drivers/block/Makefile > @@ -22,7 +22,6 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o > obj-$(CONFIG_MG_DISK)+= mg_disk.o > obj-$(CONFIG_SUNVDC) += sunvdc.o > obj-$(CONFIG_BLK_DEV_SKD)+= skd.o > -obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o > > obj-$(CONFIG_BLK_DEV_UMEM) += umem.o > obj-$(CONFIG_BLK_DEV_NBD)+= nbd.o > diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c > deleted file mode 100644 > index 8127b8201a01.. > --- a/drivers/block/osdblk.c > +++ /dev/null > @@ -1,693 +0,0 @@ > - > -/* > - osdblk.c -- Export a single SCSI OSD object as a Linux block device > - > - > - Copyright 2009 Red Hat, Inc. > - > - This program is free software; you can redistribute it and/or modify > - it under the terms of the GNU General Public License as published by > - the Free Software Foundation. > - > - This program is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU General Public License for more details. > - > - You should have received a copy of the GNU General Public License > - along with this program; see the file COPYING. If not, write to > - the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. > - > - > - Instructions for use > - > - > - 1) Map a Linux block device to an existing OSD object. > - > - In this example, we will use partition id 1234, object id 5678, > - OSD device /dev/osd1. > - > - $ echo "1234 5678 /dev/osd1" > /sys/class/osdblk/add > - > - > - 2) List all active blkdev<->object mappings. > - > - In this example, we have performed step #1 twice, creating two blkdevs, > - mapped to two separate OSD objects. > - > - $ cat /sys/class/osdblk/list > - 0 174 1234 5678 /dev/osd1 > - 1 179 1994 897123 /dev/osd0 > - > - The columns, in order, are: > - - blkdev unique id > - - blkdev assigned major > - - OSD object partition id > - - OSD object id > - - OSD device > - > - > - 3) Remove an active blkdev<->object mapping. > - > - In this example, we remove the mapping with blkdev unique id 1. > - > - $ echo 1 > /sys/class/osdblk/remove > - > - > - NOTE: The actual creation and deletion of OSD objects is outside the > scope > - of this driver. > - > - */ > - > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > - > -#define DRV_NAME "osdblk" > -#define PFX DRV_NAME ": " > - > -/* #define _OSDBLK_DEBUG */ > -#ifdef _OSDBLK_DEBUG > -#define OSDBLK_DEBUG(fmt, a...) \ > - printk(KERN_NOTICE "osdblk @%s:%d: " fmt, __func__, __LINE__, ##a) > -#else > -#define OSDBLK_DEBUG(fmt, a...) \ > - do { if (0) printk(fmt, ##a); } while
Re: RFC: drop the T10 OSD code and its users
On 04/12/2017 07:01 PM, Christoph Hellwig wrote: Hi Sir Christoph > The only real user of the T10 OSD protocol, the pNFS object layout > driver never went to the point of having shipping products, and the > other two users (osdblk and exofs) were simple example of it's usage. > I understand why osdblk might be a pain, and was broken from day one, and should by all means go away ASAP. But exofs should not be bothering anyone, and as far as I know does not use any special API's except the osd_uld code of course. > The code has been mostly unmaintained for years and is getting in the > way of block / SCSI changes, so I think it's finally time to drop it. > Please tell me what are those changes you are talking about? I might be able to help in conversion. I guess you mean osd_uld and the Upper SCSI API. Just point me at a tree where osd_uld is broken, and perhaps with a little guidance from you I can do a satisfactory conversion. Is true that no new code went in for a long while, but I still from time to time run a setup and test that the all stack, like iscsi-bidi and so on still works. That said, yes only a stand alone exofs was tested for a long time, a full pnfs setup is missing any supporting server. So Yes I admit that pnfs-obj is getting very rotten. And is most probably broken, on the pnfs side of things. [Which I admit makes my little plea kind of mute ;-) ] Every once in a while I get emails from Students basing all kind of interesting experiments on top of the exofs and object base storage. So for the sake of academics and for the sake of a true bidi-stack testing, might we want to evaluate what is the up coming cost, and what is a minimum set we are willing to keep? Please advise? thanks Boaz > These patches are against Jens' block for-next tree as that already > has various modifications of the SCSI code. >
Re: [PATCH] osd_uld: Check scsi_device_get() return value
On 03/30/2017 08:17 PM, Bart Van Assche wrote: > scsi_device_get() can fail. Hence check its return value. > > Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> > Cc: Boaz Harrosh <bharr...@panasas.com> Cool thanks ACK-by: Boaz Harrosh <o...@electrozaur.com> > --- > drivers/scsi/osd/osd_uld.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c > index e0ce5d2fd14d..1dea4244dd0c 100644 > --- a/drivers/scsi/osd/osd_uld.c > +++ b/drivers/scsi/osd/osd_uld.c > @@ -464,14 +464,15 @@ static int osd_probe(struct device *dev) > /* hold one more reference to the scsi_device that will get released >* in __release, in case a logout is happening while fs is mounted >*/ > - scsi_device_get(scsi_device); > + if (scsi_device_get(scsi_device)) > + goto err_put_disk; > osd_dev_init(>od, scsi_device); > > /* Detect the OSD Version */ > error = __detect_osd(oud); > if (error) { > OSD_ERR("osd detection failed, non-compatible OSD device\n"); > - goto err_put_disk; > + goto err_put_sdev; > } > > /* init the char-device for communication with user-mode */ > @@ -508,8 +509,9 @@ static int osd_probe(struct device *dev) > > err_put_cdev: > cdev_del(>cdev); > -err_put_disk: > +err_put_sdev: > scsi_device_put(scsi_device); > +err_put_disk: > put_disk(disk); > err_free_osd: > dev_set_drvdata(dev, NULL); >
Re: [PATCH] scsi: osd_uld: remove an unneeded NULL check
On 03/23/2017 12:41 PM, Dan Carpenter wrote: > We don't call the remove() function unless probe() succeeds so "oud" > can't be NULL here. Plus, if it were NULL, we dereference it on the > next line so it would crash anyway. > > Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com> > Thanks sure! ACK-by Boaz Harrosh <o...@electrozaur.com> > diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c > index 4101c3178411..8b9941a5687a 100644 > --- a/drivers/scsi/osd/osd_uld.c > +++ b/drivers/scsi/osd/osd_uld.c > @@ -507,10 +507,9 @@ static int osd_remove(struct device *dev) > struct scsi_device *scsi_device = to_scsi_device(dev); > struct osd_uld_device *oud = dev_get_drvdata(dev); > > - if (!oud || (oud->od.scsi_device != scsi_device)) { > - OSD_ERR("Half cooked osd-device %p,%p || %p!=%p", > - dev, oud, oud ? oud->od.scsi_device : NULL, > - scsi_device); > + if (oud->od.scsi_device != scsi_device) { > + OSD_ERR("Half cooked osd-device %p, || %p!=%p", > + dev, oud->od.scsi_device, scsi_device); > } > > cdev_device_del(>cdev, >class_dev); >
Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?
On 02/28/2017 03:11 AM, Jeff Layton wrote: <> > > I'll probably have questions about the read side as well, but for now it > looks like it's mostly used in an ad-hoc way to communicate errors > across subsystems (block to fs layer, for instance). If memory does not fail me it used to be checked long time ago in the read-ahead case. On the buffered read case, the first page is read synchronous and any error is returned to the caller, but then a read-ahead chunk is read async all the while the original thread returned to the application. So any errors are only recorded on the page-bit, since otherwise the uptodate is off and the IO will be retransmitted. Then the move to read_iter changed all that I think. But again this is like 5-6 years ago, and maybe I didn't even understand very well. > -- > Jeff Layton> I would like a Documentation of all this as well please. Where are the tests for this? Thanks Boaz
Re: [PATCH v2] osd: remove deadcode
On 02/24/2016 01:21 PM, Sudip Mukherjee wrote: > The variable is_ver1 is always true and so OSD_CAP_LEN can never be > used. > Reported by Coverity. > > Signed-off-by: Sudip Mukherjee <sudip.mukher...@codethink.co.uk> ACK-by: Boaz harrosh <o...@elecrozaur.com> Thanks > --- > > v2: Joe Perches asked to mention the tool used in the commit log. > > drivers/scsi/osd/osd_initiator.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/scsi/osd/osd_initiator.c > b/drivers/scsi/osd/osd_initiator.c > index d8a2b51..3b11aad 100644 > --- a/drivers/scsi/osd/osd_initiator.c > +++ b/drivers/scsi/osd/osd_initiator.c > @@ -2006,9 +2006,8 @@ EXPORT_SYMBOL(osd_sec_init_nosec_doall_caps); > */ > void osd_set_caps(struct osd_cdb *cdb, const void *caps) > { > - bool is_ver1 = true; > /* NOTE: They start at same address */ > - memcpy(>v1.caps, caps, is_ver1 ? OSDv1_CAP_LEN : OSD_CAP_LEN); > + memcpy(>v1.caps, caps, OSDv1_CAP_LEN); > } > > bool osd_is_sec_alldata(struct osd_security_parameters *sec_parms __unused) > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] osd: fix signed char versus %02x issue
On 12/08/2015 04:25 PM, Rasmus Villemoes wrote: > If char is signed and one of these bytes happen to have a value > outside the ascii range, the corresponding output will consist of > "ff" followed by the two hex chars that were actually > intended. One way to fix it would be to change the casts to (u8*) aka > (unsigned char*), but it is much simpler (and generates smaller code) > to use the %ph extension which was created for such short hexdumps. > Ha real cool, thanks I hated that crap ACK-by: Boaz Harrosh <o...@electrozaur.com> > Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk> > --- > drivers/scsi/osd/osd_initiator.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/scsi/osd/osd_initiator.c > b/drivers/scsi/osd/osd_initiator.c > index 0cccd6033feb..d8a2b5185f56 100644 > --- a/drivers/scsi/osd/osd_initiator.c > +++ b/drivers/scsi/osd/osd_initiator.c > @@ -170,10 +170,7 @@ static int _osd_get_print_system_info(struct osd_dev *od, > > /* FIXME: Where are the time utilities */ > pFirst = get_attrs[a++].val_ptr; > - OSD_INFO("CLOCK [0x%02x%02x%02x%02x%02x%02x]\n", > - ((char *)pFirst)[0], ((char *)pFirst)[1], > - ((char *)pFirst)[2], ((char *)pFirst)[3], > - ((char *)pFirst)[4], ((char *)pFirst)[5]); > + OSD_INFO("CLOCK [0x%6phN]\n", pFirst); > > if (a < nelem) { /* IBM-OSD-SIM bug, Might not have it */ > unsigned len = get_attrs[a].len; > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
On 09/30/2015 12:28 PM, Hannes Reinecke wrote: <> > Pushing things into the background is typically not the best of > ideas; actually I've been running into issues with udev not being > complete by the time the next round is started. So more often than > not I would be greeted with messages: > > 'write: no such file or directory' > > when executing this line. Removing the '&' at the end made this > warning go away. > > And actually I'm not sure if the above script is a valid testcase; So are you saying it is allowed to crash the Kernel with a crappy script? > from what I've seen there is no locking / reference counting when > accessing sysfs attributes. So as soon as you _can_ access the sysfs > attribute it is implicitly assumed to be valid. > In your case you will be _removing_ the sysfs attribute even though > it is still accessed, which of course will crash. > Is that allowed? for usermode script to race and crash the Kernel? >From the original email it sounds like this used to be fine and it now crashes (with the &) Thanks Boaz > Can you still reproduce this problem after removing the '&' in that > line? > >> echo "-- delete $dev --" > /dev/kmsg >> echo 1 > /sys/class/scsi_device/${dev}/device/delete >> >> n=$((n + 1)) >> done >> --- cut here -- > > Having said that I've retried your test script with my ALUA handler > update, and didn't find any issues there. > It happily completed about 500 rounds at which point I got bored. > Of course, this is after removing the '&' in the said line. > > Cheers, > > Hannes > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SCSI-OSD: Delete an unnecessary check before the function call put_disk
On 06/24/2015 05:16 PM, SF Markus Elfring wrote: From: Markus Elfring elfr...@users.sourceforge.net Date: Wed, 24 Jun 2015 16:06:21 +0200 The put_disk() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring elfr...@users.sourceforge.net ACK-by: Boaz Harrosh o...@electrozaur.com --- drivers/scsi/osd/osd_uld.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c index 243eab3..e2075522 100644 --- a/drivers/scsi/osd/osd_uld.c +++ b/drivers/scsi/osd/osd_uld.c @@ -407,9 +407,7 @@ static void __remove(struct device *dev) OSD_INFO(osd_remove %s\n, oud-disk ? oud-disk-disk_name : NULL); - - if (oud-disk) - put_disk(oud-disk); + put_disk(oud-disk); ida_remove(osd_minor_ida, oud-minor); kfree(oud); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 9/9] snic:Add Makefile, patch Kconfig, MAINTAINERS
On 05/27/2015 10:19 AM, Narsimhulu Musini wrote: Kconfig for kbuild Makefile to build snic module Updated MAINTAINERS file Signed-off-by: Narsimhulu Musini nmus...@cisco.com Signed-off-by: Sesidhar Baddela sebad...@cisco.com --- * v3 - Added additional config section (CONFIG_SNIC_DEBUG_FS) for enabling debugging functionality. * v2 - Added compile time flags for debugfs dependent functionality. MAINTAINERS| 7 +++ drivers/scsi/Kconfig | 17 + drivers/scsi/Makefile | 1 + drivers/scsi/snic/Makefile | 21 + 4 files changed, 46 insertions(+) create mode 100644 drivers/scsi/snic/Makefile diff --git a/MAINTAINERS b/MAINTAINERS index 2a97e05..368fb76 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2536,6 +2536,13 @@ L: linux-scsi@vger.kernel.org S: Supported F: drivers/scsi/fnic/ +CISCO SCSI HBA DRIVER +M: Narsimhulu Musini nmus...@cisco.com +M: Sesidhar Baddela sebad...@cisco.com +L: linux-scsi@vger.kernel.org +S: Supported +F: drivers/scsi/snic/ + CMPC ACPI DRIVER M: Thadeu Lima de Souza Cascardo casca...@holoscopio.com M: Daniel Oliveira Nascimento d...@syst.com.br diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index 9c92f41..8baab3f 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -634,6 +634,23 @@ config FCOE_FNIC file:Documentation/scsi/scsi.txt. The module will be called fnic. +config SCSI_SNIC + tristate Cisco SNIC Driver + depends on PCI SCSI X86_64 + help + This is support for the Cisco PCI-Express SCSI HBA. + + To compile this driver as a module, choose M here and read + file:Documentation/scsi/scsi.txt. + The module will be called snic. + +config SCSI_SNIC_DEBUG_FS + bool Cisco SNIC Driver Debugfs Support + depends on SCSI_SNIC DEBUG_FS + help + This enables to list debugging information from SNIC Driver + available via debugfs file system + config SCSI_DMX3191D tristate DMX3191D SCSI support depends on PCI SCSI diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile index 58158f1..f643942 100644 --- a/drivers/scsi/Makefile +++ b/drivers/scsi/Makefile @@ -39,6 +39,7 @@ obj-$(CONFIG_LIBFC) += libfc/ obj-$(CONFIG_LIBFCOE)+= fcoe/ obj-$(CONFIG_FCOE) += fcoe/ obj-$(CONFIG_FCOE_FNIC) += fnic/ +obj-$(CONFIG_SCSI_SNIC) += snic/ obj-$(CONFIG_SCSI_BNX2X_FCOE)+= libfc/ fcoe/ bnx2fc/ obj-$(CONFIG_ISCSI_TCP) += libiscsi.o libiscsi_tcp.o iscsi_tcp.o obj-$(CONFIG_INFINIBAND_ISER)+= libiscsi.o diff --git a/drivers/scsi/snic/Makefile b/drivers/scsi/snic/Makefile new file mode 100644 index 000..572102a --- /dev/null +++ b/drivers/scsi/snic/Makefile @@ -0,0 +1,21 @@ +obj-$(CONFIG_SCSI_SNIC) += snic.o + +snic-y := \ + snic_attrs.o \ + snic_main.o \ + snic_res.o \ + snic_isr.o \ + snic_ctl.o \ + snic_io.o \ + snic_scsi.o \ + snic_disc.o \ + vnic_cq.o \ + vnic_intr.o \ + vnic_dev.o \ + vnic_wq.o + +ifeq ($(CONFIG_SCSI_SNIC_DEBUG_FS), y) +ccflags-y += -DSNIC_DEBUG_FS Why do you need an extra define here just use CONFIG_SCSI_SNIC_DEBUG_FS in source code directly +snic-y += snic_debugfs.o \ + snic_trc.o +endif snic-$(CONFIG_SCSI_SNIC_DEBUG_FS) += snic_debugfs.o You do not the ifeq () thing at all Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 9/9] snic:Add Makefile, patch Kconfig, MAINTAINERS
On 05/27/2015 01:02 PM, Narsimhulu Musini (nmusini) wrote: +ifeq ($(CONFIG_SCSI_SNIC_DEBUG_FS), y) +ccflags-y += -DSNIC_DEBUG_FS Why do you need an extra define here just use CONFIG_SCSI_SNIC_DEBUG_FS in source code directly Agree, I just want to use a shorter macro in the source. Don't do this. It is a convention that if a programmer sees a CONFIG_XXX he knows that this is settable by a Kconfig. By making it shorter the programmer will think that this is dead code because it is not set anywhere. +snic-y += snic_debugfs.o \ + snic_trc.o +endif snic-$(CONFIG_SCSI_SNIC_DEBUG_FS) += snic_debugfs.o If CONFIG_SCSI_SNIC_DEBUGFS is not set, then it leaves a build variable ³snic- in build system. ifeq() avoids such thing. That is fine. This is done all over the Kernel Makefile. There are two tons of these do nothing make variables It the way we do it in the Kernel. (I actually like it) You do not the ifeq () thing at all Cheers Boaz Thanks simha -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scatterlist: enable sg chaining for all architectures
On 04/29/2015 05:15 AM, James Bottomley wrote: Perhaps the best thing to do is just fix target and call it quits? Right! drivers write code for sg_chaining and on ARCHs that do not support it the code just works. Only the max_sg is smaller and the chaining code never kicks in and is dead code for these ARCHs. James Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/17] Clear up bidi command confusion
On 01/29/2015 04:41 PM, James Bottomley wrote: On Thu, 2015-01-29 at 15:00 +0200, Boaz Harrosh wrote: On 01/23/2015 03:12 PM, Christoph Hellwig wrote: On Fri, Jan 23, 2015 at 01:05:30PM +0100, Bart Van Assche wrote: There is some confusion in the SCSI core and in SCSI LLDs around the meaning of sc_data_direction and whether or not this member can have the value DMA_BIDIRECTIONAL. Clear up this confusion. The patches in this series are: I wonder if we should change the code to set DMA_BIDIRECTIONAL for bidi commands. That seems a lot more logical than the current version. You cannot do this. Because a Bidi Command is actually two commands one for the WRITE side (DMA_TO_DEVICE) which is this one, and another command for the READ side (DMA_FROM_DEVICE). You're not thinking about this the correct way. DMA_BIDIRECTIONAL is a DMA engine flag. We use it in SCSI for historical reasons (mainly to prevent a translation from the SCSI version). Since it was never used, most SCSI drivers have code to check and reject commands with it set. For actual bidirectional commands, you have to check scsi_bidi_command() and programme the DMA engine separately for both phases, so the flag is useless in this case. This is what I meant how is above different from what I said? The proposal is to make it the signal for bidirectional commands This I disagree, and would be a very bad idea and will take us back, to hacking time. As you noted, please let us leave DMA_BIDIRECTIONAL to the DMA engines. Let us just stir away from the DMA_XXX flags altogether and move back to block layer flags. (in which case most HBAs would make it do the right thing; i.e. reject) This is not necessary, (the check and rejection), commands will not be issued to LLDs that did not mark support for BiDi. So they will never receive such commands and it is added dead code. but it still wouldn't be how you'd program the DMA enigne; that still would have to be done in two phases as it is today. Exactly my point. Only one small correction, these are not two phases, as in phases-1 then phases-2. These are two memory buffers independently programmed for DMA transfer. The actual transfer occurs simultaneously, on both buffers. James Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/17] Clear up bidi command confusion
On 01/23/2015 03:12 PM, Christoph Hellwig wrote: On Fri, Jan 23, 2015 at 01:05:30PM +0100, Bart Van Assche wrote: There is some confusion in the SCSI core and in SCSI LLDs around the meaning of sc_data_direction and whether or not this member can have the value DMA_BIDIRECTIONAL. Clear up this confusion. The patches in this series are: I wonder if we should change the code to set DMA_BIDIRECTIONAL for bidi commands. That seems a lot more logical than the current version. You cannot do this. Because a Bidi Command is actually two commands one for the WRITE side (DMA_TO_DEVICE) which is this one, and another command for the READ side (DMA_FROM_DEVICE). The DMA_XXX_ enum must not to be confused with the t10-scsi Bidi commands. In DMA world the enum means: What are the allowed access on the memory buffer, is Device allowed to read-from-memory-only or write-to-memory-only or both simultaneously on the same buffer. It is a flag to the IO-mmu on the direction of its mappings. A t10-scsi bidi command is two buffers. The command caries two distinct buffers one is only-written-to 2nd only-read-from. But these are two distinct buffers each his proper direction. If you ask me you need to remove sc_data_direction all together and just go directly to the READ/WRITE flag at request level. SCSI has no business duplicating the same information in another member another way. In any way t10-scsi has no use in the entire of its STD of the DMA_BIDIRECTIONAL sense. ie. same buffer, two directional access. So DMA_BIDIRECTIONAL is just dead code for sc_data_direction. It could be imagined but t10-scsi does not have a use for it. Again Not to be confused with one-command two buffers, which is exactly the opposite. Also I don't think all the debug checks for bidi commands that you change should stay at all - driver need to set the QUEUE_FLAG_BIDI to ever see a bidi command. Exactly just remove the all checks. It would also nice to add a host template flag for bidi support instead of having to poke into the block layer request_queue while we're at it. Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/17] Clear up bidi command confusion
On 01/26/2015 11:58 AM, Bart Van Assche wrote: Hello Christoph, This makes sense to me. I will rework this patch series as you proposed. Do you have a bidi setup to test against? Sending xor command to scsi_dbg is only half the test for me because, yes there are two buffers, but they are of same size so bugs might be masked. (From experience) Thanks Boaz Bart. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/17] Clear up bidi command confusion
On 01/29/2015 03:20 PM, Bart Van Assche wrote: On 01/29/15 14:07, Boaz Harrosh wrote: On 01/26/2015 11:58 AM, Bart Van Assche wrote: Hello Christoph, This makes sense to me. I will rework this patch series as you proposed. Do you have a bidi setup to test against? Sending xor command to scsi_dbg is only half the test for me because, yes there are two buffers, but they are of same size so bugs might be masked. (From experience) Hello Boaz, If anyone would like to submit a scsi_debug patch that adds support to that kernel driver for a bidi command for which the input and output buffers have different sizes I think that would be helpful. Hi Bart. scsi_dbg is a scsi-blk-device type device (forgot the exact name). I do not think there is such a command defined by the STD for this command set. its only for other device types like osd, printer and so on. But sending XOR commands to scsi_dbg is a good start. Bart. Please Note: I do not agree for the use of the constant DMA_BIDIRECTIONAL to denote scsi_bidi_cmnd(cmd). This is wrong and takes us back not forward. If anything at all is done, cmd-sc_data_direction should be just removed all together. It is not needed and denotes nothing more than what is already available at the request level. Please do not add any more new code on top of cmd-sc_data_direction. (At minimum just remove the all DMA_BIDIRECTIONAL checks and you are done) Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Large disk drives
On 11/06/2014 05:54 PM, James Bottomley wrote: We don't have a failure. This is the problem. Determining that a problem exists OK Sorry. I assumed the bridge is smart enough to do nothing, ie READ_CAPACITY_10 is passed as is via sata to the device that actually supports READ_CAPACITY_16, as I understand then the actual good drive is not suppose to send size-modulue-2G in response to READ_CAPACITY_10. Should it ? Then the bridge just sends that back to me, now if I send READ_CAPACITY_16 the bridge will return NOT-SUPPORTED because it is unexpected. But what are you saying that the bridge was smart enough to do READ_CAPACITY_16 get a 64bit value from the drive and then return the lower 32bit to me ? Really ? I would not imagine in the life of me someone so dumb. And surly it is against any spec. Are you sure ? I think you are wrong I think the guy reported that he can only see 2T out of his 3T drive which means the bridge returned 0x, exactly 2T James But hey, I guess stupidity is limitless, this one here pushing real far. Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Large disk drives
On 11/05/2014 06:34 PM, Alan Stern wrote: It's simpler than that: The drive is attached directly to the computer (i.e., via SATA rather than USB) when the partition table is created. With no USB-SATA bridge chip to mess things up, there's no problem determining the correct capacity. Right! I think it should be very simple to, just as we send a READ_CAPACITY16 / READ_CAPACITY32 / READ_CAPACITY64 or is it a GET_CODE_PAGE ? whatever we send today we can also have READ_CAPACITY_PART() which will send a READ of sector 0 the raw PC_COMMAND way and run it through some partition analyzer. We only need to support gpt or msdos partitions I think in this case. Then if READ_CAPACITY16 returns all (s) and READ_CAPACITY32 is not supported and/or blacklisted, then yes attempt a READ_CAPACITY_PART() which should be sam-2 compatible. It should not take longer than a weekend afternoon over cup of Machha. If any one sends a bad device my way, I'll do it. But anyone should be able to code something so simple. Alan Stern Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] osd: Boaz Harrosh - change of email
On 10/22/2014 12:04 AM, James Bottomley wrote: On Tue, 2014-10-21 at 17:05 +0300, Boaz Harrosh wrote: Hi Sir Linus A small administrative stuff, was on vacation and the old email bounced on me. I was hoping to still make the 3 weeks merge window, but it was 2 weeks at the end. Your call if to make this wait for next window. Needless to say that it is ZERO risk, just change of email. Based on commit [bfe01a5b] Linux 3.17 3 patches available in the git repository at: git://git.open-osd.org/linux-open-osd.git for-linus for you to fetch changes up to 1fa3a002b2546c42c343c77c144871285896ced5: Boaz Harrosh - fix email in Documentation (2014-10-19 20:36:36 +0300) Boaz Harrosh (3): MAINTAINERS: Change Boaz Harrosh's email Boaz Harrosh - Fix broken email address Boaz Harrosh - fix email in Documentation Documentation/scsi/osd.txt | 3 +-- MAINTAINERS | 2 +- drivers/scsi/osd/Kbuild | 2 +- drivers/scsi/osd/Kconfig| 2 +- drivers/scsi/osd/osd_debug.h| 2 +- drivers/scsi/osd/osd_initiator.c| 4 ++-- drivers/scsi/osd/osd_uld.c | 4 ++-- fs/exofs/Kbuild | 2 +- fs/exofs/common.h | 2 +- fs/exofs/dir.c | 2 +- fs/exofs/exofs.h| 2 +- fs/exofs/file.c | 2 +- fs/exofs/inode.c| 2 +- fs/exofs/namei.c| 2 +- fs/exofs/ore.c | 4 ++-- fs/exofs/ore_raid.c | 2 +- fs/exofs/ore_raid.h | 2 +- fs/exofs/super.c| 2 +- fs/exofs/symlink.c | 2 +- fs/exofs/sys.c | 2 +- fs/nfs/objlayout/objio_osd.c| 2 +- fs/nfs/objlayout/objlayout.c| 2 +- fs/nfs/objlayout/objlayout.h| 2 +- fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 2 +- include/linux/pnfs_osd_xdr.h| 2 +- include/scsi/osd_initiator.h| 2 +- include/scsi/osd_ore.h | 2 +- include/scsi/osd_protocol.h | 4 ++-- include/scsi/osd_sec.h | 2 +- include/scsi/osd_sense.h| 2 +- include/scsi/osd_types.h| 2 +- 31 files changed, 35 insertions(+), 36 deletions(-) This is a bit of an unnecessary massive churn. No one expects the author named in the file to stay up to date, especially because the @domain.com usually credits the company who paid for the code, so it's left in as a kind of mark of respect for them. I'm not saying it applies in your case, just that it creates the common expectation of in-file authors needing to be traced through the MAINTAINERS file. Could you not just update the MAINTAINERS file only, like everyone else? OK, I did not know this. The Panasas credit is still there so no harm done. Anyway Linus pulled it, so lets leave it at that for now. Next time I'll know. James Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] osd: Boaz Harrosh - change of email
Hi Sir Linus A small administrative stuff, was on vacation and the old email bounced on me. I was hoping to still make the 3 weeks merge window, but it was 2 weeks at the end. Your call if to make this wait for next window. Needless to say that it is ZERO risk, just change of email. Based on commit [bfe01a5b] Linux 3.17 3 patches available in the git repository at: git://git.open-osd.org/linux-open-osd.git for-linus for you to fetch changes up to 1fa3a002b2546c42c343c77c144871285896ced5: Boaz Harrosh - fix email in Documentation (2014-10-19 20:36:36 +0300) Boaz Harrosh (3): MAINTAINERS: Change Boaz Harrosh's email Boaz Harrosh - Fix broken email address Boaz Harrosh - fix email in Documentation Documentation/scsi/osd.txt | 3 +-- MAINTAINERS | 2 +- drivers/scsi/osd/Kbuild | 2 +- drivers/scsi/osd/Kconfig| 2 +- drivers/scsi/osd/osd_debug.h| 2 +- drivers/scsi/osd/osd_initiator.c| 4 ++-- drivers/scsi/osd/osd_uld.c | 4 ++-- fs/exofs/Kbuild | 2 +- fs/exofs/common.h | 2 +- fs/exofs/dir.c | 2 +- fs/exofs/exofs.h| 2 +- fs/exofs/file.c | 2 +- fs/exofs/inode.c| 2 +- fs/exofs/namei.c| 2 +- fs/exofs/ore.c | 4 ++-- fs/exofs/ore_raid.c | 2 +- fs/exofs/ore_raid.h | 2 +- fs/exofs/super.c| 2 +- fs/exofs/symlink.c | 2 +- fs/exofs/sys.c | 2 +- fs/nfs/objlayout/objio_osd.c| 2 +- fs/nfs/objlayout/objlayout.c| 2 +- fs/nfs/objlayout/objlayout.h| 2 +- fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 2 +- include/linux/pnfs_osd_xdr.h| 2 +- include/scsi/osd_initiator.h| 2 +- include/scsi/osd_ore.h | 2 +- include/scsi/osd_protocol.h | 4 ++-- include/scsi/osd_sec.h | 2 +- include/scsi/osd_sense.h| 2 +- include/scsi/osd_types.h| 2 +- 31 files changed, 35 insertions(+), 36 deletions(-) Thanks, have a good Kernel Cycle Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] MAINTAINERS: Change Boaz Harrosh's email
From: Boaz Harrosh o...@electrozaur.com I have moved on, and do no longer have Panasas email access. Update to an email that can reach me. So change bharr...@panasas.com = o...@electrozaur.com Explain of email address: * electrozaur.com is a domain owned by me. * ooo - Stands for Open Osd . Org Another email alias that can be used is: open...@gmail.com CC: Greg KH gre...@linuxfoundation.org Signed-off-by: Boaz Harrosh o...@electrozaur.com --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index f10ed39..fc2c9a8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6725,7 +6725,7 @@ S:Orphan F: drivers/net/wireless/orinoco/ OSD LIBRARY and FILESYSTEM -M: Boaz Harrosh bharr...@panasas.com +M: Boaz Harrosh o...@electrozaur.com M: Benny Halevy bhal...@primarydata.com L: osd-...@open-osd.org W: http://open-osd.org -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Boaz Harrosh - Fix broken email address
From: Boaz Harrosh o...@electrozaur.com I no longer have access to the Panasas email. So change to an email that can always reach me. Signed-off-by: Boaz Harrosh o...@electrozaur.com --- drivers/scsi/osd/Kbuild | 2 +- drivers/scsi/osd/Kconfig| 2 +- drivers/scsi/osd/osd_debug.h| 2 +- drivers/scsi/osd/osd_initiator.c| 4 ++-- drivers/scsi/osd/osd_uld.c | 4 ++-- fs/exofs/Kbuild | 2 +- fs/exofs/common.h | 2 +- fs/exofs/dir.c | 2 +- fs/exofs/exofs.h| 2 +- fs/exofs/file.c | 2 +- fs/exofs/inode.c| 2 +- fs/exofs/namei.c| 2 +- fs/exofs/ore.c | 4 ++-- fs/exofs/ore_raid.c | 2 +- fs/exofs/ore_raid.h | 2 +- fs/exofs/super.c| 2 +- fs/exofs/symlink.c | 2 +- fs/exofs/sys.c | 2 +- fs/nfs/objlayout/objio_osd.c| 2 +- fs/nfs/objlayout/objlayout.c| 2 +- fs/nfs/objlayout/objlayout.h| 2 +- fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 2 +- include/linux/pnfs_osd_xdr.h| 2 +- include/scsi/osd_initiator.h| 2 +- include/scsi/osd_ore.h | 2 +- include/scsi/osd_protocol.h | 4 ++-- include/scsi/osd_sec.h | 2 +- include/scsi/osd_sense.h| 2 +- include/scsi/osd_types.h| 2 +- 29 files changed, 33 insertions(+), 33 deletions(-) diff --git a/drivers/scsi/osd/Kbuild b/drivers/scsi/osd/Kbuild index 5fd73d77..58cecd4 100644 --- a/drivers/scsi/osd/Kbuild +++ b/drivers/scsi/osd/Kbuild @@ -4,7 +4,7 @@ # Copyright (C) 2008 Panasas Inc. All rights reserved. # # Authors: -# Boaz Harrosh bharr...@panasas.com +# Boaz Harrosh o...@electrozaur.com # Benny Halevy bhal...@panasas.com # # This program is free software; you can redistribute it and/or modify diff --git a/drivers/scsi/osd/Kconfig b/drivers/scsi/osd/Kconfig index a070351..347cc5e 100644 --- a/drivers/scsi/osd/Kconfig +++ b/drivers/scsi/osd/Kconfig @@ -4,7 +4,7 @@ # Copyright (C) 2008 Panasas Inc. All rights reserved. # # Authors: -# Boaz Harrosh bharr...@panasas.com +# Boaz Harrosh o...@electrozaur.com # Benny Halevy bhal...@panasas.com # # This program is free software; you can redistribute it and/or modify diff --git a/drivers/scsi/osd/osd_debug.h b/drivers/scsi/osd/osd_debug.h index 579e491..2634126 100644 --- a/drivers/scsi/osd/osd_debug.h +++ b/drivers/scsi/osd/osd_debug.h @@ -4,7 +4,7 @@ * Copyright (C) 2008 Panasas Inc. All rights reserved. * * Authors: - * Boaz Harrosh bharr...@panasas.com + * Boaz Harrosh o...@electrozaur.com * Benny Halevy bhal...@panasas.com * * This program is free software; you can redistribute it and/or modify diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index 5f4cbf0..52e243b 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -7,7 +7,7 @@ * Copyright (C) 2008 Panasas Inc. All rights reserved. * * Authors: - * Boaz Harrosh bharr...@panasas.com + * Boaz Harrosh o...@electrozaur.com * Benny Halevy bhal...@panasas.com * * This program is free software; you can redistribute it and/or modify @@ -57,7 +57,7 @@ enum { OSD_REQ_RETRIES = 1 }; -MODULE_AUTHOR(Boaz Harrosh bharr...@panasas.com); +MODULE_AUTHOR(Boaz Harrosh o...@electrozaur.com); MODULE_DESCRIPTION(open-osd initiator library libosd.ko); MODULE_LICENSE(GPL); diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c index e1d9a4c..92cdd4b 100644 --- a/drivers/scsi/osd/osd_uld.c +++ b/drivers/scsi/osd/osd_uld.c @@ -10,7 +10,7 @@ * Copyright (C) 2008 Panasas Inc. All rights reserved. * * Authors: - * Boaz Harrosh bharr...@panasas.com + * Boaz Harrosh o...@electrozaur.com * Benny Halevy bhal...@panasas.com * * This program is free software; you can redistribute it and/or modify @@ -74,7 +74,7 @@ static const char osd_name[] = osd; static const char *osd_version_string = open-osd 0.2.1; -MODULE_AUTHOR(Boaz Harrosh bharr...@panasas.com); +MODULE_AUTHOR(Boaz Harrosh o...@electrozaur.com); MODULE_DESCRIPTION(open-osd Upper-Layer-Driver osd.ko); MODULE_LICENSE(GPL); MODULE_ALIAS_CHARDEV_MAJOR(SCSI_OSD_MAJOR); diff --git a/fs/exofs/Kbuild b/fs/exofs/Kbuild index 389ba83..b47c7b8 100644 --- a/fs/exofs/Kbuild +++ b/fs/exofs/Kbuild @@ -4,7 +4,7 @@ # Copyright (C) 2008 Panasas Inc. All rights reserved. # # Authors: -# Boaz Harrosh bharr...@panasas.com +# Boaz Harrosh o...@electrozaur.com # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 diff --git a/fs/exofs/common.h b/fs/exofs/common.h index 3bbd469..7d88ef5 100644 --- a/fs/exofs/common.h +++ b/fs/exofs/common.h @@ -4,7 +4,7 @@ * Copyright (C) 2005, 2006 * Avishay Traeger (avis
[PATCH 3/3] Boaz Harrosh - fix email in Documentation
From: Boaz Harrosh o...@electrozaur.com I forgot to update Documentation/*.txt Signed-off-by: Boaz Harrosh o...@electrozaur.com --- Documentation/scsi/osd.txt | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Documentation/scsi/osd.txt b/Documentation/scsi/osd.txt index da162f7..5a9879b 100644 --- a/Documentation/scsi/osd.txt +++ b/Documentation/scsi/osd.txt @@ -184,8 +184,7 @@ Any problems, questions, bug reports, lonely OSD nights, please email: More up-to-date information can be found on: http://open-osd.org -Boaz Harrosh bharr...@panasas.com -Benny Halevy bhal...@panasas.com +Boaz Harrosh o...@electrozaur.com References == -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] libiscsi, iser: Adjust data_length to include protection information
On 08/13/2014 04:09 PM, Sagi Grimberg wrote: On 8/6/2014 4:25 PM, Boaz Harrosh wrote: Hey Boaz, So in the current flow, I still don't think it is wrong/buggy, the transfer byte length related to scsi buffer length (In iscsi for sure but I think that for all transports it is the same). But, Since you have such a strong objection on this I'm also OK with changing stuff... We can pass a buffer to scsi_transfer_length (although it has no meaning effectively) and we can keep in/out_len local variables and print them along with total transfer. Do you want me to send out a patch here (since I have some hardware to test it...) or are you still planning to send out something? I'll do it. As you said there is no bugs, just ugly code. I will send a cleanup Thanks Boaz Cheers, Sagi. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5] Save command pool address of Scsi_Host
On 08/04/2014 02:30 PM, jgr...@suse.com wrote: From: Juergen Gross jgr...@suse.com If a scsi host driver specifies .cmd_len in it's scsi_host_template, a driver's private command pool is needed. scsi_find_host_cmd_pool() will locate it, but scsi_alloc_host_cmd_pool() isn't saving the pool address in the host template. This will result in an access error when the host is removed. Avoid the problem by saving the address of a new allocated command pool where it is expected. Signed-off-by: Juergen Gross jgr...@suse.com --- drivers/scsi/scsi.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 88d46fe..b0cef5b 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -380,6 +380,10 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost) pool-slab_flags |= SLAB_CACHE_DMA; pool-gfp_mask = __GFP_DMA; } + + if (hostt-cmd_size) + hostt-cmd_pool = pool; + return pool; } @@ -424,8 +428,10 @@ out: out_free_slab: kmem_cache_destroy(pool-cmd_slab); out_free_pool: - if (hostt-cmd_size) + if (hostt-cmd_size) { scsi_free_host_cmd_pool(pool); I disagree I liked the V4 version better. It is much nicer on the review that we NULL the same one we pass in with no need for context that's outside of this scope Just my $0.017 Boaz + hostt-cmd_pool = NULL; + } goto out; } @@ -447,8 +453,10 @@ static void scsi_put_host_cmd_pool(struct Scsi_Host *shost) if (!--pool-users) { kmem_cache_destroy(pool-cmd_slab); kmem_cache_destroy(pool-sense_slab); - if (hostt-cmd_size) + if (hostt-cmd_size) { scsi_free_host_cmd_pool(pool); + hostt-cmd_pool = NULL; + } } mutex_unlock(host_cmd_pool_mutex); } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] brd.ko Never supported partitions should we remove the dead code ?
On 07/29/2014 08:19 PM, Ross Zwisler wrote: On Tue, 2014-07-29 at 19:37 +0300, Boaz Harrosh wrote: Hi folks I've been playing with yet unseen prd block device, and hit an issue with partitioning but since prd.c is a copy/paste of brd.c the same exact issue exists with brd. It does not support partitions! An attempt to fdisk say a couple of partitions, then mkfs mount an individual partition will result in the primary device being accessed and a corrupted disk data. If I enable max_part(=2) properly on modprobe, then fdisk a couple of partitions all goes well But then if I pass the device to Kernel (Say via mount), after a call to bdev = blkdev_get_by_path(dev_name, mode, fs_type); I get the following results: (size is extracted using:i_size_read(bdev-bd_inode); part_size is extracted using: bdev-bd_part-nr_sects 9; ) dev_name == /dev/ram0 [Jul29 17:40] foo: [foo_mount:880] size=0x4000 bdev=88003dc24340 bd_inode=88003dc24430 bd_part=88003ca22848 part_size=0x4000 dev_name == /dev/ram0p1 [ +16.816065] foo: [foo_mount:880] size=0x4000 bdev=88003d2f6d80 bd_inode=88003d2f6e70 bd_part=88003ca22848 part_size=0x4000 dev_name == /dev/ram0p2 [Jul29 17:41] foo: [foo_mount:880] size=0x4000 bdev=88003dc24680 bd_inode=88003dc24770 bd_part=88003ca22848 part_size=0x4000 We can see that all 3 different bdev's point to the same bd_part, which is the BUG. Funny is that bd_inode is different but contains same wrong size, for exactly the same reason. Yep, I've seen this as well. It turns out that partitions *do* work in brd if you don't specify the 'rd_nr' module parameter, but if you do specify the module parameter you get the partition overlapping behavior that you observed. AFAICT the difference between these two scenarios is that when you set rd_nr, the 'range' variable is set so something small, but when you don't set it 'range' is 1,048,576. That's done by this bit of code in brd_init: if (rd_nr) { nr = rd_nr; range = rd_nr part_shift; } else { nr = CONFIG_BLK_DEV_RAM_COUNT; range = 1UL MINORBITS; } The difference in behavior happens up the stack somewhere as a result of you passing 'range' into blk_register_region(). Anyway, the quick and dirty workaround is to always have 'range' be something large. This is what I've done in the current master branch of the upstream PRD tree - see commit 0256739ccab03d1662221f595b03797370c2ac12. For what it's worth, I'm planning on updating prd (and probably brd) so that it dynamically allocates partition numbers. See commit 469071a37afc8a627b6b2ddf29db0a097d864845 for how this was done in the nvme driver. Hi Ross I've looked at 469071a37afc8a627b6b2ddf29db0a097d864845, but surly as you said it is not the fix. I have posted the fix, in the code you snipped out. So I get it that you guys are not convinced that we need to remove support for Partitions for brd? For me a partition for /dev/ramX is totally bogus. What's the point at all? Since we have the same effect if we just do rd_nr and partition memory this way, and save 1M of memory. And it is not as if you can loose power and want the partitioning persistent for the next boot. But sigh I will not fight you about it, I made my point of this being pointless and that is that. I will post a proper patch for brd on top of Jens's tree unless you want it rebased over some other tree, I guess it can just be queued for 3.17. - Ross Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] brd: Fix the partitions BUG
With current code after a call to: bdev = blkdev_get_by_path(dev_name, mode, fs_type); size = i_size_read(bdev-bd_inode); part_size = bdev-bd_part-nr_sects 9; I get the following bad results: dev_name == /dev/ram0 foo: [foo_mount:880] size=0x4000 bdev=88003dc24340 \ bd_inode=88003dc24430 bd_part=88003ca22848 part_size=0x4000 dev_name == /dev/ram0p1 foo: [foo_mount:880] size=0x4000 bdev=88003d2f6d80 \ bd_inode=88003d2f6e70 bd_part=88003ca22848 part_size=0x4000 dev_name == /dev/ram0p2 foo: [foo_mount:880] size=0x4000 bdev=88003dc24680 \ bd_inode=88003dc24770 bd_part=88003ca22848 part_size=0x4000 Note how all three bdev(s) point to the same bd_part. This is do to a single bad clubber in brd_probe() which is removed in this patch: - *part = 0; because of this all 3 bdev(s) above get to point to the same bd_part[0] While at it fix/rename brd_init_one() since all devices are created on load of driver, brd_probe() will never be called with a new un-created device. brd_init_one() is now renamed to brd_find() which is what it does. TODO: There is one more partitions BUG regarding brd_direct_access() which is fixed on the next patch. Signed-off-by: Boaz Harrosh b...@plexistor.com --- drivers/block/brd.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/block/brd.c b/drivers/block/brd.c index c7d138e..92334f6 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -523,22 +523,20 @@ static void brd_free(struct brd_device *brd) kfree(brd); } -static struct brd_device *brd_init_one(int i) +static struct brd_device *brd_find(int i) { struct brd_device *brd; list_for_each_entry(brd, brd_devices, brd_list) { if (brd-brd_number == i) - goto out; + return brd; } - brd = brd_alloc(i); - if (brd) { - add_disk(brd-brd_disk); - list_add_tail(brd-brd_list, brd_devices); - } -out: - return brd; + /* brd always allocates all its devices at load time, therefor +* brd_probe will never be called with a new brd_number +*/ + printk(KERN_EROR brd: brd_find unexpected device %d\n, i); + return NULL; } static void brd_del_one(struct brd_device *brd) @@ -554,11 +552,10 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) struct kobject *kobj; mutex_lock(brd_devices_mutex); - brd = brd_init_one(MINOR(dev) part_shift); + brd = brd_find(MINOR(dev) part_shift); kobj = brd ? get_disk(brd-brd_disk) : NULL; mutex_unlock(brd_devices_mutex); - *part = 0; return kobj; } -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] brd: Fix brd_direct_access with partitions
When brd_direct_access() is called on a partition-bdev it would access the wrong sector. And caller would then corrupt the device's data. This is a preliminary fix, Matthew Wilcox has a patch in his DAX patchset which will define a global wrapper to bdev-bd_disk-fops-direct_access that will do the proper checks and translations before calling a driver global member. (The way it is done at the rest of the block stack) CC: Matthew Wilcox matthew.r.wil...@intel.com Signed-off-by: Boaz Harrosh b...@plexistor.com --- drivers/block/brd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/brd.c b/drivers/block/brd.c index 92334f6..7506864 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -378,9 +378,10 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector, if (!brd) return -ENODEV; + sector += get_start_sect(bdev); if (sector (PAGE_SECTORS-1)) return -EINVAL; - if (sector + PAGE_SECTORS get_capacity(bdev-bd_disk)) + if (unlikely(sector + PAGE_SECTORS part_nr_sects_read(bdev-bd_part))) return -ERANGE; page = brd_insert_page(brd, sector); if (!page) -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] brd: Fix brd_direct_access with partitions
On 07/30/2014 06:34 PM, Matthew Wilcox wrote: On Wed, Jul 30, 2014 at 05:18:47PM +0300, Boaz Harrosh wrote: When brd_direct_access() is called on a partition-bdev it would access the wrong sector. And caller would then corrupt the device's data. This is a preliminary fix, Matthew Wilcox has a patch in his DAX patchset which will define a global wrapper to bdev-bd_disk-fops-direct_access that will do the proper checks and translations before calling a driver global member. (The way it is done at the rest of the block stack) Uh, no, let's just focus on getting the DAX code in instead of putting in interim patches that will conflict. Patch 4/22 is uncontroversial, fixes this problem, has no dependencies, and is key to the rest of the DAX patchset. If this problem wants to be fixed, then put 4/22 in instead. OK, I agree, could you please push 4/22 as reply to here for Jens to take for 3.17 ? (together with my 1/2) Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] brd: Fix the partitions BUG
On 07/30/2014 07:50 PM, Ross Zwisler wrote: + */ +printk(KERN_EROR brd: brd_find unexpected device %d\n, i); s/KERN_EROR/KERN_ERR/ Yes thanks, sigh, code should compile driver error. I used pr_err but last inspection I saw that printk is used everywhere and, crapped ... +return NULL; } static void brd_del_one(struct brd_device *brd) @@ -554,11 +552,10 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) struct kobject *kobj; mutex_lock(brd_devices_mutex); -brd = brd_init_one(MINOR(dev) part_shift); +brd = brd_find(MINOR(dev) part_shift); kobj = brd ? get_disk(brd-brd_disk) : NULL; mutex_unlock(brd_devices_mutex); -*part = 0; return kobj; } It is possible to create new block devices with BRD at runtime: # mknod /dev/new_brd b 1 4 # fdisk -l /dev/new_brd This causes a new BRD disk to be created, and hits your error case: Jul 30 10:40:57 alara kernel: brd: brd_find unexpected device 4 Ha OK, So this is the mystery key. I never knew that trick, OK I guess I need to leave this in I guess in general I'm not saying that BRD needs to have partitions - indeed it may not give you much in the way of added functionality. As the code currently stands partitions aren't surfaced anyway (GENHD_FL_SUPPRESS_PARTITION_INFO is set). For PRD, however, I *do* want to enable partitions correctly because eventually I'd like to enhance PRD so that it *does* actually handle NVDIMMs correctly, and for that partitions do make sense. So lets talk about that for a bit. Why would you want legacy partitions for NVDIMMs. For one fdisk will waste 1M of memory on this. And with NVDIMMs you actually need a different manager all together, more like lvm stuff. I have patches here that changes prd a lot. - Global memory parameters are gone each device remaps and operates on it's own memory range. - the prd_params are gone instead you have one memmap= parameter of the form memmap=nnn$ooo,[nnn$ooo]... where: nnn - is size in bytes of the form number[K/M/G] ooo - is offset in bytes of the form number[K/M/G] Very much the same as the memmap= Kernel parameters. So just copy/paste what you did there and it will work. Now each memory range has one prd_device created for it. Note that if specified ranges are just contiguous then it is like your prd_count thing where you took a contiguous range and sliced it up, only here they can be of different size. Off course it has an added fixture of dis-contiguous ranges like we have in our multy-nodes NUMA system in the lab that host DDR3 NvDIMMs in two banks. - A sysfs interface is added to add/remove memory range on the fly like osdblk - If no parameters are specified at all, the Kernel command-line is parsed and all memmap= sections are attempted and used if are not already claimed. An interface like mknod above is not supported since it is actually pointless. My current code still supports partitions but I still think it is silly. [ This is all speculative for DDR3-NVDIMMs, we have seen that the memory controller actually tags these DIMMs with type 12 but it looks like this is all vendor specific right now and I understand that DDR4 standardize all that. So I was hoping you guys are working on all that with the NvDIMM stuff. Now lets say that I have established auto-detection for each DIMM and have extracted its SN (Our DDR3 DIMMs each have an SN that can be displayed by the vendor supplied tool) An NvDIMM manager will need to establish an NvDIMM table and set order. The motivation of a partition table is that a disk moved to another machine and/or booted into another OS will have persistent and STD way to not clobber over established DATA. But here we have like a disk cluster/raid, an ordered set of drives. Unique to NvDIMMs is the interleaving. If pairs are then inserted in wrong order or are re-mixed. This should be detected and refused to be mounted (unless a re-initialize is forced) So data is not silently lost on operator errors. All this is then persisted on some DIMM, first or last. If code is able to re-arrange order, like when physical address have changed it will, but in the wrong interleaving case it will refuse to mount. ] And if I have to implement and debug partitions for PRD, it's easy to stick them in BRD in case anyone wants to use them. I was finally playing with BRD, and it is missing your getgeo patch, so it is currently completely alien to fdisk and partitions. So why, why keep a fixture that never existed, I still hope to convince you that this is crap. Specially with brd that can have devices added on the fly like you showed above. Who needs them at all? Why waist 1M of memory on each device for no extra gain? Specially in light of my new prd that does away of any needs of a partitioning since it supports arbitrary slicing in another way. -
[RFD] brd.ko Never supported partitions should we remove the dead code ?
Hi folks I've been playing with yet unseen prd block device, and hit an issue with partitioning but since prd.c is a copy/paste of brd.c the same exact issue exists with brd. It does not support partitions! An attempt to fdisk say a couple of partitions, then mkfs mount an individual partition will result in the primary device being accessed and a corrupted disk data. If I enable max_part(=2) properly on modprobe, then fdisk a couple of partitions all goes well But then if I pass the device to Kernel (Say via mount), after a call to bdev = blkdev_get_by_path(dev_name, mode, fs_type); I get the following results: (size is extracted using: i_size_read(bdev-bd_inode); part_size is extracted using: bdev-bd_part-nr_sects 9; ) dev_name == /dev/ram0 [Jul29 17:40] foo: [foo_mount:880] size=0x4000 bdev=88003dc24340 bd_inode=88003dc24430 bd_part=88003ca22848 part_size=0x4000 dev_name == /dev/ram0p1 [ +16.816065] foo: [foo_mount:880] size=0x4000 bdev=88003d2f6d80 bd_inode=88003d2f6e70 bd_part=88003ca22848 part_size=0x4000 dev_name == /dev/ram0p2 [Jul29 17:41] foo: [foo_mount:880] size=0x4000 bdev=88003dc24680 bd_inode=88003dc24770 bd_part=88003ca22848 part_size=0x4000 We can see that all 3 different bdev's point to the same bd_part, which is the BUG. Funny is that bd_inode is different but contains same wrong size, for exactly the same reason. The fix for this is easy: diff --git drivers/block/brd.c drivers/block/brd.c index c7d138e..0d982d7 100644 --- drivers/block/brd.c +++ drivers/block/brd.c @@ -378,10 +378,12 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector, if (!brd) return -ENODEV; + sector += get_start_sect(bdev); if (sector (PAGE_SECTORS-1)) return -EINVAL; +/* Check is wrong here we need to check against bdev-bd_part-nr_sects */ if (sector + PAGE_SECTORS get_capacity(bdev-bd_disk)) return -ERANGE; page = brd_insert_page(brd, sector); if (!page) return -ENOSPC; @@ -532,11 +532,17 @@ static struct brd_device *brd_init_one(int i) goto out; } +/* In the structure of this driver this will never happen and is wrong + * to do. We should just return NULL if not found. + * This is not the bug it is just DEAD CODE + * brd = brd_alloc(i); if (brd) { add_disk(brd-brd_disk); list_add_tail(brd-brd_list, brd_devices); } +*/ + prd = NULL; out: return brd; } @@ -558,7 +564,8 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) kobj = brd ? get_disk(brd-brd_disk) : NULL; mutex_unlock(brd_devices_mutex); - *part = 0; +// Fix the partition BUG *part comes in correctly must not need to touch it +// *part = 0; return kobj; } --- After this simple fix of commenting out *part = 0, running again I get dev_name == /dev/ram0 [ +0.04] foo: [foo_mount:880] size=0x4000 bdev=88003d2f7740 inode=88003d2f7830 part=88001da8c048 part_size=0x4000 dev_name == /dev/ram0p1 [ +0.02] foo: [foo_mount:880] size=0x1e748200 bdev=88003dc25040 inode=88003dc25130 part=880036f6a000 part_size=0x1e748200 dev_name == /dev/ram0p2 [ +0.02] foo: [foo_mount:880] size=0x217b7e00 bdev=88003d2f7a80 inode=88003d2f7b70 part=880036f69000 part_size=0x217b7e00 But before we are running to fix this bug. Could we please do better and just remove the support for partitions all together. Since it *never* worked anyway, so probably no one needs it! Surly no one used it Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] brd.ko Never supported partitions should we remove the dead code ?
On 07/29/2014 07:56 PM, Matthew Wilcox wrote: On Tue, Jul 29, 2014 at 07:37:49PM +0300, Boaz Harrosh wrote: But before we are running to fix this bug. Could we please do better and just remove the support for partitions all together. Since it *never* worked anyway, so probably no one needs it! Surly no one used it I fixed this in patch 4/22 of the v8 series. The correct place to handle partitioning is in the block layer, not the individual drivers (nor the filesystems). Sir Mathew hi @@ -378,10 +378,12 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector, if (!brd) return -ENODEV; + sector += get_start_sect(bdev); if (sector (PAGE_SECTORS-1)) return -EINVAL; +/* Check is wrong here we need to check against bdev-bd_part-nr_sects */ if (sector + PAGE_SECTORS get_capacity(bdev-bd_disk)) return -ERANGE; page = brd_insert_page(brd, sector); if (!page) return -ENOSPC; you mean you fixed the brd_direct_access() hunk by pointing all users to a wrapper that does the proper offsetting, sure. But that is not the main bug I was talking about, the main BUG is that partitions are not supported at all because of the clubber of *part in brd_probe() @@ -558,7 +564,8 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) kobj = brd ? get_disk(brd-brd_disk) : NULL; mutex_unlock(brd_devices_mutex); - *part = 0; +// Fix the partition BUG *part comes in correctly must not need to touch it +// *part = 0; return kobj; } And my point is that: No one uses partitions with brd, why should we not remove it all together, why fix it and keep it? Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper
On 06/25/2014 04:17 AM, Vladislav Bolkhovitin wrote: Martin K. Petersen, on 06/23/2014 06:58 PM wrote: Mike == Mike Christie micha...@cs.wisc.edu writes: + unsigned int xfer_len = blk_rq_bytes(scmd-request); Mike Can you do bidi and dif/dix? Nope. Correction: at the moment. There is a proposal of READ GATHERED command, which is bidirectional and potentially DIF/DIX. That's all very good. But current infrastructure can not support BIDI+DIFF. Because you we'll need *two* diff vectors one for each side. Now in fact at the block layer this is actually supported. Since BIDI is two requests, and each can have its own DIFF bio. But on the scsi layer this is not implemented. So I'd say: *For now DIFF and BIDI are mutually exclusive.* (You'll need someone to love these new commands a lot in order to enable it) Vlad Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper
On 06/25/2014 01:32 PM, Sagi Grimberg wrote: On 6/25/2014 11:48 AM, Sagi Grimberg wrote: SNIP I made the patch below which should fix both bidi support in iscsi and also WRITE_SAME (and similar commands) support. I'm a bit confused, for all commands (bidi or not) the iscsi header data_length should describe the scsi_data_buffer length, bidi information will lie in AHS header. (in case bidi will ever co-exist with PI, we might need another helper that will look in req-special + PI, something like scsi_bidi_transfer_length) So why not keep scsi_transfer_length to work on sdb length (take scsi_bufflen(scmnd) or scsi_out(scmnd)-length as MKP suggested) and that's it - don't touch libiscsi. Let me test that. So I tested a bidirectional command using: $ sg_raw --infile=/root/filex --send=1024 --request=1024 --outfile=/root/filex /dev/bsg/7:0:0:0 53 00 00 00 00 00 00 00 02 00 And I see: kernel: session1: iscsi_prep_scsi_cmd_pdu iscsi prep [bidirectional cid 0 sc 880468ca1e00 cdb 0x53 itt 0x16 len 1024 bidi_len 1024 cmdsn 223 win 64] This is a very bad example and tested nothing, since len bidi_len are the same. So even if you had a bug and took length from the wrong side it would come out the same. You must test with a bidi command that has two different lengths for each side If you have a tree that you want me to test I will be glad too. From this thread I'm confused as to what patches you want me to test? please point me to a tree you need testing. You can bug me any time, any tree. I will be happy to test. Cheers Boaz This confirms what I wrote above, so AFAICT no additional fix is required for libiscsi wrt bidi commands support. Mike? Sagi. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] libiscsi, iser: Adjust data_length to include protection information
On 06/11/2014 12:09 PM, Sagi Grimberg wrote: In case protection information exists over the wire iscsi header data length is required to include it. Use protection information aware scsi helpers to set the correct transfer length. In order to avoid breakage, remove iser transfer length checks for each task as they are not always true and somewhat redundant anyway. Signed-off-by: Sagi Grimberg sa...@mellanox.com Reviewed-by: Mike Christie micha...@cs.wisc.edu --- drivers/infiniband/ulp/iser/iser_initiator.c | 34 +++-- drivers/scsi/libiscsi.c | 18 +++--- 2 files changed, 19 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 2e2d903..8d44a40 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -41,11 +41,11 @@ #include iscsi_iser.h /* Register user buffer memory and initialize passive rdma - * dto descriptor. Total data size is stored in - * iser_task-data[ISER_DIR_IN].data_len + * dto descriptor. Data size is stored in + * task-data[ISER_DIR_IN].data_len, Protection size + * os stored in task-prot[ISER_DIR_IN].data_len */ -static int iser_prepare_read_cmd(struct iscsi_task *task, - unsigned int edtl) +static int iser_prepare_read_cmd(struct iscsi_task *task) { struct iscsi_iser_task *iser_task = task-dd_data; @@ -73,14 +73,6 @@ static int iser_prepare_read_cmd(struct iscsi_task *task, return err; } - if (edtl iser_task-data[ISER_DIR_IN].data_len) { - iser_err(Total data length: %ld, less than EDTL: - %d, in READ cmd BHS itt: %d, conn: 0x%p\n, - iser_task-data[ISER_DIR_IN].data_len, edtl, - task-itt, iser_task-ib_conn); - return -EINVAL; - } - err = device-iser_reg_rdma_mem(iser_task, ISER_DIR_IN); if (err) { iser_err(Failed to set up Data-IN RDMA\n); @@ -100,8 +92,9 @@ static int iser_prepare_read_cmd(struct iscsi_task *task, } /* Register user buffer memory and initialize passive rdma - * dto descriptor. Total data size is stored in - * task-data[ISER_DIR_OUT].data_len + * dto descriptor. Data size is stored in + * task-data[ISER_DIR_OUT].data_len, Protection size + * is stored at task-prot[ISER_DIR_OUT].data_len */ static int iser_prepare_write_cmd(struct iscsi_task *task, @@ -135,14 +128,6 @@ iser_prepare_write_cmd(struct iscsi_task *task, return err; } - if (edtl iser_task-data[ISER_DIR_OUT].data_len) { - iser_err(Total data length: %ld, less than EDTL: %d, - in WRITE cmd BHS itt: %d, conn: 0x%p\n, - iser_task-data[ISER_DIR_OUT].data_len, - edtl, task-itt, task-conn); - return -EINVAL; - } - err = device-iser_reg_rdma_mem(iser_task, ISER_DIR_OUT); if (err != 0) { iser_err(Failed to register write cmd RDMA mem\n); @@ -417,11 +402,12 @@ int iser_send_command(struct iscsi_conn *conn, if (scsi_prot_sg_count(sc)) { prot_buf-buf = scsi_prot_sglist(sc); prot_buf-size = scsi_prot_sg_count(sc); - prot_buf-data_len = sc-prot_sdb-length; + prot_buf-data_len = data_buf-data_len + ilog2(sc-device-sector_size) * 8; } if (hdr-flags ISCSI_FLAG_CMD_READ) { - err = iser_prepare_read_cmd(task, edtl); + err = iser_prepare_read_cmd(task); if (err) goto send_command_error; } diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 26dc005..3f46234 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -338,7 +338,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task) struct iscsi_session *session = conn-session; struct scsi_cmnd *sc = task-sc; struct iscsi_scsi_req *hdr; - unsigned hdrlength, cmd_len; + unsigned hdrlength, cmd_len, transfer_length; I hate that you introduced this new transfer_length variable. It does not exist. In BIDI supporting driver there is out_len and in_len just as original code. itt_t itt; int rc; @@ -391,11 +391,11 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task) if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL) task-protected = true; + transfer_length = scsi_transfer_length(sc); + hdr-data_length = cpu_to_be32(transfer_length); if (sc-sc_data_direction == DMA_TO_DEVICE) { - unsigned out_len = scsi_out(sc)-length; In light of all the comments and the obvious bugs, please just re do this patch. Do not just
Re: [PATCH 2/2] block: support 16 byte CDBs for SG_IO
On 07/20/2014 01:23 PM, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig h...@lst.de Hi Christoph I've quickly reviewed your code and have a few questions --- block/scsi_ioctl.c | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index c4e6633..a804f3e 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -288,8 +288,6 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk, if (hdr-interface_id != 'S') return -EINVAL; - if (hdr-cmd_len BLK_MAX_CDB) - return -EINVAL; if (hdr-dxfer_len (queue_max_hw_sectors(q) 9)) return -EIO; @@ -306,14 +304,21 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk, break; } + ret = -ENOMEM; rq = blk_get_request(q, writing ? WRITE : READ, GFP_KERNEL); if (!rq) - return -ENOMEM; + goto out; blk_rq_set_block_pc(rq); + if (hdr-cmd_len BLK_MAX_CDB) { + rq-cmd = kzalloc(hdr-cmd_len, GFP_KERNEL); So two things here: - hdr-cmd_len is char so can be MAX of 255. I understand that 4 bytes alignment is a SCSI thing so you found no point of checking any max size? - Why the zero alloc, if you are going to paste over it the exact same length. Now if like in scsi you need 4 bytes alignment and zero padding yes, but here you do not do this (and probably shouldn't). Hence why zero-alloc? - Looking at sg.h where hdr-cmd_len is defined it stills has a comment of = 16 you might want to remove the comment by now. + if (!rq-cmd) + goto out_put_request; + } + ret = -EFAULT; if (blk_fill_sghdr_rq(q, rq, hdr, mode)) Inside here: blk_fill_sghdr_rq() calls blk_verify_command() which does: (Below cmd[] is the buffer copied from user) /* Anybody who can open the device can do a read-safe command */ if (test_bit(cmd[0], filter-read_ok)) return 0; /* Write-safe commands require a writable open */ if (test_bit(cmd[0], filter-write_ok) has_write_perm) return 0; Now I am not sure what type Commands you guys intend for these large commands but if they are say SCSI-VARLEN this test will not work. Also a user might fool the system with pretending to be read a device modifying command. I would pass len to this test function and only permit these above if command is = 16. Any special large command is root only. Thanks Boaz - goto out; + goto out_free_cdb; if (hdr-iovec_count) { size_t iov_data_len; @@ -323,7 +328,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk, 0, NULL, iov); if (ret 0) { kfree(iov); - goto out; + goto out_free_cdb; } iov_data_len = ret; @@ -346,7 +351,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk, GFP_KERNEL); if (ret) - goto out; + goto out_free_cdb; bio = rq-bio; memset(sense, 0, sizeof(sense)); @@ -365,8 +370,13 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk, hdr-duration = jiffies_to_msecs(jiffies - start_time); ret = blk_complete_sghdr_rq(rq, hdr, bio); -out: + +out_free_cdb: + if (rq-cmd != rq-__cmd) + kfree(rq-cmd); +out_put_request: blk_put_request(rq); +out: return ret; } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] block: support 16 byte CDBs for SG_IO
On 07/20/2014 04:27 PM, Christoph Hellwig wrote: On Sun, Jul 20, 2014 at 02:47:49PM +0300, Boaz Harrosh wrote: So two things here: - hdr-cmd_len is char so can be MAX of 255. I understand that 4 bytes alignment is a SCSI thing so you found no point of checking any max size? I don't see any point to force the aligmnet - the devices need to reject garbage commands, and if for some reason we'd see future commands that are 252 and 255 we are prepared to handle them. I agree - Why the zero alloc, if you are going to paste over it the exact same length. Now if like in scsi you need 4 bytes alignment and zero padding yes, but here you do not do this (and probably shouldn't). Hence why zero-alloc? No good reason except that's what sg and bsg do. Ha sorry didn't look there. Looks redundant to me that's all Inside here: blk_fill_sghdr_rq() calls blk_verify_command() which does: (Below cmd[] is the buffer copied from user) /* Anybody who can open the device can do a read-safe command */ if (test_bit(cmd[0], filter-read_ok)) return 0; /* Write-safe commands require a writable open */ if (test_bit(cmd[0], filter-write_ok) has_write_perm) return 0; Now I am not sure what type Commands you guys intend for these large commands but if they are say SCSI-VARLEN this test will not work. Also a user might fool the system with pretending to be read a device modifying command. I would pass len to this test function and only permit these above if command is = 16. Any special large command is root only. Honestly that whole filter crap should never have been merged to start with, you'll just need proper CAP_SYS_RAWIO for variable length commands. I agree. What I'm saying is - Are you sure that with current code as is we will not pass on large commands without the proper CAP_SYS_RAWIO. Should we not make sure and add: /* root can do any command. */ if (capable(CAP_SYS_RAWIO)) return 0; + + if (cmnd_len BLK_MAX_CDB) + return -EPERM; Rrrr you are right. I finally get the filter code. Anything that is not white-listed is rejected. OK sorry for the noise. Reviewed-by: Boaz Harrosh b...@plexistor.com Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] sg: add SG_FLAG_Q_AT_TAIL flag
On 06/04/2014 05:58 PM, Douglas Gilbert wrote: When the SG_IO ioctl was copied into the block layer and later into the bsg driver, subtle differences emerged. One difference is the way injected commands are queued through the block layer (i.e. this is not SCSI device queueing nor SATA NCQ). Summarizing: - SG_IO in the block layer: blk_exec*(at_head=false) - sg SG_IO: at_head=true - bsg SG_IO: at_head=true Some time ago Boaz Harrosh introduced a sg v4 flag called BSG_FLAG_Q_AT_TAIL to override the bsg driver default. This patch does the equivalent for the sg driver. Yep ChangeLog: Introduce SG_FLAG_Q_AT_TAIL flag to cause commands to be injected into the block layer with at_head=false. Changes since v1: Make guard condition (only take sg v3 interface or later invocations) clearer. Signed-off-by: Douglas Gilbert dgilb...@interlog.com Douglas Hi Please I'm just curious? up until now all application users used SG_FLAG_Q_AT_HEAD. Therefor I deduce that you guys came across a new application that can make use of the new SG_FLAG_Q_AT_TAIL facility. What was the application's writer considerations for using the old sg interface and not use the newer bsg interface that already has this support. For me I can see 2 main areas that I find bsg missing. 1. aio scatter_gather type io. (ie multiple pointers multiple length buffers that are written / read from same linear range on device) [The async aspect of aio can be implemented via bsg with the write+read system calls] 2. mmap of direct device range to user vm memory Which of these, or other, considerations where cardinal in using of the old interface in new code? (For me personally both above shortcomings are very much missed] Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5] sg: relax 16 byte cdb restriction
On 06/05/2014 06:40 PM, Boaz Harrosh wrote: On 06/03/2014 08:18 PM, Douglas Gilbert wrote: v4 of this patch was sent 20131201. ChangeLog: - remove the 16 byte CDB (SCSI command) length limit from the sg driver by handling longer CDBs the same way as the bsg driver. Remove comment from sg.h public interface about the cmd_len field being limited to 16 bytes. - remove some dead code caused by this change - cleanup comment block at the top of sg.h, fix urls Signed-off-by: Douglas Gilbert dgilb...@interlog.com sg_cdb_dg5.patch +/* SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type + * of sg_io_hdr::cmd_len can only represent 255 + */ +#define SG_MAX_CDB_SIZE 255 + BTW here too. Why new code is using the old interface? I thought that the all point for having bsg at all is that new stuff are implemented there in a cleaner interface for example large CDBs. Otherwise what was the point for bsg in the first place? (It was before my time, It would be nice to know?) What is then left unique at bsg after this patch? only bidi? Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] scsi patch queue tree
On 06/02/2014 10:58 AM, Christoph Hellwig wrote: With Linus opening the merge window things should calm down now. I've pushed the mvsas pciid update, and a revert of the be2scsi patch that wasn't quite ready to the drivers-for-3.16 branch. I've also created a new drivers-for-3.16-2 branch that contains the remaining hpsa updates that might be too big for Linus taste that late but should probably be fine. I also expect to add the relatively small lpfc update to it once I get second review in proper form. Hi Christoph Could you please add this very trivial patch for this merge window? http://www.spinics.net/lists/linux-scsi/msg74774.html my hack here: http://www.spinics.net/lists/linux-scsi/msg74775.html Forgot to CC you sorry Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] scsi patch queue tree
Looks more like an ack than a hack :) Oops so I do need that coffee ;0) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] include/scsi/osd_protocol.h: remove unnecessary __constant
On 06/01/2014 05:06 PM, Fabian Frederick wrote: __constant_cpu_to_be16 converted to cpu_to_be16 This patch fixes checkpatch warnings: WARNING: __constant_cpu_to_be16 should be cpu_to_be16 Cc: Boaz Harrosh bharr...@panasas.com Cc: Benny Halevy bhal...@primarydata.com Signed-off-by: Fabian Frederick f...@skynet.be Ack-by: Boaz Harrosh bharr...@panasas.com James please apply Thanks Boaz --- include/scsi/osd_protocol.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/scsi/osd_protocol.h b/include/scsi/osd_protocol.h index 25ac628..a2594af 100644 --- a/include/scsi/osd_protocol.h +++ b/include/scsi/osd_protocol.h @@ -263,16 +263,16 @@ static inline struct osd_cdb_head *osd_cdb_head(struct osd_cdb *ocdb) * Ex name = FORMAT_OSD we have OSD_ACT_FORMAT_OSD OSDv1_ACT_FORMAT_OSD */ #define OSD_ACT___(Name, Num) \ - OSD_ACT_##Name = __constant_cpu_to_be16(0x8880 + Num), \ - OSDv1_ACT_##Name = __constant_cpu_to_be16(0x8800 + Num), + OSD_ACT_##Name = cpu_to_be16(0x8880 + Num), \ + OSDv1_ACT_##Name = cpu_to_be16(0x8800 + Num), /* V2 only actions */ #define OSD_ACT_V2(Name, Num) \ - OSD_ACT_##Name = __constant_cpu_to_be16(0x8880 + Num), + OSD_ACT_##Name = cpu_to_be16(0x8880 + Num), #define OSD_ACT_V1_V2(Name, Num1, Num2) \ - OSD_ACT_##Name = __constant_cpu_to_be16(Num2), \ - OSDv1_ACT_##Name = __constant_cpu_to_be16(Num1), + OSD_ACT_##Name = cpu_to_be16(Num2), \ + OSDv1_ACT_##Name = cpu_to_be16(Num1), enum osd_service_actions { OSD_ACT_V2(OBJECT_STRUCTURE_CHECK, 0x00) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: lpfc: lpfc_init: use kcalloc for allocating memory
On 03/18/2014 10:51 PM, Matei Oprea wrote: It's easier to use kcalloc for allocating arrays. While at it also remove useless casting value. Signed-off-by: Matei Oprea e...@opreamatei.ro Cc: ROSEdu Kernel Community fire...@lists.rosedu.org --- drivers/scsi/lpfc/lpfc_init.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 68c94cc..0a51ca5 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -4731,9 +4731,9 @@ lpfc_sli_driver_resource_setup(struct lpfc_hba *phba) } if (!phba-sli.ring) - phba-sli.ring = (struct lpfc_sli_ring *) - kzalloc(LPFC_SLI3_MAX_RING * + phba-sli.ring = kcalloc(LPFC_SLI3_MAX_RING, sizeof(struct lpfc_sli_ring), GFP_KERNEL); + Just a nit please put this recheck inside the if(){} so in the hot path when phba-sli.ring is already allocated it will not test twice. if (!phba-sli.ring) and unlikely() on this if return -ENOMEM; thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misc scsi midlayer updates
On 03/27/2014 06:14 PM, Christoph Hellwig wrote: Various patches from the scsi multiqueue development that make sense on their own. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Hi Christoph Many years ago I have sent these exact patches but no-one cared Including me I guess. (one instance is here, I sent it several times) http://comments.gmane.org/gmane.linux.scsi/63347 Please note the 4th patch which is a real BUG, titled: scsi_lib: Can't RETRY scsi_cmnd if some bytes were completed I think my: scsi_lib: Remove that __scsi_release_buffers contraption Is more elegant, is layered better and is smaller code. (please consider my version) Also the first patch is some more cleanup you'd like. The main patch of collapsing scsi_end_request is basically the same. Funny that I've just rebased an exactly 2 years ago version and it rebased cleanly. So here they are for your reference. [PATCH 1/4] scsi_lib: request_queue is only needed inside [PATCH 2/4] scsi_lib: Remove that __scsi_release_buffers contraption [PATCH 3/4] scsi_lib: Collapse scsi_end_request into only user [PATCH 4/4] scsi_lib: BUG: Can't RETRY scsi_cmnd if some bytes were [Your patches have been tested within my MQ testing right?] Thanks for pushing on this Boaz From 9809b82bbb9813370738ac00400c01d61b0c51b4 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh bharr...@panasas.com Date: Mon, 4 Jan 2010 10:45:56 +0200 Subject: [PATCH 1/4] scsi_lib: request_queue is only needed inside scsi_requeue_command This is a pure cleanup. Just starting the engines for things to come. Signed-off-by: Boaz Harrosh bharr...@panasas.com --- drivers/scsi/scsi_lib.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 62ec84b..dd834ca 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -491,10 +491,11 @@ void scsi_requeue_run_queue(struct work_struct *work) * sector. * Notes: Upon return, cmd is a stale pointer. */ -static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd) +static void scsi_requeue_command(struct scsi_cmnd *cmd) { struct scsi_device *sdev = cmd-device; struct request *req = cmd-request; + struct request_queue *q = req-q; unsigned long flags; /* @@ -565,7 +566,6 @@ static void __scsi_release_buffers(struct scsi_cmnd *, int); static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error, int bytes, int requeue) { - struct request_queue *q = cmd-device-request_queue; struct request *req = cmd-request; /* @@ -584,7 +584,7 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error, * queue, and goose the queue again. */ scsi_release_buffers(cmd); -scsi_requeue_command(q, cmd); +scsi_requeue_command(cmd); cmd = NULL; } return cmd; @@ -779,7 +779,6 @@ static int __scsi_error_from_host_byte(struct scsi_cmnd *cmd, int result) void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) { int result = cmd-result; - struct request_queue *q = cmd-device-request_queue; struct request *req = cmd-request; int error = 0; struct scsi_sense_hdr sshdr; @@ -1003,7 +1002,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) scsi_print_command(cmd); } if (blk_end_request_err(req, error)) - scsi_requeue_command(q, cmd); + scsi_requeue_command(cmd); else scsi_next_command(cmd); break; @@ -1012,7 +1011,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) * A new command will be prepared and issued. */ scsi_release_buffers(cmd); - scsi_requeue_command(q, cmd); + scsi_requeue_command(cmd); break; case ACTION_RETRY: /* Retry the same command immediately */ -- 1.8.5.3 From aac579b02c7d3665665a1f1f7c064a82b70b873a Mon Sep 17 00:00:00 2001 From: Boaz Harrosh bharr...@panasas.com Date: Mon, 4 Jan 2010 12:46:06 +0200 Subject: [PATCH 2/4] scsi_lib: Remove that __scsi_release_buffers contraption Remove do_bidi_check by checking and nullifying cmd-request when blk_end_* indicates the request is no longer valid. Signed-off-by: Boaz Harrosh bharr...@panasas.com --- drivers/scsi/scsi_lib.c | 41 + 1 file changed, 17 insertions(+), 24 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index dd834ca..c4cdc6a 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -539,8 +539,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost) scsi_run_queue(sdev-request_queue); } -static void __scsi_release_buffers(struct scsi_cmnd *, int); - /* * Function:scsi_end_request() * @@ -591,11 +589,12 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error
Re: [PATCH v6 2/2] block,scsi: convert and handle ERR_PTR from blk_get_request
On 10/31/2013 03:50 PM, Joe Lawrence wrote: The blk_get_request function may fail in low-memory conditions or during device removal (even if __GFP_WAIT is set). To distinguish between these errors, modify the blk_get_request call stack to return the appropriate ERR_PTR. Verify that all callers check the return status and consider IS_ERR instead of a simple NULL pointer check. Signed-off-by: Joe Lawrence joe.lawre...@stratus.com --- diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index aa66361ed44b..0250efda647b 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -1567,8 +1567,8 @@ static struct request *_make_request(struct request_queue *q, bool has_write, struct request *req; req = blk_get_request(q, has_write ? WRITE : READ, flags); - if (unlikely(!req)) - return ERR_PTR(-ENOMEM); + if (unlikely(IS_ERR(req))) Just a nit IS_ERR already has an unlikely so it can be dropped here. (No bigy) ACK-by: Boaz Harrosh bharr...@panasas.com + return req; return req; } Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bypass block layer and Fill SCSI lower layer driver queue
On 09/18/2013 05:07 PM, Douglas Gilbert wrote: On 13-09-18 03:58 AM, Jack Wang wrote: On 09/18/2013 08:41 AM, Alireza Haghdoost wrote: Hi I am working on a high throughput and low latency application which does not tolerate block layer overhead to send IO request directly to fiber channel lower layer SCSI driver. I used to work with libaio but currently I am looking for a way to by pass the block layer and send SCSI commands from the application layer directly to the SCSI driver using /dev/sgX device and ioctl() system call. I have noticed that sending IO request through sg device even with nonblocking and direct IO flags is quite slow and does not fill up lower layer SCSI driver TCQ queue. i.e IO depth or /sys/block/sdX/in_flight is always ZERO. Therefore the application throughput is even lower that sending IO request through block layer with libaio and io_submit() system call. In both cases I used only one IO context (or fd) and single threaded. Hi Alireza, I think what you want is in_flight command scsi dispatch to low level device. I submit a simple patch to export device_busy http://www.spinics.net/lists/linux-scsi/msg68697.html I also notice fio sg engine will not fill queue properly, but haven't look into deeper. Cheers Jack I have noticed that some well known benchmarking tools like fio does not support IO depth for sg devices as well. Therefore, I was wondering if it is feasible to bypass block layer and achieve higher throughput and lower latency (for sending IO request only). Any comment on my issue is highly appreciated. I'm not sure if this is relevant to your problem but by default both the bsg and sg drivers queue at head when they inject SCSI commands into the block layer. The bsg driver has a BSG_FLAG_Q_AT_TAIL flag to change that queueing to what may be preferable for your purposes. The sg driver could, but does not, support that flag. Yes! The current best bet for keeping the queues full are with libaio and direct + asynchronous IO. It should not be significantly slower than bsg. (Believe me, with direct IO Block-Device cache is bypassed and the only difference is in who prepares the struct requests for submission) As Doug said sg can not do it. Also with bsg and above BSG_FLAG_Q_AT_TAIL You will need to use the write() option and not the ioctl() because the later is synchronous and you want an asynchronous submit of commands and background completion of them. (Which is what libaio does with async IO) With bsg you achieve that with using write() in combination of read() to receive the completions. Doug Gilbert Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] block: handle pointer error from blk_get_request
On 23/05/13 23:09, Joe Lawrence wrote: Hi Jens, Subject: [PATCH v4] block: handle pointer error from blk_get_request The blk_get_request function may fail in low-memory conditions or during device removal (even if __GFP_WAIT is set). To distinguish between these errors, modify the blk_get_request call stack to return the appropriate error pointer. Verify that all callers check the return status and consider IS_ERR instead of a simple NULL pointer check. Signed-off-by: Joe Lawrence joe.lawre...@stratus.com Cc: Jens Axboe ax...@kernel.dk Cc: James E.J. Bottomley jbottom...@parallels.com Cc: Bart Van Assche bvanass...@acm.org Cc: linux-scsi@vger.kernel.org ACK-by: Boaz Harrosh bharr...@panasas.com --- drivers/scsi/osd/osd_initiator.c| 4 ++-- diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index d8293f2..b4cd56b 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -1567,8 +1567,8 @@ static struct request *_make_request(struct request_queue *q, bool has_write, struct request *req; req = blk_get_request(q, has_write ? WRITE : READ, flags); - if (unlikely(!req)) - return ERR_PTR(-ENOMEM); + if (unlikely(IS_ERR(req))) + return req; return req; } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
On 02/07/2013 01:27 PM, Hannes Reinecke wrote: On 02/07/2013 11:01 AM, Darrick J. Wong wrote: On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote: On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote: On Feb 6, 2013, at 3:24 PM, Darrick J. Wong darrick.w...@oracle.com wrote: On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote: Hi, I'm interested in discussing how to pass protection information to and from userspace. Maybe Martin could be enlisted for the discussion. I read that some work has already been done in this area but have not been able to locate it. It looks like the bio-integrity code already makes it possible to generate the t10-dif crc in the filesystem. It would be good to be able to get the guard and application tags back out to backup applications such as xfsdump. Enabling other applications to generate their own tags in userspace is also interesting. This one's been on my list for a couple of years (and companies) too. A few years ago Joel Becker had support for it in his sys_dio proposal (that hasn't gone anywhere), and more recently I've theorized that we could add a magic fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT *{read,write}v call as the PI buffer, which I think is similar to how DIX gets PI data to a disk. But it's not like I have any code to show for it. I /think/ it's fairly straightforward to change the directio submit code to find the userspace PI buffer and amend the block integrity code to attach our own PI buffer. You'd still have to let the block layer set the sector # field, but afaik that won't affect the crc or the app tag. I hear that the NFS guys want to propose some sort of protocol for transmitting PI data (across NFS), but I haven't seen anything concrete yet. I'm writing a requirements document for the NFS protocol which I can discuss at LSF. The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space. Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage. I'm happy to chat about it. Unfortunately, like Darrick says, sys_dio() coding hasn't happened. I do think we're better off with some kind of explicit API than some magic state on the file. I mean, even something like: ssize_t write_with_pi(int fd, const void *buf, size_t count, const void *pi, size_t pi_count); It's not as nice as a non-historical API (eg sys_dio), but it also probably plays nicer with buffered I/O. I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio and all the other plumbing necessary to make that happen... void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov, int iovcnt, long long offset, const void *pi, size_t pi_count); This is also what I've envisioned. Updating io_prep / async I/O is reasonably easy as its been using a separate structure for passing in the I/O details. Normal read/write calls don't really map as you simply don't have enough parameter to feed PI information into the kernel. So for that you'd need to invent a new interface / syscall. For aio we just need to add additional fields to an existing structure. So yeah, I'd be interested in that discussion as well. Me too, in multiple fronts. It's part of my general concern about things we would like for user-mode servers I think that the current aio and libaio Interface is broken for a long time, for multitude of reasons. For instance the nested structure definitions are COMPAT broken, and lots of missing pieces. (For example search in archives for why bsg does not support sg-lists.) And there are all these additions that everyone wants on top, that call for a new interface anyway. So I would like to see a deep fixup of this interface, with an aio version2 that can take into considerations, all of future needs including these above. Kernel code will be very happy to be implemented with the new, interface and a COMPAT layer could be put in place for the old interface. All interested parties should bring to the table what is the extension/changes they need. And we can try and union all of them together. (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy I know that qemu was wanting this for a while as well as the multitude of user-mode servers) Thanks Boaz Cheers, Hannes -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
On 02/07/2013 02:08 PM, Boaz Harrosh wrote: On 02/07/2013 01:27 PM, Hannes Reinecke wrote: On 02/07/2013 11:01 AM, Darrick J. Wong wrote: On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote: On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote: On Feb 6, 2013, at 3:24 PM, Darrick J. Wong darrick.w...@oracle.com wrote: On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote: Hi, I'm interested in discussing how to pass protection information to and from userspace. Maybe Martin could be enlisted for the discussion. I read that some work has already been done in this area but have not been able to locate it. It looks like the bio-integrity code already makes it possible to generate the t10-dif crc in the filesystem. It would be good to be able to get the guard and application tags back out to backup applications such as xfsdump. Enabling other applications to generate their own tags in userspace is also interesting. This one's been on my list for a couple of years (and companies) too. A few years ago Joel Becker had support for it in his sys_dio proposal (that hasn't gone anywhere), and more recently I've theorized that we could add a magic fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT *{read,write}v call as the PI buffer, which I think is similar to how DIX gets PI data to a disk. But it's not like I have any code to show for it. I /think/ it's fairly straightforward to change the directio submit code to find the userspace PI buffer and amend the block integrity code to attach our own PI buffer. You'd still have to let the block layer set the sector # field, but afaik that won't affect the crc or the app tag. I hear that the NFS guys want to propose some sort of protocol for transmitting PI data (across NFS), but I haven't seen anything concrete yet. I'm writing a requirements document for the NFS protocol which I can discuss at LSF. The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space. Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage. I'm happy to chat about it. Unfortunately, like Darrick says, sys_dio() coding hasn't happened. I do think we're better off with some kind of explicit API than some magic state on the file. I mean, even something like: ssize_t write_with_pi(int fd, const void *buf, size_t count, const void *pi, size_t pi_count); It's not as nice as a non-historical API (eg sys_dio), but it also probably plays nicer with buffered I/O. I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio and all the other plumbing necessary to make that happen... void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov, int iovcnt, long long offset, const void *pi, size_t pi_count); This is also what I've envisioned. Updating io_prep / async I/O is reasonably easy as its been using a separate structure for passing in the I/O details. Normal read/write calls don't really map as you simply don't have enough parameter to feed PI information into the kernel. So for that you'd need to invent a new interface / syscall. For aio we just need to add additional fields to an existing structure. So yeah, I'd be interested in that discussion as well. Me too, in multiple fronts. It's part of my general concern about things we would like for user-mode servers I think that the current aio and libaio Interface is broken for a long time, for multitude of reasons. For instance the nested structure definitions are COMPAT broken, and lots of missing pieces. (For example search in archives for why bsg does not support sg-lists.) And there are all these additions that everyone wants on top, that call for a new interface anyway. So I would like to see a deep fixup of this interface, with an aio version2 that can take into considerations, all of future needs including these above. Kernel code will be very happy to be implemented with the new, interface and a COMPAT layer could be put in place for the old interface. All interested parties should bring to the table what is the extension/changes they need. And we can try and union all of them together. (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy I know that qemu was wanting this for a while as well as the multitude of user-mode servers) I wanted to add that there is another LSF/MM thread going on about: [LSF TOPIC] What to do about O_DIRECT? All these guys should be participating here, so to change core structures and behavior to a better model, that helps us here, and not against us. (Again libaio should be changed in concert with Kernel's new API, and we can sacrifice old user-mode performance
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
On 02/07/2013 02:29 PM, Bart Van Assche wrote: On 02/07/13 13:08, Boaz Harrosh wrote: (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy I know that qemu was wanting this for a while as well as the multitude of user-mode servers) Do you think it would help / make sense if sg_alloc_table() would be modified such that it allocates the entire scatterlist table via one vmalloc() call instead of chaining several page-sized scatterlist tables ? Note: such a change is not possible without modifying scsi_alloc_sgtable(). I don't think so, no. sg_alloc_table() is used not only for direct IO also for buffered, Now vmalloc() is terribly slow and would be a bottleneck in today's SSD performance. I love it that the Linux Kernel never uses vmalloc internally, and only ever chains everything to upto PAGE_SIZE sized objects. Coming from all these other OSs that don't, believe me, it is great great performance pain. (TLBs are a bitch) Bart. Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
On 02/07/2013 02:33 PM, Hannes Reinecke wrote: On 02/07/2013 01:16 PM, Boaz Harrosh wrote: (Again libaio should be changed in concert with Kernel's new API, and we can sacrifice old user-mode performance, with a COMPAT layer. Distro maintainers should consider replacing libaio, together with the new Kernel, so it is only those that do their own mix-and-match, who can fix that mismatch too) And while we're at it, I still would _love_ to connect aio_cancel() and blk_abort_request(). That way we could sensibly abort an I/O and get out of the darn 'D' state. Yes!! Thanks. It is very interesting how the socket side of the world had it correct for ages, and the same fd object on disks is second grade citizen in UNIX land. (Anybody voting for epoll on async disk IO? ) Thanks Hannes yes that too. And wait_interuptable() too, at couple of places, will need some serious error handling audit for that. Cheers, Hannes Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [LSF/MM TOPIC] Thin provisioning SOFT_THRESHOLD error handling
On 01/29/2013 10:14 AM, Hannes Reinecke wrote: Hi all, Thin-provisioned devices have the ability to set a 'soft threshold', which is triggered if the real free space for this device is beyond this mark. The intention behind this is to allow the system to induce some garbage collection with possibly freeing up unused space. Initially it would be possible to execute garbage collection on filesystems (eg for btrfs). However, as this concept applies to other areas within the kernel (like dm-thinp or even btrfs itself) it might be an idea to have a general mechanism / error handling etc in place. I would like to discuss at LSF the possible implementations and handling mechanism for this kind of failure scenarios. Hannes hi! This is received as an unit attentions, right? Will it not be worth while to solve the general unit attentions under udev events, once and for all. Than such a btrfs GC above can just be a simple oneline udev rule. (I think that the event-storm problem you had at the time can be solved with some Kernel side unit attentions queue, and greatly reduce the chance for missed events) Thanks Boaz Cheers, Hannes -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] [SCSI] libosd: check for kzalloc() failure
On 01/30/2013 04:34 PM, walter harms wrote: I start to see the complexity of the situation. Would you mind to add the comment it can be anything. UTF-8 is more likely but not guaranteed either ? For now using a pascal-string seems the best solution but it should be warned that get_attrs[a].val_ptr is NOT a c-string and therefore can not be used with the printf-family (i guess the situation will become more clear in future) OK, So... The long story The STD says that osdname can be *anything* binary or otherwise, and of *any* length. And is used to uniquely identify the OSD-device/partition in a cluster. We decided that if so we would stick an 40 char UUID in it, generated by a uuidgen and all the external utilities and surrounding code forces it, and treat it as a c-string. But this code here in the core cannot make that assumption and still support the STD. On the other hand we did want the osdname to be printable in traces and messages because it is such an important identifier. So I have decided to sacrifice an extra char in-memory to carry a \NUL and safely stick it into printk(s). Those (none existent) OSD devices that will put unprintable characters in here will still function fine, but will look real scary in printk(s). Please note that the one that sets policy is the osd-target vendor. (And they all currently use my code base) So save the kzalloc check this code (tested) is safe and will show strings when present, but will gracefully show ugly things but still work when not. I have no clue why you need that, using c-strings makes life more easy in the last minute a sprintf(buf,%u) will save the day ;) Actually it is very funny, because just recently (last week) I have discovered something that eliminates all those funny business. printf(%*s, odi-osdname, odi-osdname_len); The * will instruct c to expect an extra variable following %s which is the max_length of %s. This is exactly for pascal strings and the such like above. So I added a TODO to clean that a bit by always printing with %*s It look clever to add the NULL test here. Noone reviewing the code will understand that. (Rule of least surprise) Thanks for looking, I agree it needs a fat comment, I'll do that when I'll convert to above system. Thanks for looking, That code is really old and never had any extra eyes on it. just my 2 cents, re, wh Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] [SCSI] libosd: check for kzalloc() failure
On 01/30/2013 09:06 AM, Dan Carpenter wrote: There wasn't any error handling for this kzalloc(). ACK-by: Boaz Harrosh bharr...@panasas.com James please queue for inclusion Signed-off-by: Dan Carpenter dan.carpen...@oracle.com Thanks Dan diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index c06b8e5..d8293f2 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct osd_dev *od, odi-osdname_len = get_attrs[a].len; /* Avoid NULL for memcmp optimization 0-length is good enough */ odi-osdname = kzalloc(odi-osdname_len + 1, GFP_KERNEL); + if (!odi-osdname) { + ret = -ENOMEM; + goto out; + } if (odi-osdname_len) memcpy(odi-osdname, get_attrs[a].val_ptr, odi-osdname_len); OSD_INFO(OSD_NAME [%s]\n, odi-osdname); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] exofs: 3 changes to exofs osd
On 12/18/2012 12:58 PM, James Bottomley wrote: On Mon, 2012-12-17 at 18:53 +0200, Boaz Harrosh wrote: Hi Linus. Please pull the following changes since commit [ddffeb8c] Linux 3.7-rc1 They are available in the git repository at: git://git.open-osd.org/linux-open-osd.git for-linus for you to fetch changes up to [861d6660] exofs: don't leak io_state and pages on read error (2012-12-14 12:17:32 +0200) These are just 3 patches, the last two are bug fixes on the error paths in exofs. The important patch is the one to osd_uld which adds sysfs info to osd devices for use by user-mode clustering discovery software. I'm already sitting on this patch since before February this year, It is important for some of the big installation cluster systems, who's been compiling their own kernel just for that patch. I'm a bit perplexed by this. You got notice when it was added to the SCSI tree and now it's already upstream: commit 51976a8c85cec0c62e410bc38b8a11dbc690764d Author: Boaz Harrosh bharr...@panasas.com Date: Wed Oct 24 14:51:41 2012 -0700 [SCSI] osd_uld: Add osdname systemid sysfs at scsi_osd class But the authorship info differs ... it looks like you forgot to include the From: tag in your original patch send. I'm so sorry, I completely goofed on this one. It's what happens when you are swamped with other work and are doing things without thinking. I totally forgot that I need to remove this patch. Both these patches where in linux-next for a long time. So I believe the merge will go just fine. Lets leave it like this, or I can rebase and remove it? James Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] exofs: 3 changes to exofs osd
On 12/18/2012 01:28 PM, James Bottomley wrote: Both these patches where in linux-next for a long time. So I believe the merge will go just fine. Lets leave it like this, or I can rebase and remove it? If it merges OK, I'd just leave it as is. It wouldn't be anywhere close to the first time we've had the same patch via different trees. Thanks James, it will not happen again. I just had a look on linux-next of Dec 17 they both where merged in and I did not see any complains. James Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] exofs: 3 changes to exofs osd
Hi Linus. Please pull the following changes since commit [ddffeb8c] Linux 3.7-rc1 They are available in the git repository at: git://git.open-osd.org/linux-open-osd.git for-linus for you to fetch changes up to [861d6660] exofs: don't leak io_state and pages on read error (2012-12-14 12:17:32 +0200) These are just 3 patches, the last two are bug fixes on the error paths in exofs. The important patch is the one to osd_uld which adds sysfs info to osd devices for use by user-mode clustering discovery software. I'm already sitting on this patch since before February this year, It is important for some of the big installation cluster systems, who's been compiling their own kernel just for that patch. Thanks in advance Boaz Boaz Harrosh (1): exofs: don't leak io_state and pages on read error Idan Kedar (1): exofs: clean up the correct page collection on write error Sachin Bhamare (1): osduld: Add osdname systemid sysfs at scsi_osd class drivers/scsi/osd/osd_uld.c | 28 fs/exofs/inode.c | 16 +--- 2 files changed, 37 insertions(+), 7 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [osd-dev] [PATCH] osduld: Add osdname systemid sysfs at scsi_osd class
On 10/24/2012 02:55 PM, Boaz Harrosh wrote: On 10/24/2012 02:51 PM, Boaz Harrosh wrote: This patch adds the support for the following two read-only sysfs attributes to scsi_osd class members : osdname systemid These attributes will show up as below in sysfs class hierarchy: /sys/class/scsi_osd/osdX/osdname /sys/class/scsi_osd/osdX/systemid The osdname systemid are OSD device attributes which uniquely identify a device on the network, while it's IP and certainly it's /dev/osdX device path might change. Userspace utilities (e.g. mkfs.exofs) can parse these attributes to identify the correct OSD in safer and faster way. (Today osd apps open each device in the system and send a attributes query for these, in order to access the user requested device) Signed-off-by: Sachin Bhamare sbham...@panasas.com Signed-off-by: Boaz Harrosh bharr...@panasas.com James hi. This is for the next (v3.8) merge window. Please submit to scsi-misc tree. James ping! Do you want that I push it through my osd tree? Thanks Boaz It is actually a very old well tested, but forgotten patch. Thanks Boaz --- drivers/scsi/osd/osd_uld.c | 28 1 files changed, 28 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c index d4ed9eb..4375417 100644 --- a/drivers/scsi/osd/osd_uld.c +++ b/drivers/scsi/osd/osd_uld.c @@ -97,9 +97,37 @@ struct osd_dev_handle { static DEFINE_IDA(osd_minor_ida); +/* + * scsi sysfs attribute operations + */ +static ssize_t osdname_show(struct device *dev, struct device_attribute *attr, +char *buf) +{ +struct osd_uld_device *ould = container_of(dev, struct osd_uld_device, + class_dev); +return sprintf(buf, %s\n, ould-odi.osdname); +} + +static ssize_t systemid_show(struct device *dev, struct device_attribute *attr, +char *buf) +{ +struct osd_uld_device *ould = container_of(dev, struct osd_uld_device, + class_dev); + +memcpy(buf, ould-odi.systemid, ould-odi.systemid_len); +return ould-odi.systemid_len; +} + +static struct device_attribute osd_uld_attrs[] = { +__ATTR(osdname, S_IRUGO, osdname_show, NULL), +__ATTR(systemid, S_IRUGO, systemid_show, NULL), +__ATTR_NULL, +}; + static struct class osd_uld_class = { .owner = THIS_MODULE, .name = scsi_osd, +.dev_attrs = osd_uld_attrs, }; /* ___ osd-dev mailing list osd-...@open-osd.org http://mailman.open-osd.org/mailman/listinfo/osd-dev -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] osduld: Add osdname systemid sysfs at scsi_osd class
This patch adds the support for the following two read-only sysfs attributes to scsi_osd class members : osdname systemid These attributes will show up as below in sysfs class hierarchy: /sys/class/scsi_osd/osdX/osdname /sys/class/scsi_osd/osdX/systemid The osdname systemid are OSD device attributes which uniquely identify a device on the network, while it's IP and certainly it's /dev/osdX device path might change. Userspace utilities (e.g. mkfs.exofs) can parse these attributes to identify the correct OSD in safer and faster way. (Today osd apps open each device in the system and send a attributes query for these, in order to access the user requested device) Signed-off-by: Sachin Bhamare sbham...@panasas.com Signed-off-by: Boaz Harrosh bharr...@panasas.com --- drivers/scsi/osd/osd_uld.c | 28 1 files changed, 28 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c index d4ed9eb..4375417 100644 --- a/drivers/scsi/osd/osd_uld.c +++ b/drivers/scsi/osd/osd_uld.c @@ -97,9 +97,37 @@ struct osd_dev_handle { static DEFINE_IDA(osd_minor_ida); +/* + * scsi sysfs attribute operations + */ +static ssize_t osdname_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct osd_uld_device *ould = container_of(dev, struct osd_uld_device, + class_dev); + return sprintf(buf, %s\n, ould-odi.osdname); +} + +static ssize_t systemid_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct osd_uld_device *ould = container_of(dev, struct osd_uld_device, + class_dev); + + memcpy(buf, ould-odi.systemid, ould-odi.systemid_len); + return ould-odi.systemid_len; +} + +static struct device_attribute osd_uld_attrs[] = { + __ATTR(osdname, S_IRUGO, osdname_show, NULL), + __ATTR(systemid, S_IRUGO, systemid_show, NULL), + __ATTR_NULL, +}; + static struct class osd_uld_class = { .owner = THIS_MODULE, .name = scsi_osd, + .dev_attrs = osd_uld_attrs, }; /* -- 1.7.6.5 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [osd-dev] [PATCH] osduld: Add osdname systemid sysfs at scsi_osd class
On 10/24/2012 02:51 PM, Boaz Harrosh wrote: This patch adds the support for the following two read-only sysfs attributes to scsi_osd class members : osdname systemid These attributes will show up as below in sysfs class hierarchy: /sys/class/scsi_osd/osdX/osdname /sys/class/scsi_osd/osdX/systemid The osdname systemid are OSD device attributes which uniquely identify a device on the network, while it's IP and certainly it's /dev/osdX device path might change. Userspace utilities (e.g. mkfs.exofs) can parse these attributes to identify the correct OSD in safer and faster way. (Today osd apps open each device in the system and send a attributes query for these, in order to access the user requested device) Signed-off-by: Sachin Bhamare sbham...@panasas.com Signed-off-by: Boaz Harrosh bharr...@panasas.com James hi. This is for the next (v3.8) merge window. Please submit to scsi-misc tree. It is actually a very old well tested, but forgotten patch. Thanks Boaz --- drivers/scsi/osd/osd_uld.c | 28 1 files changed, 28 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c index d4ed9eb..4375417 100644 --- a/drivers/scsi/osd/osd_uld.c +++ b/drivers/scsi/osd/osd_uld.c @@ -97,9 +97,37 @@ struct osd_dev_handle { static DEFINE_IDA(osd_minor_ida); +/* + * scsi sysfs attribute operations + */ +static ssize_t osdname_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct osd_uld_device *ould = container_of(dev, struct osd_uld_device, +class_dev); + return sprintf(buf, %s\n, ould-odi.osdname); +} + +static ssize_t systemid_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct osd_uld_device *ould = container_of(dev, struct osd_uld_device, +class_dev); + + memcpy(buf, ould-odi.systemid, ould-odi.systemid_len); + return ould-odi.systemid_len; +} + +static struct device_attribute osd_uld_attrs[] = { + __ATTR(osdname, S_IRUGO, osdname_show, NULL), + __ATTR(systemid, S_IRUGO, systemid_show, NULL), + __ATTR_NULL, +}; + static struct class osd_uld_class = { .owner = THIS_MODULE, .name = scsi_osd, + .dev_attrs = osd_uld_attrs, }; /* -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target: Remove unused se_cmd.cmd_spdtl
We should switch this topic to the scsi mailing list On 08/17/2012 01:49 PM, Boaz Harrosh wrote: On 08/17/2012 01:12 AM, Nicholas A. Bellinger wrote: On Thu, 2012-08-16 at 09:24 -0700, Roland Dreier wrote: Actually I'm not sure removing cmd_spdtl is the right answer. If /dev/sda is a device on an iSCSI initiator exported by an LIO target, try doing: # sg_raw -r512 /dev/sda 28 0 0 0 0 0 0 0 2 0 This issues a READ (10) for 2 sectors, but only sends a length of 512 at the transfer level. The target responds by setting the residual to 512 but transmits all 1024 bytes, Is this the correct behavior from the Target? I would imagine that the target needs to only send 512 bytes (transfer level size) and set the OVERFLOW bit and residual to 512. Not that it really matter because as you stated below the Initiator makes sure nothing bad happens. BTW what target are we talking about, on the other side of the initiator here? (There are two targets in this setup right?) and the Linux initiator at least rejects it because it hits: if (tcp_task-data_offset + tcp_conn-in.datalen total_in_length) { ISCSI_DBG_TCP(conn, data_offset(%d) + data_len(%d) total_length_in(%d)\n, tcp_task-data_offset, tcp_conn-in.datalen, total_in_length); return ISCSI_ERR_DATA_OFFSET; } Ok, this is the 'overflow' case when the fabric -data_length (expected data transfer length of the initiator's buffer) is smaller than the SCSI CDB allocation length or sectors*block_size (attempted transfer length) the target has been asked to process. The following patch which appears to do the right thing from the perspective of the target for overflow, but AFAICT open-iscsi still returns GOOD status w/ no data-in payload (at least via sg_raw) when the iscsi-target is signaling overflow bit in iSCSI Data IN PDU. Not sure yet why this is the case, but drivers/scsi/libiscsi.c:iscsi_data_in_rsp code: if (rhdr-flags (ISCSI_FLAG_DATA_UNDERFLOW | ISCSI_FLAG_DATA_OVERFLOW)) { int res_count = be32_to_cpu(rhdr-residual_count); if (res_count 0 (rhdr-flags ISCSI_FLAG_CMD_OVERFLOW || res_count = scsi_in(sc)-length)) scsi_in(sc)-resid = res_count; else sc-result = (DID_BAD_TARGET 16) | rhdr-cmd_status; } OK I admit I kind of rearranged all this code a few years ago. Guilty ;-) Well now that I look at it again, I think it is totally wrong!! The scsi and block layer do not know anything about CMD_OVERFLOW If a scsi_in/out(sc)-resid is set it only ever means UNDERFLOW. Both scsi and block expects to only do transfer_length - resid. This is why you get empty buffer back because the transfer_length=512 minus resid=512 means zero bytes. So the || ISCSI_FLAG_CMD_OVERFLOW case is wrong. Now the big question is what to do. Fail the all command, or say nothing? The correct thing is to teach the middle layers about overflow, With some kind of warning level system. I was also thinking that we can make resid signed and signal an overflow with a negative resid. Now the math of transfer_length - resid will become correct since it means more, in the case above 512 - (-512) would be our 1024 CDB len. For now this code must be fixed. And the command must fail. Do you need that I prepare a patch? (Please you do it, I'm swamped, it'll take me time) There are 3 more places like this. BTW did you notice how in the code above we have mixed up the use of: ISCSI_FLAG_DATA_OVERFLOW and ISCSI_FLAG_CMD_OVERFLOW? That's another bug, here it should be ISCSI_FLAG_DATA_OVERFLOW only. The other places with the other flags. appears to be setting resid for non bidi cases correctly, right..? (mnc + boaz CC'ed) How about the following to ensure we restrict overflow to keep the existing (smaller) cmd-data_length assignment, and only re-assign this value for the underflow case..? (hch CC'ed) WDYT..? diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 0eaae23..63b3594 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1183,15 +1183,20 @@ int target_cmd_size_check(struct se_cmd *cmd, unsigned int size) /* Returns CHECK_CONDITION + INVALID_CDB_FIELD */ goto out_invalid_cdb_field; } - + /* +* For the overflow case keep the existing fabric provided +* -data_length. Otherwise for the underflow case, reset +* -data_length to the smaller SCSI expected data transfer +* length. +*/ if (size cmd-data_length) { I'm a bit out of context
Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
On 07/30/2012 10:12 AM, Paolo Bonzini wrote: Il 30/07/2012 01:50, Rusty Russell ha scritto: Also, being the first user of chained scatterlist doesn't exactly give me warm fuzzies. We're far from the first user: they've been in the kernel for well over 7 years. They were introduced for the block layer, but they tended to ignore private uses of scatterlists like this one. Yeah, but sg_chain has no users in drivers, only a private one in lib/scatterlist.c. The internal API could be changed to something else and leave virtio-scsi screwed... Yes, we should do this. But note that this means an iteration, so we might as well combine the loops :) I'm really bad at posting pseudo-code, but you can count the number of physically-contiguous entries at the beginning of the list only. So if everything is contiguous, you use a single non-indirect buffer and save a kmalloc. If you use indirect buffers, I suspect it's much less effective to collapse physically-contiguous entries. More elaborate heuristics do need a loop, though. [All the below with a grain of salt, from my senile memory] You must not forget some facts about the scatterlist received here at the LLD. It has already been DMA mapped and locked by the generic layer. Which means that the DMA engine has already collapsed physically-contiguous entries. Those you get here are already unique physically. (There were bugs in the past, where this was not true, please complain if you find them again) A scatterlist is two different lists taking the same space, but with two different length. - One list is the PAGE pointers plus offset length, which is bigger or equal to the 2nd list. The end marker corresponds to this list. This list is the input into the DMA engine. - Second list is the physical DMA addresses list. With their physical-lengths. Offset is not needed because it is incorporated in the DMA address. This list is the output from the DMA engine. The reason 2nd list is shorter is because the DMA engine tries to minimize the physical scatter-list entries which is usually a limited HW resource. This list might follow chains but it's end is determined by the received sg_count from the DMA engine, not by the end marker. At the time my opinion, and I think Rusty agreed, was that the scatterlist should be split in two. The input page-ptr list is just the BIO, and the output of the DMA-engine should just be the physical part of the sg_list, as a separate parameter. But all this was berried under too much APIs and the noise was two strong, for any single brave sole. So I'd just trust blindly the returned sg_count from the DMA engine, it is already optimized. I THINK Paolo Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
On 07/25/2012 11:44 AM, Paolo Bonzini wrote: Il 25/07/2012 10:29, Wang Sen ha scritto: When using the commands below to write some data to a virtio-scsi LUN of the QEMU guest(32-bit) with 1G physical memory(qemu -m 1024), the qemu will crash. # sudo mkfs.ext4 /dev/sdb (/dev/sdb is the virtio-scsi LUN.) # sudo mount /dev/sdb /mnt # dd if=/dev/zero of=/mnt/file bs=1M count=1024 In current implementation, sg_set_buf is called to add buffers to sg list which is put into the virtqueue eventually. But there are some HighMem pages in table-sgl can not get virtual address by sg_virt. So, sg_virt(sg_elem) may return NULL value. This will cause QEMU exit when virtqueue_map_sg is called in QEMU because an invalid GPA is passed by virtqueue. Heh, I was compiling (almost) the same patch as we speak. :) I've never seen QEMU crash; the VM would more likely just fail to boot with a panic. But it's the same bug anyway. My solution is using sg_set_page instead of sg_set_buf. I have tested the patch on my workstation. QEMU would not crash any more. Signed-off-by: Wang Sen senw...@linux.vnet.ibm.com --- drivers/scsi/virtio_scsi.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 1b38431..fc5c88a 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -198,7 +198,8 @@ static void virtscsi_map_sgl(struct scatterlist *sg, unsigned int *p_idx, int i; for_each_sg(table-sgl, sg_elem, table-nents, i) -sg_set_buf(sg[idx++], sg_virt(sg_elem), sg_elem-length); +sg_set_page(sg[idx++], sg_page(sg_elem), sg_elem-length, +sg_elem-offset); This can simply be sg[idx++] = *sg_elem; Can you repost it with this change, and also add sta...@vger.kernel.org to the Cc? Thanks very much! No! Please use sg_set_page()! Look at sg_set_page(), which calls sg_assign_page(). It has all these jump over chained arrays. When you'll start using long sg_lists (which you should) then jumping from chain to chain must go through sg_page(sg_elem) sg_assign_page(), As in the original patch. Thanks Boaz Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
On 07/25/2012 12:41 PM, Paolo Bonzini wrote: Il 25/07/2012 11:22, Boaz Harrosh ha scritto: for_each_sg(table-sgl, sg_elem, table-nents, i) -sg_set_buf(sg[idx++], sg_virt(sg_elem), sg_elem-length); +sg_set_page(sg[idx++], sg_page(sg_elem), sg_elem-length, +sg_elem-offset); This can simply be sg[idx++] = *sg_elem; Can you repost it with this change, and also add sta...@vger.kernel.org to the Cc? Thanks very much! No! Please use sg_set_page()! Look at sg_set_page(), which calls sg_assign_page(). It has all these jump over chained arrays. When you'll start using long sg_lists (which you should) then jumping from chain to chain must go through sg_page(sg_elem) sg_assign_page(), As in the original patch. Hi Boaz, actually it seems to me that using sg_set_page is wrong, because it will not copy the end marker from table-sgl to sg[]. If something chained the sg[] scatterlist onto something else, sg_next's test for sg_is_last would go beyond the table-nents-th item and access invalid memory. Yes, you did not understand this structure. And Yes I am right, when using chained lists you *must* use sg_set_page(). You see the chaining belongs to the allocation not the value of the sg-elements. One must not copy the chaining marker to the destination array which might have it's own. And one must not crap all over the destination chaining markers, set at allocation time. The sizes and mostly the pointers of source and destination are not the same. Using chained sglists is on my to-do list, I expect that it would make a nice performance improvement. However, I was a bit confused as to what's the plan there; there is hardly any user, and many arches still do not define ARCH_HAS_SG_CHAIN. Do you have any pointer to discussions or LWN articles? Only the source code I'm afraid. In SCSI land most LLDs should support chaining just by virtu of using the for_each_sg macro. That all it takes. Your code above does support it. (In Wang version). Though more code need probably be added at sg allocation to actually allocate and prepare a chain. I would need to add support for the long sglists to virtio; this is not a problem, but in the past Rusty complained that long sg-lists are not well suited to virtio (which would like to add elements not just at the beginning of a given sglist, but also at the end). Well that can be done as well, (If done carefully) It should be easy to add chained fragments to both the end of a given chain just as at beginning. It only means that the last element of the appended-to chain moves to the next fragment and it's place is replaced by a link. If you have ready made two long segments A and C which you would like not to reallocate and copy, you insert a two-elements segment in the middle, say call it B. The first element of B is the last element of A which is now used as the pointer to B, and the 2nd element of B is a pointer to C. It seems to me that virtio would prefer to work with a struct scatterlist ** rather than a long sglist. That's just going backwards, and lazy. As you said if you want to enjoy the better performance cake you better break some eggs ;-) Paolo Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
On 07/25/2012 02:44 PM, Sen Wang wrote: 2012/7/25 Paolo Bonzini pbonz...@redhat.com: Il 25/07/2012 10:29, Wang Sen ha scritto: When using the commands below to write some data to a virtio-scsi LUN of the QEMU guest(32-bit) with 1G physical memory(qemu -m 1024), the qemu will crash. # sudo mkfs.ext4 /dev/sdb (/dev/sdb is the virtio-scsi LUN.) # sudo mount /dev/sdb /mnt # dd if=/dev/zero of=/mnt/file bs=1M count=1024 In current implementation, sg_set_buf is called to add buffers to sg list which is put into the virtqueue eventually. But there are some HighMem pages in table-sgl can not get virtual address by sg_virt. So, sg_virt(sg_elem) may return NULL value. This will cause QEMU exit when virtqueue_map_sg is called in QEMU because an invalid GPA is passed by virtqueue. Heh, I was compiling (almost) the same patch as we speak. :) Uh, what a coincidence! :) I've never seen QEMU crash; the VM would more likely just fail to boot with a panic. But it's the same bug anyway. I never met this before. How this situation happens? My solution is using sg_set_page instead of sg_set_buf. I have tested the patch on my workstation. QEMU would not crash any more. Signed-off-by: Wang Sen senw...@linux.vnet.ibm.com --- drivers/scsi/virtio_scsi.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 1b38431..fc5c88a 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -198,7 +198,8 @@ static void virtscsi_map_sgl(struct scatterlist *sg, unsigned int *p_idx, int i; for_each_sg(table-sgl, sg_elem, table-nents, i) - sg_set_buf(sg[idx++], sg_virt(sg_elem), sg_elem-length); + sg_set_page(sg[idx++], sg_page(sg_elem), sg_elem-length, + sg_elem-offset); This can simply be sg[idx++] = *sg_elem; Yes, I saw your another E-mail. I think you're right. Simply calling sg_set_page can not handle the flag bits correctly. So, I'll repost the patch soon. Thank you! No this code is correct, though you will need to make sure to properly terminate the destination sg_list. But since old code was using sg_set_buf(), than it means it was properly terminated before, and there for this code is good as is please don't touch it. Thanks Boaz Can you repost it with this change, and also add sta...@vger.kernel.org to the Cc? Thanks very much! Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
On 07/25/2012 03:34 PM, Paolo Bonzini wrote: Il 25/07/2012 14:13, Wang Sen ha scritto: When using the commands below to write some data to a virtio-scsi LUN of the QEMU guest(32-bit) with 1G physical memory(qemu -m 1024), the qemu will crash. # sudo mkfs.ext4 /dev/sdb (/dev/sdb is the virtio-scsi LUN.) # sudo mount /dev/sdb /mnt # dd if=/dev/zero of=/mnt/file bs=1M count=1024 In current implementation, sg_set_buf is called to add buffers to sg list which is put into the virtqueue eventually. But if there are some HighMem pages in table-sgl you can not get virtual address by sg_virt. So, sg_virt(sg_elem) may return NULL value. This will cause QEMU exit when virtqueue_map_sg is called in QEMU because an invalid GPA is passed by virtqueue. I take Paolo's solution mentioned in last thread to avoid failure on handling flag bits. Please include an URL or (better) summarize the reason why sg_set_page is not correct in the commit message. For example, replace this paragraph with the following: To fix this, we can simply copy the original scatterlist entries into virtio-scsi's; we need to copy the entries entirely, including the flag bits, so using sg_set_page is not correct. Please send v3 with this change and I'll add my Acked-by. NACK-by: Boaz Harrosh Apart from the HighMem pages problem, where in previous sg_set_buf() code was the marker copied? It was not because it is not needed because the allocation of sg took care of that. For example in 64bit the is no bugs, right? If there was a destination sg_list termination bug, it should be fixed as a separate patch from this HighMem pages problem. But I bet if you inspect the code carefully there isn't such a bug. Cheers Boaz Paolo I have tested the patch on my workstation. QEMU would not crash any more. Signed-off-by: Wang Sen senw...@linux.vnet.ibm.com --- drivers/scsi/virtio_scsi.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 1b38431..6661610 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -198,7 +198,7 @@ static void virtscsi_map_sgl(struct scatterlist *sg, unsigned int *p_idx, int i; for_each_sg(table-sgl, sg_elem, table-nents, i) -sg_set_buf(sg[idx++], sg_virt(sg_elem), sg_elem-length); +sg[idx++] = *sg_elem; *p_idx = idx; } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
On 07/25/2012 03:49 PM, Paolo Bonzini wrote: Except here the destination array has to be given to virtio, which doesn't (yet) understand chaining. I'm using for_each_sg rather than a simple memcpy exactly because I want to flatten the input scatterlist onto consecutive scatterlist entries, which is what virtio expects (and what I'll change when I get to it). for_each_sg guarantees that I get non-chain scatterlists only, so it is okay to value-assign them to sg[]. So if the virtio does not understand chaining at all then surly it will not understand the 2-bit end marker and will get a wrong page pointer with the 1st bit set. As I said!! Since the last code did sg_set_buff() and worked then you must change it with sg_set_page(). If there are *any* chaining-or-no-chaining markers errors these should be fixed as a separate patch! Please lets concentrate at the problem at hand. snip It was _not_ properly terminated, and didn't matter because virtio doesn't care about termination. Changing all the virtio devices to properly terminate chains (and to use for_each_sg) is a prerequisite for properly supporting long sglists). Fine then your code is now a crash because the terminating bit was just copied over, which it was not before. Now Back to the how to support chaining: Lets separate the two topics from now on. Send me one mail concerning the proper above patch, And a different mail for how to support chaining. In SCSI land most LLDs should support chaining just by virtu of using the for_each_sg macro. That all it takes. Your code above does support it. Yes, it supports it but still has to undo them before passing to virtio. What my LLD does is add a request descriptor in front of the scatterlist that the LLD receives. I would like to do this with a 2-item scatterlist: one for the request descriptor, and one which is a chain to the original scatterlist. I hate that plan. Why yet override the scatter element yet again with a third union of a request descriptor The reason it was overloaded as a link-pointer in the first place was because of historical compatibility reasons and not because of a good design. You should have a proper request descriptor structure defined, pointing to (or followed by), an sglist-chain. And all of the above is mute. Except that if I call sg_chain and my architecture does not define ARCH_HAS_SG_CHAIN, it will BUG out. So I need to keep all the scatterlist allocation and copying crap that I have now within #ifdef, and it will bitrot. except that with the correct design you don't call sg_chain you just do: request_descriptor-sg_list = sg; I would need to add support for the long sglists to virtio; this is not a problem, but in the past Rusty complained that long sg-lists are not well suited to virtio (which would like to add elements not just at the beginning of a given sglist, but also at the end). Well that can be done as well, (If done carefully) It should be easy to add chained fragments to both the end of a given chain just as at beginning. It only means that the last element of the appended-to chain moves to the next fragment and it's place is replaced by a link. But you cannot do that in constant time, can you? And that means you do not enjoy any benefit in terms of cache misses etc. I did not understand constant time it is O(0) if that what you meant. (And surly today's code of copy the full list cache misses) Also, this assumes that I can modify the appended-to chain. I'm not sure I can do this? Each case it's own. If the appended-to chain is const, yes you'll have to reallocate it and copy. Is that your case? Cheers Boaz It seems to me that virtio would prefer to work with a struct scatterlist ** rather than a long sglist. That's just going backwards, and lazy. As you said if you want to enjoy the better performance cake you better break some eggs ;-) :) Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
On 07/25/2012 04:36 PM, Paolo Bonzini wrote: Il 25/07/2012 15:26, Boaz Harrosh ha scritto: On 07/25/2012 03:49 PM, Paolo Bonzini wrote: Except here the destination array has to be given to virtio, which doesn't (yet) understand chaining. I'm using for_each_sg rather than a simple memcpy exactly because I want to flatten the input scatterlist onto consecutive scatterlist entries, which is what virtio expects (and what I'll change when I get to it). for_each_sg guarantees that I get non-chain scatterlists only, so it is okay to value-assign them to sg[]. So if the virtio does not understand chaining at all then surly it will not understand the 2-bit end marker and will get a wrong page pointer with the 1st bit set. It doesn't understand chaining, but it does use sg_phys(x) so it will not get a wrong page pointer for the end marker. Fine then your code is now a crash because the terminating bit was just copied over, which it was not before. I did test the patch with value-assignment. Still you should use the sg_set_page()!! 1. It is not allowed to directly manipulate sg entries. One should always use the proper accessor. Even if open coding does work and is not a bug it should not be used anyway! 2. Future code that will support chaining will need to do as I say so why change it then, again? Please don't change two things in one patch. The fix is for high-pages please fix only that here. You can blasphemy open-code the sg manipulation in a separate patch. Please Boaz Lets separate the two topics from now on. Send me one mail concerning the proper above patch, And a different mail for how to support chaining. Ok, and I'll change the topic. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
On 07/25/2012 11:06 PM, Paolo Bonzini wrote: Il 25/07/2012 21:16, Boaz Harrosh ha scritto: The picture confused me. It looked like the first element is the virtio_scsi_cmd_req not an sgilist-element that points to the struct's buffer. In that case then yes your plan of making a two-elements fragment that points to the original scsi-sglist is perfect. All you have to do is that, and all you have to do at virtio is use the sg_for_each macro and you are done. You don't need any sglist allocation or reshaping. And you can easily support chaining. Looks like order of magnitude more simple then what you do now It is. So what is the problem? That not all architectures have ARCH_HAS_SG_CHAIN (though all those I care about do). So I need to go through all architectures and make sure they use for_each_sg, or at least to change ARCH_HAS_SG_CHAIN to a Kconfig define so that dependencies can be expressed properly. What is actually preventing ARCH_HAS_SG_CHAIN from all these ARCHES? is that the DMA drivers not using for_each_sg(). Sounds like easy to fix. But yes a deep change would convert ARCH_HAS_SG_CHAIN to a Kconfig. If you want to be lazy, like me, You might just put a BUILD_BUG_ON in code, requesting the user to disable the driver for this ARCH. I bet there is more things to do at ARCH to enable virtualization then just support ARCH_HAS_SG_CHAIN. Be it just another requirement. If you Document it and make sure current ARCHs are fine, it should not ever trigger. And BTW you won't need that new __sg_set_page API anymore. Kind of. sg_init_table(sg, 2); sg_set_buf(sg[0], req, sizeof(req)); sg_chain(sg[1], scsi_out(sc)); is still a little bit worse than __sg_set_buf(sg[0], req, sizeof(req)); __sg_chain(sg[1], scsi_out(sc)); I believe they are the same, specially for the on the stack 2 elements array. Actually I think In both cases you need to at least call sg_init_table() once after allocation, No? Your old code with big array copy and re-shaping was a better example of the need for your new API. Which I agree. But please for my sake do not call it __sg_chain. Call it something like sg_chain_not_end(). I hate those __ which for god sack means what? (A good name is when I don't have to read the code, an __ means fuck you go read the code) Paolo Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gdth new set of patches for 2.6.24 stable
On Sun, Feb 17 2008 at 19:24 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-02-17 at 18:46 +0200, Boaz Harrosh wrote: On Thu, Feb 14 2008 at 20:47 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: Submitted are a new set of patches, that fix lots of problems with the gdth driver. It fixes the following problems: - scan for drives on hosts. (Already in mainline) - truly fixes the exit/reboot problems but does call flush() before reboot. - fix crash when accessing array with icpcon management application. - fix crash when doing $ cat /proc/sys/gdth/0. This one still has the below WARN_ON in messages (see gdth_info below) So there is one more thing hiding in there. - use pci_get_device One of the testers requested if we can also put the move to pci_get_device patch with removal of dependency on PCI_LEGACY, to the stable release. The patches are for and based on Linux-2.6.24. here is the list of patches: [PATCH 1/5] gdth: update deprecated pci_find_device [PATCH 2/5] gdth: scan for scsi devices [PATCH 3/5] gdth: bugfix for the at-exit problems [PATCH 4/5] gdth: fix to internal commands execution [PATCH 5/5] gdth: remove gdth cooked up command accessors Please all test and report your findings. Thanks in advance Boaz --- gdth_info WARNING: at arch/x86/kernel/pci-dma_32.c:66 dma_free_coherent() Pid: 5501, comm: cat Not tainted 2.6.24 #43 [c0107137] dma_free_coherent+0x93/0x95 [c025ef73] gdth_ioctl_free+0x4c/0x69 [c0264a36] gdth_proc_info+0x165f/0x182c [c0111f7a] update_curr+0xeb/0xf2 [c01132aa] task_rq_lock+0x29/0x50 [c0113706] try_to_wake_up+0x42/0x342 [c0113706] try_to_wake_up+0x42/0x342 [c0111a9f] __wake_up_common+0x46/0x6d [c0113569] __wake_up+0x32/0x42 [c022dad9] n_tty_receive_buf+0x2e8/0xe97 [c022dad9] n_tty_receive_buf+0x2e8/0xe97 [c0111f0a] update_curr+0x7b/0xf2 [c0112625] enqueue_task_fair+0x27/0x30 [c0111783] enqueue_task+0xa/0x14 [c025e351] proc_scsi_read+0x29/0x3d [c025e328] proc_scsi_read+0x0/0x3d [c0189704] proc_file_read+0x1c6/0x279 [c018953e] proc_file_read+0x0/0x279 [c0185eca] proc_reg_read+0x53/0x71 [c0185e77] proc_reg_read+0x0/0x71 [c0159968] vfs_read+0x85/0x11b [c0159d9d] sys_read+0x41/0x6a [c0102822] sysenter_past_esp+0x5f/0x85 /gdth_info - James hi. All my testers have reported back that with these 5 patches applied they can now run with a 2.6.24 kernel the same way they ran before. However there is that reported issue, with the dma_free_coherent WARN_ON (above). The code was like that from day one and it is a very old issue, however it is a regression because 2.6.24 introduced that new WARN_ON. (infamous commit aa24886e379d2b641c5117e178b15ce1d5d366ba) From posts on lkml and even recent one in linux-scsi about the arcmsr driver it looks that all a driver can do is work around it with different kernel mechanisms and driver rewrites. I'm afraid I need your help here. I'm not sure I understand why does the gdth driver uses the pci_{alloc,free}_consistent() API's, and what is needed to replace it. Could you please have a look in gdth_proc.c and also in gdth.c for all the places that call gdth_ioctl_alloc/gdth_ioctl_free, and advise what can I do in it's place. Please bear in mind that we need it for 2.6.24, as a bugfix. Apart from the above issue, please accept patches 3,4,5 above they have now been tested and are reported to bring broken system back to production. (Given that you approve off course). And mark them for inclusion to the 2.6.24 stable releases. (Or is there some thing that I should do) --- Meanwhile on x86 systems I understand the WARN_ON is cosmetic, and does not pose any harm. Some people have reported stability with temporarily disabling it. For testers that want to try, here it is below. At your own risk. Isn't this the correct fix? pscratch is a permanent address (it's allocated at boot time and never changes). All you need the smp lock for is mediating the scratch in use flag. James diff --git a/drivers/scsi/gdth_proc.c b/drivers/scsi/gdth_proc.c index de57734..ce0228e 100644 --- a/drivers/scsi/gdth_proc.c +++ b/drivers/scsi/gdth_proc.c @@ -694,15 +694,13 @@ static void gdth_ioctl_free(gdth_ha_str *ha, int size, char *buf, ulong64 paddr) { ulong flags; -spin_lock_irqsave(ha-smp_lock, flags); - if (buf == ha-pscratch) { + spin_lock_irqsave(ha-smp_lock, flags); ha-scratch_busy = FALSE; + spin_unlock_irqrestore(ha-smp_lock, flags); } else { pci_free_consistent(ha-pdev, size, buf, paddr); } - -spin_unlock_irqrestore(ha-smp_lock, flags); } #ifdef GDTH_IOCTL_PROC - James You are bung on the money. It was tested and it works. So simple, I was thinking it was accessed by DMA and freed at interrupt. But no, just a simple lock like this. So that's it then, all reported
Re: gdth new set of patches for 2.6.24 stable
On Mon, Feb 18 2008 at 14:57 +0200, Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 17 Feb 2008 18:46:03 +0200 Boaz Harrosh [EMAIL PROTECTED] wrote: ... All my testers have reported back that with these 5 patches applied they can now run with a 2.6.24 kernel the same way they ran before. However there is that reported issue, with the dma_free_coherent WARN_ON (above). The code was like that from day one and it is a very old issue, however it is a regression because 2.6.24 introduced that new WARN_ON. (infamous commit aa24886e379d2b641c5117e178b15ce1d5d366ba) From posts on lkml and even recent one in linux-scsi about the arcmsr driver it looks that all a driver can do is work around it with different kernel mechanisms and driver rewrites. I'm afraid I need your help here. I'm not sure I understand why does the gdth driver uses the pci_{alloc,free}_consistent() API's, and what is needed to replace it. Could you please have a look in gdth_proc.c and also in gdth.c for all the places that call gdth_ioctl_alloc/gdth_ioctl_free, and advise what can I do in it's place. Please bear in mind that we need it for 2.6.24, as a bugfix. Apart from the above issue, please accept patches 3,4,5 above they have now been tested and are reported to bring broken system back to production. (Given that you approve off course). And mark them for inclusion to the 2.6.24 stable releases. (Or is there some thing that I should do) --- Meanwhile on x86 systems I understand the WARN_ON is cosmetic, and does not pose any harm. Some people have reported stability with temporarily disabling it. For testers that want to try, here it is below. At your own risk. --- From 50d3657bf6a138ee63ad1ce00052380edc75ace7 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Sun, 17 Feb 2008 12:49:35 +0200 Subject: [PATCH] gdth: Hack to remove WARN_ON in arch/x86/kernel/pci-dma_32.c gdth uses dma_free_coherent() with interrupts disabled. Which is not portable, but is safe on the HW that supports gdth. NOT Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- arch/x86/kernel/pci-dma_32.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c index 5133032..350dcfd 100644 --- a/arch/x86/kernel/pci-dma_32.c +++ b/arch/x86/kernel/pci-dma_32.c @@ -63,7 +63,7 @@ void dma_free_coherent(struct device *dev, size_t size, struct dma_coherent_mem *mem = dev ? dev-dma_mem : NULL; int order = get_order(size); -WARN_ON(irqs_disabled()); /* for portability */ +/* WARN_ON(irqs_disabled());*/ /* for portability */ if (mem vaddr = mem-virt_base vaddr (mem-virt_base + (mem-size PAGE_SHIFT))) { int page = (vaddr - mem-virt_base) PAGE_SHIFT; Yes. Let's reprise aa24886e379d2b641c5117e178b15ce1d5d366ba: : commit aa24886e379d2b641c5117e178b15ce1d5d366ba : Author: David Brownell [EMAIL PROTECTED] : Date: Fri Aug 10 13:10:27 2007 -0700 : : dma_free_coherent() needs irqs enabled (sigh) : : On at least ARM (and I'm told MIPS too) dma_free_coherent() has a newish : call context requirement: unlike its dma_alloc_coherent() sibling, it may : not be called with IRQs disabled. (This was new behavior on ARM as of late : 2005, caused by ARM SMP updates.) This little surprise can be annoyingly : driver-visible. : : Since it looks like that restriction won't be removed, this patch changes : the definition of the API to include that requirement. Also, to help catch : nonportable drivers, it updates the x86 and swiotlb versions to include the : relevant warnings. (I already observed that it trips on the : bus_reset_tasklet of the new firewire_ohci driver.) : In general, all Linux memory-freeing functions can be called from all contexts. (vfree is an irritating exception). This is good, and provides maximum usefulness to callees, as all utility functions should seek to do. It would be best to fix arm and mips. But arm and mips require enabled local irqs because their dma_free_coherent() needs to do a cross-cpu IPI call. Presumably because of certain unusual TLB protocols. I'm not sure what we should do about this. Presumably the gdth-on-arm usage base is, umm, zero, so we could lamely add CONFIG_DMA_FREE_COHERENT_WITH_LOCAL_IRQS_DISABLED_IS_OK and then use that to disable gdth (and similar) on arm amd mips. But ugh. Russell, Ralf: is there something we can do here to relax this requirement? I'm thinking that perhaps we can do some rcu/refcounting tricks: launch the IPI from within dma_free_coherent(), but don't wait for it to complete. When all CPUs have handled the IPI then (and only then) the virtual address becomes recyclable, or something like that? double-checks Actually I think David might have been wrong about mips. afaict its
[PATCH 1/3 ver2] iscsi: extended cdb support
Support for extended CDBs in iscsi. All we need is to check if command spills over 16 bytes then allocate an iscsi-extended-header for the leftovers. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Reviewed-by: Pete Wyckoff [EMAIL PROTECTED] Signed-off-by: Mike Christie [EMAIL PROTECTED] --- drivers/scsi/libiscsi.c| 55 include/scsi/iscsi_proto.h |6 +++- 2 files changed, 54 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 59f8445..a43b8ee 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -137,6 +137,45 @@ static int iscsi_add_hdr(struct iscsi_cmd_task *ctask, unsigned len) return 0; } +/* + * make an extended cdb AHS + */ +static int iscsi_prep_ecdb_ahs(struct iscsi_cmd_task *ctask) +{ + struct scsi_cmnd *cmd = ctask-sc; + unsigned rlen, pad_len; + unsigned short ahslength; + struct iscsi_ecdb_ahdr *ecdb_ahdr; + int rc; + + ecdb_ahdr = iscsi_next_hdr(ctask); + rlen = cmd-cmd_len - ISCSI_CDB_SIZE; + + BUG_ON(rlen sizeof(ecdb_ahdr-ecdb)); + ahslength = rlen + sizeof(ecdb_ahdr-reserved); + + pad_len = iscsi_padding(rlen); + + rc = iscsi_add_hdr(ctask, sizeof(ecdb_ahdr-ahslength) + + sizeof(ecdb_ahdr-ahstype) + ahslength + pad_len); + if (rc) + return rc; + + if (pad_len) + memset(ecdb_ahdr-ecdb[rlen], 0, pad_len); + + ecdb_ahdr-ahslength = cpu_to_be16(ahslength); + ecdb_ahdr-ahstype = ISCSI_AHSTYPE_CDB; + ecdb_ahdr-reserved = 0; + memcpy(ecdb_ahdr-ecdb, cmd-cmnd + ISCSI_CDB_SIZE, rlen); + + debug_scsi(iscsi_prep_ecdb_ahs: varlen_cdb_len %d + rlen %d pad_len %d ahs_length %d iscsi_headers_size %u\n, + cmd-cmd_len, rlen, pad_len, ahslength, ctask-hdr_len); + + return 0; +} + /** * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu * @ctask: iscsi cmd task @@ -150,7 +189,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) struct iscsi_session *session = conn-session; struct iscsi_cmd *hdr = ctask-hdr; struct scsi_cmnd *sc = ctask-sc; - unsigned hdrlength; + unsigned hdrlength, cmd_len; int rc; ctask-hdr_len = 0; @@ -165,10 +204,16 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) hdr-cmdsn = cpu_to_be32(session-cmdsn); session-cmdsn++; hdr-exp_statsn = cpu_to_be32(conn-exp_statsn); - memcpy(hdr-cdb, sc-cmnd, sc-cmd_len); - if (sc-cmd_len MAX_COMMAND_SIZE) - memset(hdr-cdb[sc-cmd_len], 0, - MAX_COMMAND_SIZE - sc-cmd_len); + cmd_len = sc-cmd_len; + if (cmd_len ISCSI_CDB_SIZE) + memset(hdr-cdb[cmd_len], 0, ISCSI_CDB_SIZE - cmd_len); + else if (cmd_len ISCSI_CDB_SIZE) { + rc = iscsi_prep_ecdb_ahs(ctask); + if (rc) + return rc; + cmd_len = ISCSI_CDB_SIZE; + } + memcpy(hdr-cdb, sc-cmnd, cmd_len); ctask-imm_count = 0; if (sc-sc_data_direction == DMA_TO_DEVICE) { diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h index 5ffec8a..e0593bf 100644 --- a/include/scsi/iscsi_proto.h +++ b/include/scsi/iscsi_proto.h @@ -112,6 +112,7 @@ struct iscsi_ahs_hdr { #define ISCSI_AHSTYPE_CDB 1 #define ISCSI_AHSTYPE_RLENGTH 2 +#define ISCSI_CDB_SIZE 16 /* iSCSI PDU Header */ struct iscsi_cmd { @@ -125,7 +126,7 @@ struct iscsi_cmd { __be32 data_length; __be32 cmdsn; __be32 exp_statsn; - uint8_t cdb[16];/* SCSI Command Block */ + uint8_t cdb[ISCSI_CDB_SIZE];/* SCSI Command Block */ /* Additional Data (Command Dependent) */ }; @@ -154,7 +155,8 @@ struct iscsi_ecdb_ahdr { __be16 ahslength; /* CDB length - 15, including reserved byte */ uint8_t ahstype; uint8_t reserved; - uint8_t ecdb[260 - 16]; /* 4-byte aligned extended CDB spillover */ + /* 4-byte aligned extended CDB spillover */ + uint8_t ecdb[260 - ISCSI_CDB_SIZE]; }; /* SCSI Response Header */ -- 1.5.3.3 - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3 ver2] iscsi bidi varlen support
On Thu, Jan 31 2008 at 20:08 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: Cheers after 1.3 years these can go in. [PATCH 1/3] iscsi: extended cdb support The varlen support is not yet in mainline for block and scsi-ml. But the API for drivers will not change. All LLD need to do is max_command to the it's maximum and be ready for bigger commands. This is what's done here. Once these commands start coming iscsi will be ready for them. [PATCH 2/3] iscsi: bidi support - libiscsi [PATCH 3/3] iscsi: bidi support - iscsi_tcp bidirectional commands support in iscsi. iSER is not yet ready, but it will not break. There is already a mechanism in libiscsi that will return error if bidi commands are sent iSER way. Pete please send me the iSER bits so we can port them to this latest version. Mike these patches are ontop of iscs branch of the iscsi git tree, they will apply but for compilation you will need to sync with Linus mainline. The patches are for the in-tree iscsi code. I own you the compat patch for the out-off-tree code, but this I will only be Sunday. If we do it fast it might get accepted to 2.6.25 merge window Everybody is invited to a party at Shila ben-yhuda 52 Tel-Aviv 9:45 pm. Drinks and wonderful see-food on us :) Boaz - Everything the same as before. But working this time. Also Pete's comment about second patch, was correct and code is now fixed. I have got Mike's Signed-off-by, on these they were tested and approved by him. So they are for scsi-misc. But ... James? is there any chance these can go into scsi-rc-fixes for the 2.6.25 kernel? The reason they are so late was mainly because of a fallout in the merge process and a bug that was introduced because of that, but they were intended to go together with bidi into 2.6.25. Also as an important client code to the bidi-api that is introduced in 2.6.25 kernel. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] iscsi bidi varlen support
On Tue, Feb 12 2008 at 22:17 +0200, Pete Wyckoff [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 15:12 -0500: [EMAIL PROTECTED] wrote on Thu, 31 Jan 2008 20:08 +0200: Cheers after 1.3 years these can go in. [PATCH 1/3] iscsi: extended cdb support The varlen support is not yet in mainline for block and scsi-ml. But the API for drivers will not change. All LLD need to do is max_command to the it's maximum and be ready for bigger commands. This is what's done here. Once these commands start coming iscsi will be ready for them. [PATCH 2/3] iscsi: bidi support - libiscsi [PATCH 3/3] iscsi: bidi support - iscsi_tcp bidirectional commands support in iscsi. iSER is not yet ready, but it will not break. There is already a mechanism in libiscsi that will return error if bidi commands are sent iSER way. Pete please send me the iSER bits so we can port them to this latest version. Mike these patches are ontop of iscs branch of the iscsi git tree, they will apply but for compilation you will need to sync with Linus mainline. The patches are for the in-tree iscsi code. I own you the compat patch for the out-off-tree code, but this I will only be Sunday. Here's the patch to add bidi support to iSER too. It works with my setup, but could use more testing. Note that this does rely on the 3 patches quoted above. Similar, for varlen support to iSER. Probably apply this one before the bidi one I just sent. -- Pete From: Pete Wyckoff [EMAIL PROTECTED] Subject: [PATCH] iscsi iser: varlen Handle variable-length CDBs in iSER. Signed-off-by: Pete Wyckoff [EMAIL PROTECTED] --- drivers/infiniband/ulp/iser/iscsi_iser.c |5 +++-- drivers/infiniband/ulp/iser/iscsi_iser.h |2 +- drivers/infiniband/ulp/iser/iser_initiator.c | 16 ++-- 3 files changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 5f2284d..9dfc310 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -401,7 +401,8 @@ iscsi_iser_session_create(struct iscsi_transport *iscsit, ctask = session-cmds[i]; iser_ctask = ctask-dd_data; ctask-hdr = (struct iscsi_cmd *)iser_ctask-desc.iscsi_header; - ctask-hdr_max = sizeof(iser_ctask-desc.iscsi_header); + ctask-hdr_max = sizeof(iser_ctask-desc.iscsi_header) + + sizeof(iser_ctask-desc.hdrextbuf); } for (i = 0; i session-mgmtpool_max; i++) { @@ -604,7 +605,7 @@ static struct iscsi_transport iscsi_iser_transport = { .host_template = iscsi_iser_sht, .conndata_size = sizeof(struct iscsi_conn), .max_lun= ISCSI_ISER_MAX_LUN, - .max_cmd_len= ISCSI_ISER_MAX_CMD_LEN, + .max_cmd_len= 260, Same bug I had. .max_cmd_len is still char, before the varlen patch to scsi-ml. So it must be at most 252, Until that patch is introduced and it can return to the correct 260 or better yet SCSI_MAX_VARLEN_CDB_SIZE. That also is only defined in the scsi-ml varlen patch. /* session management */ .create_session = iscsi_iser_session_create, .destroy_session= iscsi_session_teardown, diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index db8f81a..66905df 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -90,7 +90,6 @@ /* FMR space for 1 MB of 4k-page transfers, plus 1 if not page aligned */ #define ISCSI_ISER_SG_TABLESIZE (((120) SHIFT_4K) + 1) #define ISCSI_ISER_MAX_LUN 256 -#define ISCSI_ISER_MAX_CMD_LEN 16 /* QP settings */ /* Maximal bounds on received asynchronous PDUs */ @@ -217,6 +216,7 @@ enum iser_desc_type { struct iser_desc { struct iser_hdr iser_header; struct iscsi_hdr iscsi_header; + char hdrextbuf[ISCSI_MAX_AHS_SIZE]; struct iser_regd_buf hdr_regd_buf; void *data; /* used by RX TX_CONTROL */ struct iser_regd_buf data_regd_buf; /* used by RX TX_CONTROL */ diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 83247f1..ea3f5dc 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -246,9 +246,13 @@ post_rx_kmalloc_failure: return err; } -/* creates a new tx descriptor and adds header regd buffer */ +/** + * iser_create_send_desc - creates a new tx descriptor and adds header regd buffer + * @iscsi_hdr_len: length of struct iscsi_hdr and any AHSes in hdrextbuf + */ static void
[PATCH 3/3 ver2] iscsi: bidi support - iscsi_tcp
bidi support for iscsi_tcp - access the right scsi_in() and/or scsi_out() side of things. also for resid Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Reviewed-by: Pete Wyckoff [EMAIL PROTECTED] Signed-off-by: Mike Christie [EMAIL PROTECTED] --- drivers/scsi/iscsi_tcp.c | 31 +-- 1 files changed, 17 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index 8a17867..72b9b2a 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -528,6 +528,7 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) struct iscsi_session *session = conn-session; struct scsi_cmnd *sc = ctask-sc; int datasn = be32_to_cpu(rhdr-datasn); + unsigned total_in_length = scsi_in(sc)-length; iscsi_update_cmdsn(session, (struct iscsi_nopin*)rhdr); if (tcp_conn-in.datalen == 0) @@ -542,10 +543,10 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) tcp_ctask-exp_datasn++; tcp_ctask-data_offset = be32_to_cpu(rhdr-offset); - if (tcp_ctask-data_offset + tcp_conn-in.datalen scsi_bufflen(sc)) { + if (tcp_ctask-data_offset + tcp_conn-in.datalen total_in_length) { debug_tcp(%s: data_offset(%d) + data_len(%d) total_length_in(%d)\n, __FUNCTION__, tcp_ctask-data_offset, - tcp_conn-in.datalen, scsi_bufflen(sc)); + tcp_conn-in.datalen, total_in_length); return ISCSI_ERR_DATA_OFFSET; } @@ -558,8 +559,8 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) if (res_count 0 (rhdr-flags ISCSI_FLAG_CMD_OVERFLOW || -res_count = scsi_bufflen(sc))) - scsi_set_resid(sc, res_count); +res_count = total_in_length)) + scsi_in(sc)-resid = res_count; else sc-result = (DID_BAD_TARGET 16) | rhdr-cmd_status; @@ -670,11 +671,11 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) r2t-data_length, session-max_burst); r2t-data_offset = be32_to_cpu(rhdr-data_offset); - if (r2t-data_offset + r2t-data_length scsi_bufflen(ctask-sc)) { + if (r2t-data_offset + r2t-data_length scsi_out(ctask-sc)-length) { iscsi_conn_printk(KERN_ERR, conn, invalid R2T with data len %u at offset %u and total length %d\n, r2t-data_length, - r2t-data_offset, scsi_bufflen(ctask-sc)); + r2t-data_offset, scsi_out(ctask-sc)-length); __kfifo_put(tcp_ctask-r2tpool.queue, (void*)r2t, sizeof(void*)); return ISCSI_ERR_DATALEN; @@ -771,6 +772,7 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr) if (tcp_conn-in.datalen) { struct iscsi_tcp_cmd_task *tcp_ctask = ctask-dd_data; struct hash_desc *rx_hash = NULL; + struct scsi_data_buffer *sdb = scsi_in(ctask-sc); /* * Setup copy of Data-In into the Scsi_Cmnd @@ -788,8 +790,8 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr) tcp_ctask-data_offset, tcp_conn-in.datalen); return iscsi_segment_seek_sg(tcp_conn-in.segment, -scsi_sglist(ctask-sc), -scsi_sg_count(ctask-sc), +sdb-table.sgl, +sdb-table.nents, tcp_ctask-data_offset, tcp_conn-in.datalen, iscsi_tcp_process_data_in, @@ -1332,7 +1334,8 @@ iscsi_tcp_ctask_init(struct iscsi_cmd_task *ctask) return 0; /* If we have immediate data, attach a payload */ - err = iscsi_tcp_send_data_prep(conn, scsi_sglist(sc), scsi_sg_count(sc), + err = iscsi_tcp_send_data_prep(conn, scsi_out(sc)-table.sgl, + scsi_out(sc)-table.nents, 0, ctask-imm_count); if (err) return err; @@ -1386,6 +1389,7 @@ iscsi_tcp_ctask_xmit(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { struct iscsi_tcp_cmd_task *tcp_ctask = ctask-dd_data; struct scsi_cmnd *sc = ctask-sc
[PATCH 2/3 ver2] iscsi: bidi support - libiscsi
iscsi bidi support at the generic libiscsi level - prepare the additional bidi_read rlength header. - access the right scsi_in() and/or scsi_out() side of things. also for resid. - Handle BIDI underflow overflow from target Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Reviewed-by: Pete Wyckoff [EMAIL PROTECTED] Signed-off-by: Mike Christie [EMAIL PROTECTED] --- drivers/scsi/libiscsi.c | 85 ++ 1 files changed, 70 insertions(+), 15 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index a43b8ee..9c12915 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -176,6 +176,31 @@ static int iscsi_prep_ecdb_ahs(struct iscsi_cmd_task *ctask) return 0; } +static int iscsi_prep_bidi_ahs(struct iscsi_cmd_task *ctask) +{ + struct scsi_cmnd *sc = ctask-sc; + struct iscsi_rlength_ahdr *rlen_ahdr; + int rc; + + rlen_ahdr = iscsi_next_hdr(ctask); + rc = iscsi_add_hdr(ctask, sizeof(*rlen_ahdr)); + if (rc) + return rc; + + rlen_ahdr-ahslength = + cpu_to_be16(sizeof(rlen_ahdr-read_length) + + sizeof(rlen_ahdr-reserved)); + rlen_ahdr-ahstype = ISCSI_AHSTYPE_RLENGTH; + rlen_ahdr-reserved = 0; + rlen_ahdr-read_length = cpu_to_be32(scsi_in(sc)-length); + + debug_scsi(bidi-in rlen_ahdr-read_length(%d) + rlen_ahdr-ahslength(%d)\n, + be32_to_cpu(rlen_ahdr-read_length), + be16_to_cpu(rlen_ahdr-ahslength)); + return 0; +} + /** * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu * @ctask: iscsi cmd task @@ -200,7 +225,6 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) hdr-flags = ISCSI_ATTR_SIMPLE; int_to_scsilun(sc-device-lun, (struct scsi_lun *)hdr-lun); hdr-itt = build_itt(ctask-itt, session-age); - hdr-data_length = cpu_to_be32(scsi_bufflen(sc)); hdr-cmdsn = cpu_to_be32(session-cmdsn); session-cmdsn++; hdr-exp_statsn = cpu_to_be32(conn-exp_statsn); @@ -216,7 +240,15 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) memcpy(hdr-cdb, sc-cmnd, cmd_len); ctask-imm_count = 0; + if (scsi_bidi_cmnd(sc)) { + hdr-flags |= ISCSI_FLAG_CMD_READ; + rc = iscsi_prep_bidi_ahs(ctask); + if (rc) + return rc; + } if (sc-sc_data_direction == DMA_TO_DEVICE) { + unsigned out_len = scsi_out(sc)-length; + hdr-data_length = cpu_to_be32(out_len); hdr-flags |= ISCSI_FLAG_CMD_WRITE; /* * Write counters: @@ -237,19 +269,19 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) ctask-unsol_datasn = 0; if (session-imm_data_en) { - if (scsi_bufflen(sc) = session-first_burst) + if (out_len = session-first_burst) ctask-imm_count = min(session-first_burst, conn-max_xmit_dlength); else - ctask-imm_count = min(scsi_bufflen(sc), + ctask-imm_count = min(out_len, conn-max_xmit_dlength); hton24(hdr-dlength, ctask-imm_count); } else zero_data(hdr-dlength); if (!session-initial_r2t_en) { - ctask-unsol_count = min((session-first_burst), - (scsi_bufflen(sc))) - ctask-imm_count; + ctask-unsol_count = min(session-first_burst, out_len) +- ctask-imm_count; ctask-unsol_offset = ctask-imm_count; } @@ -259,6 +291,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) } else { hdr-flags |= ISCSI_FLAG_CMD_FINAL; zero_data(hdr-dlength); + hdr-data_length = cpu_to_be32(scsi_in(sc)-length); if (sc-sc_data_direction == DMA_FROM_DEVICE) hdr-flags |= ISCSI_FLAG_CMD_READ; @@ -277,10 +310,12 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) return EIO; conn-scsicmd_pdus_cnt++; - debug_scsi(iscsi prep [%s cid %d sc %p cdb 0x%x itt 0x%x len %d - cmdsn %d win %d]\n, - sc-sc_data_direction == DMA_TO_DEVICE ? write : read, - conn-id, sc, sc-cmnd[0], ctask-itt, scsi_bufflen(sc), + debug_scsi(iscsi prep [%s cid %d sc %p cdb 0x%x itt 0x%x + len %d bidi_len %d cmdsn %d win %d]\n, + scsi_bidi_cmnd(sc
Re: [PATCH] 2.6.25-rc1-git2: GDT SCSI: change drivers/scsi/gdth.c into using pci_get device
On Sat, Feb 16 2008 at 18:37 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote: On Wed, Feb 13, 2008 at 10:57:37AM +0200, Boaz Harrosh wrote: I still don't have a card for testing myself. Again anyone wants to send me a card. Intel people anybody home? Apparently Intel sold this line of cards to Adaptec. The copyright notice in the file backs this up: * Copyright (C) 1995-06 ICP vortex GmbH, Achim Leubner * * Copyright (C) 2002-04 Intel Corporation * * Copyright (C) 2003-06 Adaptec Inc. * * [EMAIL PROTECTED] * OK I never got this guy to ping back. CCing [EMAIL PROTECTED] Who is the right contact person, regarding the HW that is supported by the gdth driver? Form what we see in driver commit logs (above), it was transfered to Adaptec from Intel in 2003. Is that still so? Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ips.c broken since 2.6.23 on x86_64?
On Sat, Feb 16 2008 at 2:41 +0200, FUJITA Tomonori [EMAIL PROTECTED] wrote: On Fri, 15 Feb 2008 14:50:57 -0800 Tim Pepper [EMAIL PROTECTED] wrote: On Sat 16 Feb at 01:09:43 +0900 [EMAIL PROTECTED] said: The first one is just reverting the data buffer accessors conversion. It would be nice if we could just revert it but we can't. These changes are necessary to compile the driver against post 2.6.24. Fujita-san, Unfortunately (and not too surprisingly given what we've tried so far) with only the first of your series reverted the driver is working fine for me again. Do you mean that you applied only the following two patches against 2.6.24, and then it doesn't work? 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch 0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch If so, the second patch is broken. Did you saw BUG_ON message (I added some BUG_ON to the patch)? I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly similar sounding issues with other drivers. Could there be some memory uninitialised? I did try changing all the ips.c kmalloc's to kzalloc's, but that didn't help. Also that thread ties into pci gart. The machines we've been using are liable to getting pci calgary although given my .config has: CONFIG_GART_IOMMU=y CONFIG_CALGARY_IOMMU=y # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set and that when booting this mainline I don't see any Calgary related messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not actually running the calgary iommu code in these repros. Yes, probabaly, your machine doesn't use any IOMMU hardware (nommu_map_sg function was in your crash log). Anyway, I greatly appreciate your efforts so far in trying to find what could be wrong here! Really sorry about the troubles and thanks for testing. - Tomo hi It looks like the same bug we had with USB's isd200 and protocol.c. An overflow of a data buffer bigger then then the sglist. There 2 it was in the INQUIRY command. You just need to also check for sg != NULL in the for() loop. Tim Please test below patch. It's ontop of 2.6.24 but should also apply to 2.6.25-rcx Boaz -- From ec20bea25c9fe2400378b19c128b15fef3c7cbb6 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Sun, 17 Feb 2008 14:50:25 +0200 Subject: [PATCH] ips: Avoid overflow in writing scsi command data Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/ips.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index 5c5a9b2..1d12253 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -3517,7 +3517,7 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count) struct scatterlist *sg = scsi_sglist(scmd); for (i = 0, xfer_cnt = 0; - (i scsi_sg_count(scmd)) (xfer_cnt count); i++) { + (i scsi_sg_count(scmd)) (xfer_cnt count) sg; i++) { min_cnt = min(count - xfer_cnt, sg[i].length); /* kmap_atomic() ensures addressability of the data buffer.*/ -- 1.5.3.3 - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ips.c broken since 2.6.23 on x86_64?
On Sun, Feb 17 2008 at 14:52 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Sat, Feb 16 2008 at 2:41 +0200, FUJITA Tomonori [EMAIL PROTECTED] wrote: On Fri, 15 Feb 2008 14:50:57 -0800 Tim Pepper [EMAIL PROTECTED] wrote: On Sat 16 Feb at 01:09:43 +0900 [EMAIL PROTECTED] said: The first one is just reverting the data buffer accessors conversion. It would be nice if we could just revert it but we can't. These changes are necessary to compile the driver against post 2.6.24. Fujita-san, Unfortunately (and not too surprisingly given what we've tried so far) with only the first of your series reverted the driver is working fine for me again. Do you mean that you applied only the following two patches against 2.6.24, and then it doesn't work? 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch 0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch If so, the second patch is broken. Did you saw BUG_ON message (I added some BUG_ON to the patch)? I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly similar sounding issues with other drivers. Could there be some memory uninitialised? I did try changing all the ips.c kmalloc's to kzalloc's, but that didn't help. Also that thread ties into pci gart. The machines we've been using are liable to getting pci calgary although given my .config has: CONFIG_GART_IOMMU=y CONFIG_CALGARY_IOMMU=y # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set and that when booting this mainline I don't see any Calgary related messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not actually running the calgary iommu code in these repros. Yes, probabaly, your machine doesn't use any IOMMU hardware (nommu_map_sg function was in your crash log). Anyway, I greatly appreciate your efforts so far in trying to find what could be wrong here! Really sorry about the troubles and thanks for testing. - Tomo hi It looks like the same bug we had with USB's isd200 and protocol.c. An overflow of a data buffer bigger then then the sglist. There 2 it was in the INQUIRY command. You just need to also check for sg != NULL in the for() loop. Tim Please test below patch. It's ontop of 2.6.24 but should also apply to 2.6.25-rcx Boaz -- From ec20bea25c9fe2400378b19c128b15fef3c7cbb6 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Sun, 17 Feb 2008 14:50:25 +0200 Subject: [PATCH] ips: Avoid overflow in writing scsi command data Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/ips.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index 5c5a9b2..1d12253 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -3517,7 +3517,7 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count) Disregard that patch it is totally wrong. That's not it. Sorry Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gdth new set of patches for 2.6.24 stable
On Thu, Feb 14 2008 at 20:47 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: Submitted are a new set of patches, that fix lots of problems with the gdth driver. It fixes the following problems: - scan for drives on hosts. (Already in mainline) - truly fixes the exit/reboot problems but does call flush() before reboot. - fix crash when accessing array with icpcon management application. - fix crash when doing $ cat /proc/sys/gdth/0. This one still has the below WARN_ON in messages (see gdth_info below) So there is one more thing hiding in there. - use pci_get_device One of the testers requested if we can also put the move to pci_get_device patch with removal of dependency on PCI_LEGACY, to the stable release. The patches are for and based on Linux-2.6.24. here is the list of patches: [PATCH 1/5] gdth: update deprecated pci_find_device [PATCH 2/5] gdth: scan for scsi devices [PATCH 3/5] gdth: bugfix for the at-exit problems [PATCH 4/5] gdth: fix to internal commands execution [PATCH 5/5] gdth: remove gdth cooked up command accessors Please all test and report your findings. Thanks in advance Boaz --- gdth_info WARNING: at arch/x86/kernel/pci-dma_32.c:66 dma_free_coherent() Pid: 5501, comm: cat Not tainted 2.6.24 #43 [c0107137] dma_free_coherent+0x93/0x95 [c025ef73] gdth_ioctl_free+0x4c/0x69 [c0264a36] gdth_proc_info+0x165f/0x182c [c0111f7a] update_curr+0xeb/0xf2 [c01132aa] task_rq_lock+0x29/0x50 [c0113706] try_to_wake_up+0x42/0x342 [c0113706] try_to_wake_up+0x42/0x342 [c0111a9f] __wake_up_common+0x46/0x6d [c0113569] __wake_up+0x32/0x42 [c022dad9] n_tty_receive_buf+0x2e8/0xe97 [c022dad9] n_tty_receive_buf+0x2e8/0xe97 [c0111f0a] update_curr+0x7b/0xf2 [c0112625] enqueue_task_fair+0x27/0x30 [c0111783] enqueue_task+0xa/0x14 [c025e351] proc_scsi_read+0x29/0x3d [c025e328] proc_scsi_read+0x0/0x3d [c0189704] proc_file_read+0x1c6/0x279 [c018953e] proc_file_read+0x0/0x279 [c0185eca] proc_reg_read+0x53/0x71 [c0185e77] proc_reg_read+0x0/0x71 [c0159968] vfs_read+0x85/0x11b [c0159d9d] sys_read+0x41/0x6a [c0102822] sysenter_past_esp+0x5f/0x85 /gdth_info - James hi. All my testers have reported back that with these 5 patches applied they can now run with a 2.6.24 kernel the same way they ran before. However there is that reported issue, with the dma_free_coherent WARN_ON (above). The code was like that from day one and it is a very old issue, however it is a regression because 2.6.24 introduced that new WARN_ON. (infamous commit aa24886e379d2b641c5117e178b15ce1d5d366ba) From posts on lkml and even recent one in linux-scsi about the arcmsr driver it looks that all a driver can do is work around it with different kernel mechanisms and driver rewrites. I'm afraid I need your help here. I'm not sure I understand why does the gdth driver uses the pci_{alloc,free}_consistent() API's, and what is needed to replace it. Could you please have a look in gdth_proc.c and also in gdth.c for all the places that call gdth_ioctl_alloc/gdth_ioctl_free, and advise what can I do in it's place. Please bear in mind that we need it for 2.6.24, as a bugfix. Apart from the above issue, please accept patches 3,4,5 above they have now been tested and are reported to bring broken system back to production. (Given that you approve off course). And mark them for inclusion to the 2.6.24 stable releases. (Or is there some thing that I should do) --- Meanwhile on x86 systems I understand the WARN_ON is cosmetic, and does not pose any harm. Some people have reported stability with temporarily disabling it. For testers that want to try, here it is below. At your own risk. --- From 50d3657bf6a138ee63ad1ce00052380edc75ace7 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Sun, 17 Feb 2008 12:49:35 +0200 Subject: [PATCH] gdth: Hack to remove WARN_ON in arch/x86/kernel/pci-dma_32.c gdth uses dma_free_coherent() with interrupts disabled. Which is not portable, but is safe on the HW that supports gdth. NOT Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- arch/x86/kernel/pci-dma_32.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c index 5133032..350dcfd 100644 --- a/arch/x86/kernel/pci-dma_32.c +++ b/arch/x86/kernel/pci-dma_32.c @@ -63,7 +63,7 @@ void dma_free_coherent(struct device *dev, size_t size, struct dma_coherent_mem *mem = dev ? dev-dma_mem : NULL; int order = get_order(size); - WARN_ON(irqs_disabled()); /* for portability */ +/* WARN_ON(irqs_disabled());*/ /* for portability */ if (mem vaddr = mem-virt_base vaddr (mem-virt_base + (mem-size PAGE_SHIFT))) { int page = (vaddr - mem-virt_base) PAGE_SHIFT; -- 1.5.3.3 - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 19:36 +0200, James Bottomley [EMAIL PROTECTED] wrote: snip --- From: Boaz Harrosh [EMAIL PROTECTED] Subject: [PATCH] gdth: bugfix for the at-exit problems gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- snip +static struct notifier_block gdth_notifier = { +gdth_halt, NULL, 0 +}; + +bool gdth_shutdown_done; right forgot the static. But I use it in gdth_init(), so it must be external. Unless you promise me that gdth_init() will never ever be called after a call to shutdown. Any way the hot-plug patch changes all that. This is only for 2.6.24 bugfixs. Static police alert! Just make it static and move it into gdth_shutdown() +static void gdth_shutdown() +{ +gdth_ha_str *ha; +if (gdth_shutdown_done) +return; + +gdth_shutdown_done = true; +unregister_chrdev(major,gdth); +unregister_reboot_notifier(gdth_notifier); I'm not sure you can do this, aren't reboot notifiers called with the rwsem held? In which case the unregister which also takes the rwsem will hang the system. humm, can't remove a notifier from within the notifier. Thanks James for the catch, it's what happens when you don't test your own patches. I have moved unregister_reboot_notifier to gdth_exit. James Will send a new version for review. Please note that this is a bugfix patch on top of 2.6.24. It is not needed for Jeff's hot-plug path. There will be one more bugfix patch for a crash at the user-mode ioctl code. Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] gdth: bugfix for the at-exit problems
This is a bugfix for the 2.6.24.x stable releases. gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 90 -- 1 files changed, 36 insertions(+), 54 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..3828b23 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include gdth_proc.h #include gdth_proc.c -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { -gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ -gdth_ha_str *ha; -#ifndef __alpha__ -gdth_cmd_strgdtcmd; -charcmnd[MAX_COMMAND_SIZE]; -#endif - -if (notifier_disabled) -return NOTIFY_OK; - -TRACE2((gdth_halt() event %d\n,(int)event)); -if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) -return NOTIFY_DONE; - -notifier_disabled = 1; -printk(GDT-HA: Flushing all host drives .. ); -list_for_each_entry(ha, gdth_instances, list) { -gdth_flush(ha); - -#ifndef __alpha__ -/* controller reset */ -memset(cmnd, 0xff, MAX_COMMAND_SIZE); -gdtcmd.BoardNode = LOCALBOARD; -gdtcmd.Service = CACHESERVICE; -gdtcmd.OpCode = GDT_RESET; -TRACE2((gdth_halt(): reset controller %d\n, ha-hanum)); -gdth_execute(ha-shost, gdtcmd, cmnd, 10, NULL); -#endif -} -printk(Done.\n); - -#ifdef GDTH_STATISTICS -del_timer(gdth_timer); -#endif -return NOTIFY_OK; -} - /* configure lun */ static int gdth_slave_configure(struct scsi_device *sdev) { @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_remove_host(shp); + gdth_flush(ha); + if (ha-sdev) { scsi_free_host_dev(ha-sdev); ha-sdev = NULL; } - gdth_flush(ha); - if (shp-irq) free_irq(shp-irq,ha); @@ -5173,6 +5129,22 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_host_put(shp); } +static void gdth_shutdown(void); +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) +{ + TRACE2((gdth_halt() event %d\n, (int)event)); + if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) + return NOTIFY_DONE; + + gdth_shutdown(); + return NOTIFY_OK; +} + +static struct notifier_block gdth_notifier = { +gdth_halt, NULL, 0 +}; +static bool gdth_shutdown_done; + static int __init gdth_init(void) { if (disable) { @@ -5185,6 +5157,7 @@ static int __init gdth_init(void) GDTH_VERSION_STR); /* initializations */ + gdth_shutdown_done = false; gdth_polling = TRUE; gdth_clear_events(); @@ -5235,23 +5208,32 @@ static int __init gdth_init(void) add_timer(gdth_timer); #endif major = register_chrdev(0,gdth, gdth_fops); - notifier_disabled = 0; register_reboot_notifier(gdth_notifier); gdth_polling = FALSE; return 0; } -static void __exit gdth_exit(void) +static void gdth_shutdown() { gdth_ha_str *ha; - list_for_each_entry(ha, gdth_instances, list) - gdth_remove_one(ha); + if (gdth_shutdown_done) + return; + + gdth_shutdown_done = true; + unregister_chrdev(major, gdth); #ifdef GDTH_STATISTICS - del_timer(gdth_timer); + del_timer_sync(gdth_timer); #endif - unregister_chrdev(major,gdth); + + list_for_each_entry(ha, gdth_instances
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 21:38 +0200, Jan Engelhardt [EMAIL PROTECTED] wrote: On Feb 13 2008 11:03, Boaz Harrosh wrote: I've tested this patch now - and it works fine. Now rmmod, halt and reboot also works. Stefan Priebe This is grate news Stefan. Thank you very much for all your time and effort, with out we could not have fixed all this. Do you have a git tree with the latest pieces? No, scsi-misc I guess ;) I could put it here: git://git.bhalevy.com/open-osd gdth branch give me an hours Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] gdth: bugfix for the at-exit problems
On Thu, Feb 14 2008 at 18:10 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote: This is a bugfix for the 2.6.24.x stable releases. gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. The patch looks fine now, thanks. Can we actually get a tester just to make sure there's nothing I missed. James Yes, and the tester reported, a breakage. We are on it. Apparently, you cannot do a full deallocation of resources at reboot notifier, nor would you want to I guess. But you can do the flush. The exit call is never called on a reboot and the card access is valid to the end. Please comment? So I pretty much reverted that patch, but did leave some cleanups. Also we found the other problems reported with user-mode tools and cat /proc/sys/gdth/0 so 2 patches on the way above reverted. Give us a few ours to test every thing. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
gdth new set of patches for 2.6.24 stable
Submitted are a new set of patches, that fix lots of problems with the gdth driver. It fixes the following problems: - scan for drives on hosts. (Already in mainline) - truly fixes the exit/reboot problems but does call flush() before reboot. - fix crash when accessing array with icpcon management application. - fix crash when doing $ cat /proc/sys/gdth/0. This one still has the below WARN_ON in messages (see gdth_info below) So there is one more thing hiding in there. - use pci_get_device One of the testers requested if we can also put the move to pci_get_device patch with removal of dependency on PCI_LEGACY, to the stable release. The patches are for and based on Linux-2.6.24. here is the list of patches: [PATCH 1/5] gdth: update deprecated pci_find_device [PATCH 2/5] gdth: scan for scsi devices [PATCH 3/5] gdth: bugfix for the at-exit problems [PATCH 4/5] gdth: fix to internal commands execution [PATCH 5/5] gdth: remove gdth cooked up command accessors Please all test and report your findings. Thanks in advance Boaz --- gdth_info WARNING: at arch/x86/kernel/pci-dma_32.c:66 dma_free_coherent() Pid: 5501, comm: cat Not tainted 2.6.24 #43 [c0107137] dma_free_coherent+0x93/0x95 [c025ef73] gdth_ioctl_free+0x4c/0x69 [c0264a36] gdth_proc_info+0x165f/0x182c [c0111f7a] update_curr+0xeb/0xf2 [c01132aa] task_rq_lock+0x29/0x50 [c0113706] try_to_wake_up+0x42/0x342 [c0113706] try_to_wake_up+0x42/0x342 [c0111a9f] __wake_up_common+0x46/0x6d [c0113569] __wake_up+0x32/0x42 [c022dad9] n_tty_receive_buf+0x2e8/0xe97 [c022dad9] n_tty_receive_buf+0x2e8/0xe97 [c0111f0a] update_curr+0x7b/0xf2 [c0112625] enqueue_task_fair+0x27/0x30 [c0111783] enqueue_task+0xa/0x14 [c025e351] proc_scsi_read+0x29/0x3d [c025e328] proc_scsi_read+0x0/0x3d [c0189704] proc_file_read+0x1c6/0x279 [c018953e] proc_file_read+0x0/0x279 [c0185eca] proc_reg_read+0x53/0x71 [c0185e77] proc_reg_read+0x0/0x71 [c0159968] vfs_read+0x85/0x11b [c0159d9d] sys_read+0x41/0x6a [c0102822] sysenter_past_esp+0x5f/0x85 /gdth_info - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] gdth: fix to internal commands execution
The recent patch named: [SCSI] gdth: !use_sg cleanup and use of scsi accessors has done a bad job in handling internal commands issued by gdth_execute(). Internal commands are issued with device gdth_cmd_str ready made directly to the card, without any mapping or translations of scsi commands. So here I added a gdth_cmd_str pointer to the gdth_cmndinfo private structure which is then copied directly to host. following this patch is a cleanup that removes the home cooked accessors and reverts them to regular scsi_cmnd accessors. Since they are not used anymore. After review maybe the 2 patches should be squashed together. FIXME: There is still a problem with gdth_get_info(). as reported there is a WARN_ON trigerd in dma_free_coherent() when doing: $ cat /proc/sys/gdth/0 Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 30 -- drivers/scsi/gdth.h |1 + 2 files changed, 13 insertions(+), 18 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 0979553..077f972 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -160,7 +160,7 @@ static void gdth_readapp_event(gdth_ha_str *ha, unchar application, static void gdth_clear_events(void); static void gdth_copy_internal_data(gdth_ha_str *ha, Scsi_Cmnd *scp, -char *buffer, ushort count, int to_buffer); +char *buffer, ushort count); static int gdth_internal_cache_cmd(gdth_ha_str *ha, Scsi_Cmnd *scp); static int gdth_fill_cache_cmd(gdth_ha_str *ha, Scsi_Cmnd *scp, ushort hdrive); @@ -439,8 +439,8 @@ static struct gdth_cmndinfo *gdth_get_cmndinfo(gdth_ha_str *ha) for (i=0; iGDTH_MAXCMDS; ++i) { if (ha-cmndinfo[i].index == 0) { priv = ha-cmndinfo[i]; - priv-index = i+1; memset(priv, 0, sizeof(*priv)); + priv-index = i+1; break; } } @@ -487,7 +487,6 @@ int __gdth_execute(struct scsi_device *sdev, gdth_cmd_str *gdtcmd, char *cmnd, gdth_ha_str *ha = shost_priv(sdev-host); Scsi_Cmnd *scp; struct gdth_cmndinfo cmndinfo; -struct scatterlist one_sg; DECLARE_COMPLETION_ONSTACK(wait); int rval; @@ -501,13 +500,10 @@ int __gdth_execute(struct scsi_device *sdev, gdth_cmd_str *gdtcmd, char *cmnd, /* use request field to save the ptr. to completion struct. */ scp-request = (struct request *)wait; scp-timeout_per_command = timeout*HZ; -sg_init_one(one_sg, gdtcmd, sizeof(*gdtcmd)); -gdth_set_sglist(scp, one_sg); -gdth_set_sg_count(scp, 1); -gdth_set_bufflen(scp, sizeof(*gdtcmd)); scp-cmd_len = 12; memcpy(scp-cmnd, cmnd, 12); cmndinfo.priority = IOCTL_PRI; +cmndinfo.internal_cmd_str = gdtcmd; cmndinfo.internal_command = 1; TRACE((__gdth_execute() cmd 0x%x\n, scp-cmnd[0])); @@ -2351,7 +2347,7 @@ static void gdth_next(gdth_ha_str *ha) * buffers, kmap_atomic() as needed. */ static void gdth_copy_internal_data(gdth_ha_str *ha, Scsi_Cmnd *scp, -char *buffer, ushort count, int to_buffer) +char *buffer, ushort count) { ushort cpcount,i, max_sg = gdth_sg_count(scp); ushort cpsum,cpnow; @@ -2377,10 +2373,7 @@ static void gdth_copy_internal_data(gdth_ha_str *ha, Scsi_Cmnd *scp, } local_irq_save(flags); address = kmap_atomic(sg_page(sl), KM_BIO_SRC_IRQ) + sl-offset; -if (to_buffer) -memcpy(buffer, address, cpnow); -else -memcpy(address, buffer, cpnow); +memcpy(address, buffer, cpnow); flush_dcache_page(sg_page(sl)); kunmap_atomic(address, KM_BIO_SRC_IRQ); local_irq_restore(flags); @@ -2434,7 +2427,7 @@ static int gdth_internal_cache_cmd(gdth_ha_str *ha, Scsi_Cmnd *scp) strcpy(inq.vendor,ha-oem_name); sprintf(inq.product,Host Drive #%02d,t); strcpy(inq.revision, ); -gdth_copy_internal_data(ha, scp, (char*)inq, sizeof(gdth_inq_data), 0); +gdth_copy_internal_data(ha, scp, (char*)inq, sizeof(gdth_inq_data)); break; case REQUEST_SENSE: @@ -2444,7 +2437,7 @@ static int gdth_internal_cache_cmd(gdth_ha_str *ha, Scsi_Cmnd *scp) sd.key = NO_SENSE; sd.info = 0; sd.add_length= 0; -gdth_copy_internal_data(ha, scp, (char*)sd, sizeof(gdth_sense_data), 0); +gdth_copy_internal_data(ha, scp, (char*)sd, sizeof(gdth_sense_data)); break; case MODE_SENSE: @@ -2456,7 +2449,7 @@ static int gdth_internal_cache_cmd(gdth_ha_str *ha, Scsi_Cmnd *scp) mpd.bd.block_length[0] = (SECTOR_SIZE 0x00ff) 16; mpd.bd.block_length[1] = (SECTOR_SIZE 0xff00) 8; mpd.bd.block_length[2] = (SECTOR_SIZE
[PATCH 3/5] gdth: bugfix for the at-exit problems
This is a bugfix for the 2.6.24.x stable releases. gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So clean that up and fix the crashes. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 82 +- 1 files changed, 28 insertions(+), 54 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 4959037..0979553 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include gdth_proc.h #include gdth_proc.c -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { -gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3796,6 +3789,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -4671,45 +4666,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ -gdth_ha_str *ha; -#ifndef __alpha__ -gdth_cmd_strgdtcmd; -charcmnd[MAX_COMMAND_SIZE]; -#endif - -if (notifier_disabled) -return NOTIFY_OK; - -TRACE2((gdth_halt() event %d\n,(int)event)); -if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) -return NOTIFY_DONE; - -notifier_disabled = 1; -printk(GDT-HA: Flushing all host drives .. ); -list_for_each_entry(ha, gdth_instances, list) { -gdth_flush(ha); - -#ifndef __alpha__ -/* controller reset */ -memset(cmnd, 0xff, MAX_COMMAND_SIZE); -gdtcmd.BoardNode = LOCALBOARD; -gdtcmd.Service = CACHESERVICE; -gdtcmd.OpCode = GDT_RESET; -TRACE2((gdth_halt(): reset controller %d\n, ha-hanum)); -gdth_execute(ha-shost, gdtcmd, cmnd, 10, NULL); -#endif -} -printk(Done.\n); - -#ifdef GDTH_STATISTICS -del_timer(gdth_timer); -#endif -return NOTIFY_OK; -} - /* configure lun */ static int gdth_slave_configure(struct scsi_device *sdev) { @@ -5144,13 +5100,13 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_remove_host(shp); + gdth_flush(ha); + if (ha-sdev) { scsi_free_host_dev(ha-sdev); ha-sdev = NULL; } - gdth_flush(ha); - if (shp-irq) free_irq(shp-irq,ha); @@ -5176,6 +5132,24 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_host_put(shp); } +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) +{ + gdth_ha_str *ha; + + TRACE2((gdth_halt() event %d\n, (int)event)); + if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) + return NOTIFY_DONE; + + list_for_each_entry(ha, gdth_instances, list) + gdth_flush(ha); + + return NOTIFY_OK; +} + +static struct notifier_block gdth_notifier = { +gdth_halt, NULL, 0 +}; + static int __init gdth_init(void) { if (disable) { @@ -5238,7 +5212,6 @@ static int __init gdth_init(void) add_timer(gdth_timer); #endif major = register_chrdev(0,gdth, gdth_fops); - notifier_disabled = 0; register_reboot_notifier(gdth_notifier); gdth_polling = FALSE; return 0; @@ -5248,14 +5221,15 @@ static void __exit gdth_exit(void) { gdth_ha_str *ha; - list_for_each_entry(ha, gdth_instances, list) - gdth_remove_one(ha); + unregister_chrdev(major, gdth); + unregister_reboot_notifier(gdth_notifier); #ifdef GDTH_STATISTICS - del_timer(gdth_timer); + del_timer_sync(gdth_timer); #endif - unregister_chrdev(major,gdth); - unregister_reboot_notifier(gdth_notifier); + + list_for_each_entry(ha, gdth_instances, list) + gdth_remove_one(ha); } module_init(gdth_init); -- 1.5.3.3 - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http
Re: [PATCH] 2.6.25-rc1-git2: GDT SCSI: change drivers/scsi/gdth.c into using pci_get device
On Wed, Feb 13 2008 at 2:17 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 20:48 -0300, Sergio Luis wrote: reposting an updated version of it. Please check if it's ok. Looks fine, thanks! You added an extra space at the end of while ((pdev = pci_get_device(vendor, device, pdev)) Which I fixed. Unfortunately checkpatch isn't very helpful for this driver since it uses spaces not tabs everywhere, which checkpatch really hates. James James hi This patch is now obsolete. After Jeff's patch to convert it to hot plug API. Jeffs patch looks very good to me. I will try to push it through the testers. I still don't have a card for testing myself. Again anyone wants to send me a card. Intel people anybody home? I will add the missing Kconfig hunk to jeff's patch and will push it after it is tested. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 9:06 +0200, Stefan Priebe - allied internet ag [EMAIL PROTECTED] wrote: Hello! I've tested this patch now - and it works fine. Now rmmod, halt and reboot also works. Stefan Priebe This is grate news Stefan. Thank you very much for all your time and effort, with out we could not have fixed all this. Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[Fwd: tested your patch to gdth]
I have received below message from a tester. Can any one please help me with the /proc/scsi/gdth/0 problem. Where should I look? Boaz On Wed, Feb 13 2008 at 8:39 +0200, Tobias Oetiker [EMAIL PROTECTED] wrote: Hi Boaz, I just tested the patch to GDTH you published yesterday on LKML. With it I can boot of GDTH and things seem to work. Except when I do a cat to /proc/scsi/gdth/0 then the kernel panics and the machine reboots. cheers tobi - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Subject: [PATCH 1/3] Let scsi_cmnd-cmnd use request-cmd buffer
On Tue, Feb 12 2008 at 21:41 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-02-10 at 21:05 +0200, Boaz Harrosh wrote: - struct scsi_cmnd had a 16 bytes command buffer of its own. This is an unnecessary duplication and copy of request's cmd. It is probably left overs from the time that scsi_cmnd could function without a request attached. So clean that up. - Once above is done, few places, apart from scsi-ml, needed adjustments due to changing the data type of scsi_cmnd-cmnd. - Lots of drivers still use MAX_COMMAND_SIZE. So I have left that #define but equate it to BLK_MAX_CDB. The way I see it and is reflected in the patch below is. MAX_COMMAND_SIZE - means: The longest fixed-length (*) SCSI CDB as per the SCSI standard and is not related to the implementation. BLK_MAX_CDB. - The allocated space at the request level (*)fixed-length here means commands that their size can be determined by their opcode and the CDB does not carry a length specifier, like the VARIABLE_LENGTH_CMD(0x7f) command. This is actually not exactly true and the SCSI standard also defines extended commands and vendor specific commands that can be bigger than 16 bytes. The kernel will support these using the same infrastructure used for VARLEN CDB's. So in effect MAX_COMMAND_SIZE means the maximum size command scsi-ml supports without specifying a cmd_len by ULD's When we do this, what happens to the minority of drivers that need the command in DMAable memory ... or have you audited them all and we can now dump the DMA pool allocation for SCSI commands? James Am I right in assuming that I only need to audited the drivers that have .unchecked_isa_dma set? I will redo this audit again, and report back. Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Tue, Feb 12 2008 at 19:40 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: gdth _exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. del_timer_sync the timer before we delete the cards. NOT YET TESTED Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Tested-by: Stefan Priebe [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 15 --- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..103280e 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha) ha-sdev = NULL; } - gdth_flush(ha); - if (shp-irq) free_irq(shp-irq,ha); @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void) { gdth_ha_str *ha; - list_for_each_entry(ha, gdth_instances, list) - gdth_remove_one(ha); + unregister_chrdev(major,gdth); + unregister_reboot_notifier(gdth_notifier); #ifdef GDTH_STATISTICS - del_timer(gdth_timer); + del_timer_sync(gdth_timer); #endif - unregister_chrdev(major,gdth); - unregister_reboot_notifier(gdth_notifier); + + list_for_each_entry(ha, gdth_instances, list) + gdth_remove_one(ha); } module_init(gdth_init); James please put this patch in rc-fixes also. It has now been tested by few people, and it solves a reproducible problem in the unloading of the driver. It was not yet confirmed by Andrew's reporter with the: +if (list_empty(gdth_instances)) + return; at gdth_timer() In -mm tree. In my patch I have converted the if() to a BUG_ON because now it should not happen. But I figure it is not worse then what there is now, which is nothing. With your recommendation I will push both patches to the stable branches People have emailed me requesting it. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] gdth: convert to PCI hotplug API
On Wed, Feb 13 2008 at 1:49 +0200, Jeff Garzik [EMAIL PROTECTED] wrote: Signed-off-by: Jeff Garzik [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 143 +++- 1 file changed, 86 insertions(+), 57 deletions(-) snip below is the same exact patch rebased on my last two bugfixes. (Already in scsi-rc-fixes) do you intend this to be pushed into 2.6.25-rcx or this is already for 2.6.26? Should we put this in -mm tree for testing? Boaz THIS IS NOT YET TESTED --- From: Jeff Garzik [EMAIL PROTECTED] Date: Wed, 13 Feb 2008 13:01:16 +0200 Subject: [PATCH] gdth: convert to PCI hotplug API Signed-off-by: Jeff Garzik [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 143 ++ 1 files changed, 86 insertions(+), 57 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 103280e..107d2c7 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -596,85 +596,107 @@ static int __init gdth_search_isa(ulong32 bios_adr) #endif /* CONFIG_ISA */ #ifdef CONFIG_PCI -static void gdth_search_dev(gdth_pci_str *pcistr, ushort *cnt, -ushort vendor, ushort dev); +static gdth_pci_str gdth_pcistr[MAXHA]; +static int gdth_pci_cnt; +static bool gdth_pci_registered; -static int __init gdth_search_pci(gdth_pci_str *pcistr) +static bool __init gdth_search_vortex(ushort device) { -ushort device, cnt; - -TRACE((gdth_search_pci()\n)); - -cnt = 0; -for (device = 0; device = PCI_DEVICE_ID_VORTEX_GDT6555; ++device) -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, device); -for (device = PCI_DEVICE_ID_VORTEX_GDT6x17RP; - device = PCI_DEVICE_ID_VORTEX_GDTMAXRP; ++device) -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, device); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, -PCI_DEVICE_ID_VORTEX_GDTNEWRX); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, -PCI_DEVICE_ID_VORTEX_GDTNEWRX2); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_INTEL, -PCI_DEVICE_ID_INTEL_SRC); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_INTEL, -PCI_DEVICE_ID_INTEL_SRC_XSCALE); -return cnt; + if (device = PCI_DEVICE_ID_VORTEX_GDT6555) + return true; + if (device = PCI_DEVICE_ID_VORTEX_GDT6x17RP + device = PCI_DEVICE_ID_VORTEX_GDTMAXRP) + return true; + if (device == PCI_DEVICE_ID_VORTEX_GDTNEWRX || + device == PCI_DEVICE_ID_VORTEX_GDTNEWRX2) + return true; + return false; } +static int gdth_pci_init_one(struct pci_dev *pdev, +const struct pci_device_id *ent); +static void gdth_pci_remove_one(struct pci_dev *pdev); +static void gdth_remove_one(gdth_ha_str *ha); + /* Vortex only makes RAID controllers. * We do not really want to specify all 550 ids here, so wildcard match. */ -static struct pci_device_id gdthtable[] __maybe_unused = { -{PCI_VENDOR_ID_VORTEX,PCI_ANY_ID,PCI_ANY_ID, PCI_ANY_ID}, -{PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC,PCI_ANY_ID,PCI_ANY_ID}, - {PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC_XSCALE,PCI_ANY_ID,PCI_ANY_ID}, -{0} +static struct pci_device_id gdthtable[] __devinitdata = { + { PCI_VDEVICE(VORTEX, PCI_ANY_ID) }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_SRC) }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_SRC_XSCALE) }, + { } /* terminate list */ +}; +MODULE_DEVICE_TABLE(pci, gdthtable); + +static struct pci_driver gdth_pci_driver = { + .name = gdth, + .id_table = gdthtable, + .probe = gdth_pci_init_one, + .remove = gdth_pci_remove_one, }; -MODULE_DEVICE_TABLE(pci,gdthtable); -static void __init gdth_search_dev(gdth_pci_str *pcistr, ushort *cnt, - ushort vendor, ushort device) +static void gdth_pci_remove_one(struct pci_dev *pdev) +{ + gdth_ha_str *ha = pci_get_drvdata(pdev); + + pci_set_drvdata(pdev, NULL); + + list_del(ha-list); + gdth_remove_one(ha); + + pci_disable_device(pdev); +} + +static int __devinit gdth_pci_init_one(struct pci_dev *pdev, + const struct pci_device_id *ent) { -ulong base0, base1, base2; -struct pci_dev *pdev; + ushort vendor = pdev-vendor; + ushort device = pdev-device; + ulong base0, base1, base2; + int rc; -TRACE((gdth_search_dev() cnt %d vendor %x device %x\n, - *cnt, vendor, device)); + TRACE((gdth_search_dev() cnt %d vendor %x device %x\n, + gdth_pci_cnt, vendor, device)); + + if (vendor == PCI_VENDOR_ID_VORTEX !gdth_search_vortex(device)) + return -ENODEV; + + rc = pci_enable_device(pdev); + if (rc) + return rc
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: -gdth_flush(ha); - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: - gdth_flush(ha); - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz The gdth driver would do a register_reboot_notifier(gdth_notifier); to a gdth_halt() function, which would then redo half of what gdth_exit does, and wrongly so, and crash. Are we guaranteed in todays kernel that modules .exit function be called on an halt or reboot? If so then there is no need for duplications and the gdth_halt() should go. Submitted a patch that replaces the previous one I submitted with a deeper fix. [PATCH] gdth: bugfix for the at-exit problems If you ask me this all gdth_flush() is a crackup. sd and scsi-ml are doing scsi FLUSH commands when ever is needed. The controller as no business caching data in memory longer then what is stated in standard. Raid controller or no raid controller. Virtual or not virtual device. Data on Plate means data on plate. What if there is a power outage? what the driver can do then? Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html