Re: [PATCHv2] ib_srp: Remove WARN_ON in srp_terminate_io()
On Wed, 2018-10-17 at 11:30 -0400, Doug Ledford wrote: > On Wed, 2018-10-17 at 07:43 -0700, Bart Van Assche wrote: > > On 10/17/18 12:20 AM, Hannes Reinecke wrote: > > > The WARN_ON() is pointless as the rport is placed in > > > SDEV_TRANSPORT_OFFLINE > > > at that time, so no new commands can be submitted via srp_queuecommand() > > > > > > Signed-off-by: Hannes Reinecke > > > Reviewed-by: Jens Axboe > > > Reviewed-by: Johannes Thumshirn > > > --- > > > drivers/infiniband/ulp/srp/ib_srp.c | 7 --- > > > 1 file changed, 7 deletions(-) > > > > > > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c > > > b/drivers/infiniband/ulp/srp/ib_srp.c > > > index 0b34e909505f..5a79444c2f3c 100644 > > > --- a/drivers/infiniband/ulp/srp/ib_srp.c > > > +++ b/drivers/infiniband/ulp/srp/ib_srp.c > > > @@ -1334,13 +1334,6 @@ static void srp_terminate_io(struct srp_rport > > > *rport) > > > struct scsi_device *sdev; > > > int i, j; > > > > > > - /* > > > - * Invoking srp_terminate_io() while srp_queuecommand() is running > > > - * is not safe. Hence the warning statement below. > > > - */ > > > - shost_for_each_device(sdev, shost) > > > - WARN_ON_ONCE(sdev->request_queue->request_fn_active); > > > - > > > for (i = 0; i < target->ch_count; i++) { > > > ch = >ch[i]; > > > > Although I had explained before why I think that warning is not > > pointless, I agree with this change because the legacy block layer is > > going away. Anyway: > > > > Acked-by: Bart Van Assche > > Thanks, applied to for-next. > FWIW, this introduced a build warning (shost and sdev are now unused variables). I edited the patch to remove the newly unneeded variables. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCHv2] ib_srp: Remove WARN_ON in srp_terminate_io()
On Wed, 2018-10-17 at 07:43 -0700, Bart Van Assche wrote: > On 10/17/18 12:20 AM, Hannes Reinecke wrote: > > The WARN_ON() is pointless as the rport is placed in SDEV_TRANSPORT_OFFLINE > > at that time, so no new commands can be submitted via srp_queuecommand() > > > > Signed-off-by: Hannes Reinecke > > Reviewed-by: Jens Axboe > > Reviewed-by: Johannes Thumshirn > > --- > > drivers/infiniband/ulp/srp/ib_srp.c | 7 --- > > 1 file changed, 7 deletions(-) > > > > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c > > b/drivers/infiniband/ulp/srp/ib_srp.c > > index 0b34e909505f..5a79444c2f3c 100644 > > --- a/drivers/infiniband/ulp/srp/ib_srp.c > > +++ b/drivers/infiniband/ulp/srp/ib_srp.c > > @@ -1334,13 +1334,6 @@ static void srp_terminate_io(struct srp_rport *rport) > > struct scsi_device *sdev; > > int i, j; > > > > - /* > > -* Invoking srp_terminate_io() while srp_queuecommand() is running > > -* is not safe. Hence the warning statement below. > > -*/ > > - shost_for_each_device(sdev, shost) > > - WARN_ON_ONCE(sdev->request_queue->request_fn_active); > > - > > for (i = 0; i < target->ch_count; i++) { > > ch = >ch[i]; > > Although I had explained before why I think that warning is not > pointless, I agree with this change because the legacy block layer is > going away. Anyway: > > Acked-by: Bart Van Assche Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v2 12/15] RDMA/cma: make config_item_type const
On Mon, 2017-10-16 at 17:18 +0200, Bhumika Goyal wrote: > Make these structures const as they are either passed to the > functions > having the argument as const or stored as a reference in the > "ci_type" > const field of a config_item structure. > > Signed-off-by: Bhumika Goyal <bhumi...@gmail.com> Acked-by: Doug Ledford <dledf...@redhat.com> -- Doug Ledford <dledf...@redhat.com> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v10 00/15] Replace PCI pool by DMA pool API
On Thu, 2017-07-06 at 10:12 +0200, Romain Perier wrote: > The current PCI pool API are simple macro functions direct expanded > to > the appropriate dma pool functions. The prototypes are almost the > same > and semantically, they are very similar. I propose to use the DMA > pool > API directly and get rid of the old API. > > This set of patches, replaces the old API by the dma pool API > and remove the defines. Is someone planning on merging this series? If not, I'll send through the patches I've personally tested (3, 5, and 6). > Changes in v10: > - Rebased series onto next-20170706 > - I have fixed and improved patch "scsi: megaraid: Replace PCI pool > old API" > > Changes in v9: > - Rebased series onto next-20170522 > - I have fixed and improved the patch for lpfc driver > > Changes in v8: > - Rebased series onto next-20170428 > > Changes in v7: > - Rebased series onto next-20170416 > - Added Acked-by, Tested-by and Reviwed-by tags > > Changes in v6: > - Fixed an issue reported by kbuild test robot about changes in > DAC960 > - Removed patches 15/19,16/19,17/19,18/19. They have been merged by > Greg > - Added Acked-by Tags > > Changes in v5: > - Re-worded the cover letter (remove sentence about checkpatch.pl) > - Rebased series onto next-20170308 > - Fix typos in commit message > - Added Acked-by Tags > > Changes in v4: > - Rebased series onto next-20170301 > - Removed patch 20/20: checks done by checkpath.pl, no longer > required. > Thanks to Peter and Joe for their feedbacks. > - Added Reviewed-by tags > > Changes in v3: > - Rebased series onto next-20170224 > - Fix checkpath.pl reports for patch 11/20 and patch 12/20 > - Remove prefix RFC > Changes in v2: > - Introduced patch 18/20 > - Fixed cosmetic changes: spaces before brace, live over 80 > characters > - Removed some of the check for NULL pointers before calling > dma_pool_destroy > - Improved the regexp in checkpatch for pci_pool, thanks to Joe > Perches > - Added Tested-by and Acked-by tags > > Romain Perier (15): > block: DAC960: Replace PCI pool old API > dmaengine: pch_dma: Replace PCI pool old API > IB/mthca: Replace PCI pool old API > net: e100: Replace PCI pool old API > mlx4: Replace PCI pool old API > mlx5: Replace PCI pool old API > wireless: ipw2200: Replace PCI pool old API > scsi: be2iscsi: Replace PCI pool old API > scsi: csiostor: Replace PCI pool old API > scsi: lpfc: Replace PCI pool old API > scsi: megaraid: Replace PCI pool old API > scsi: mpt3sas: Replace PCI pool old API > scsi: mvsas: Replace PCI pool old API > scsi: pmcraid: Replace PCI pool old API > PCI: Remove PCI pool macro functions > > drivers/block/DAC960.c| 38 ++- > drivers/block/DAC960.h| 4 +- > drivers/dma/pch_dma.c | 12 ++-- > drivers/infiniband/hw/mthca/mthca_av.c| 10 +-- > drivers/infiniband/hw/mthca/mthca_cmd.c | 8 +-- > drivers/infiniband/hw/mthca/mthca_dev.h | 4 +- > drivers/net/ethernet/intel/e100.c | 12 ++-- > drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +-- > drivers/net/ethernet/mellanox/mlx4/mlx4.h | 2 +- > drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++-- > drivers/net/wireless/intel/ipw2x00/ipw2200.c | 13 ++-- > drivers/scsi/be2iscsi/be_iscsi.c | 6 +- > drivers/scsi/be2iscsi/be_main.c | 6 +- > drivers/scsi/be2iscsi/be_main.h | 2 +- > drivers/scsi/csiostor/csio_hw.h | 2 +- > drivers/scsi/csiostor/csio_init.c | 11 ++-- > drivers/scsi/csiostor/csio_scsi.c | 6 +- > drivers/scsi/lpfc/lpfc.h | 16 ++--- > drivers/scsi/lpfc/lpfc_init.c | 16 ++--- > drivers/scsi/lpfc/lpfc_mem.c | 90 +-- > > drivers/scsi/lpfc/lpfc_nvme.c | 6 +- > drivers/scsi/lpfc/lpfc_nvmet.c| 4 +- > drivers/scsi/lpfc/lpfc_scsi.c | 12 ++-- > drivers/scsi/lpfc/lpfc_sli.c | 6 +- > drivers/scsi/megaraid/megaraid_mbox.c | 30 - > drivers/scsi/megaraid/megaraid_mm.c | 29 - > drivers/scsi/megaraid/megaraid_sas_base.c | 27 > drivers/scsi/megaraid/megaraid_sas_fusion.c | 46 +++--- > drivers/scsi/mpt3sas/mpt3sas_base.c | 73 ++--- > - > drivers/scsi/mvsas/mv_init.c | 6 +- > drivers/scsi/mvsas/mv_sas.c | 6 +- > drivers/scsi/pmcraid.c| 10 +-- > drivers/scsi/pmcraid.h| 2 +- > include/linux/mlx5/driver.h | 2 +- > include/linux/pci.h | 9 --- > 35 files changed, 269 insertions(+), 278 deletions(-) > -- Doug Ledford <dledf...@redhat.com> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v6 05/15] mlx4: Replace PCI pool old API
On Mon, 2017-03-20 at 08:32 +0200, Leon Romanovsky wrote: > On Sun, Mar 19, 2017 at 06:03:54PM +0100, Romain Perier wrote: > > > > The PCI pool API is deprecated. This commit replaces the PCI pool > > old > > API by the appropriate function with the DMA pool API. > > > > Signed-off-by: Romain Perier <romain.per...@collabora.com> > > Acked-by: Peter Senna Tschudin <peter.se...@collabora.com> > > Tested-by: Peter Senna Tschudin <peter.se...@collabora.com> > > --- > > drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +- > > drivers/net/ethernet/mellanox/mlx4/mlx4.h | 2 +- > > 2 files changed, 6 insertions(+), 6 deletions(-) > > > > Thanks, > Reviewed-by: Leon Romanovsky <leo...@mellanox.com> Changes look fine to me, and it has been tested on this specific hardware and confirmed RDMA communications still work. Acked-by: Doug Ledford <dledf...@redhat.com> Tested-by: Doug Ledford <dledf...@redhat.com> -- Doug Ledford <dledf...@redhat.com> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v6 03/15] IB/mthca: Replace PCI pool old API
On Sun, 2017-03-19 at 18:03 +0100, Romain Perier wrote: > The PCI pool API is deprecated. This commit replaces the PCI pool old > API by the appropriate function with the DMA pool API. > > Signed-off-by: Romain Perier <romain.per...@collabora.com> > Acked-by: Peter Senna Tschudin <peter.se...@collabora.com> > Tested-by: Peter Senna Tschudin <peter.se...@collabora.com> > Changes look ok, and I've tested them here as well. Acked-by: Doug Ledford <dledf...@redhat.com> Tested-by: Doug Ledford <dledf...@redhat.com> -- Doug Ledford <dledf...@redhat.com> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v6 06/15] mlx5: Replace PCI pool old API
On Mon, 2017-03-20 at 08:31 +0200, Leon Romanovsky wrote: > On Sun, Mar 19, 2017 at 06:03:55PM +0100, Romain Perier wrote: > > > > The PCI pool API is deprecated. This commit replaces the PCI pool > > old > > API by the appropriate function with the DMA pool API. > > > > Signed-off-by: Romain Perier <romain.per...@collabora.com> > > Reviewed-by: Peter Senna Tschudin <peter.se...@collabora.com> > > --- > > drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++- > > include/linux/mlx5/driver.h | 2 +- > > 2 files changed, 7 insertions(+), 6 deletions(-) > > > > Thanks, > Acked-by: Leon Romanovsky <leo...@mellanox.com> Changes look fine to me, and in addition I've compiled and tested them on mlx5 hardware and verified that RDMA communications still work. Acked-by: Doug Ledford <dledf...@redhat.com> Tested-by: Doug Ledford <dledf...@redhat.com> -- Doug Ledford <dledf...@redhat.com> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: iscsi_trx going into D state
00 >>>> [] iscsi_target_tx_thread+0x1aa/0x1d0 >>>> [] kthread+0xd8/0xf0 >>>> [] ret_from_fork+0x3f/0x70 >>>> [] 0x >>>> # cat /proc/23018/stack >>>> [] target_wait_for_sess_cmds+0x49/0x1a0 >>>> [] isert_wait_conn+0x1ab/0x2f0 [ib_isert] >>>> [] iscsit_close_connection+0x162/0x870 >>>> [] iscsit_take_action_for_connection_exit+0x7f/0x100 >>>> [] iscsi_target_tx_thread+0x1aa/0x1d0 >>>> [] kthread+0xd8/0xf0 >>>> [] ret_from_fork+0x3f/0x70 >>>> [] 0x >>>> >>>> From dmesg: >>>> [ 394.476332] INFO: rcu_sched self-detected stall on CPU >>>> [ 394.476334] 20-...: (23976 ticks this GP) >>>> idle=edd/141/0 softirq=292/292 fqs=18788 >>>> [ 394.476336] (t=24003 jiffies g=3146 c=3145 q=0) >>>> [ 394.476337] Task dump for CPU 20: >>>> [ 394.476338] kworker/u68:2 R running task0 12906 2 >>>> 0x0008 >>>> [ 394.476345] Workqueue: isert_comp_wq isert_cq_work [ib_isert] >>>> [ 394.476346] 883f2fe38000 f805705e 883f7fd03da8 >>>> 810ac8ff >>>> [ 394.476347] 0014 81adb680 883f7fd03dc0 >>>> 810af239 >>>> [ 394.476348] 0015 883f7fd03df0 810e1cd0 >>>> 883f7fd17b80 >>>> [ 394.476348] Call Trace: >>>> [ 394.476354][] sched_show_task+0xaf/0x110 >>>> [ 394.476355] [] dump_cpu_task+0x39/0x40 >>>> [ 394.476357] [] rcu_dump_cpu_stacks+0x80/0xb0 >>>> [ 394.476359] [] rcu_check_callbacks+0x540/0x820 >>>> [ 394.476360] [] ? account_system_time+0x81/0x110 >>>> [ 394.476363] [] ? tick_sched_do_timer+0x50/0x50 >>>> [ 394.476364] [] update_process_times+0x39/0x60 >>>> [ 394.476365] [] tick_sched_handle.isra.17+0x25/0x60 >>>> [ 394.476366] [] tick_sched_timer+0x3d/0x70 >>>> [ 394.476368] [] __hrtimer_run_queues+0x102/0x290 >>>> [ 394.476369] [] hrtimer_interrupt+0xa8/0x1a0 >>>> [ 394.476372] [] local_apic_timer_interrupt+0x35/0x60 >>>> [ 394.476374] [] smp_apic_timer_interrupt+0x3d/0x50 >>>> [ 394.476376] [] apic_timer_interrupt+0x87/0x90 >>>> [ 394.476379][] ? console_unlock+0x41e/0x4e0 >>>> [ 394.476380] [] vprintk_emit+0x2fc/0x500 >>>> [ 394.476382] [] vprintk_default+0x1f/0x30 >>>> [ 394.476384] [] printk+0x5d/0x74 >>>> [ 394.476388] [] transport_lookup_cmd_lun+0x1d1/0x200 >>>> [ 394.476390] [] iscsit_setup_scsi_cmd+0x230/0x540 >>>> [ 394.476392] [] isert_rx_do_work+0x3f3/0x7f0 >>>> [ib_isert] >>>> [ 394.476394] [] isert_cq_work+0x184/0x770 [ib_isert] >>>> [ 394.476396] [] process_one_work+0x14f/0x400 >>>> [ 394.476397] [] worker_thread+0x114/0x470 >>>> [ 394.476398] [] ? __schedule+0x34a/0x7f0 >>>> [ 394.476399] [] ? rescuer_thread+0x310/0x310 >>>> [ 394.476400] [] kthread+0xd8/0xf0 >>>> [ 394.476402] [] ? kthread_park+0x60/0x60 >>>> [ 394.476403] [] ret_from_fork+0x3f/0x70 >>>> [ 394.476404] [] ? kthread_park+0x60/0x60 >>>> [ 405.716632] Unexpected ret: -104 send data 360 >>>> [ 405.721711] tx_data returned -32, expecting 360. >>>> >>>> Robert LeBlanc >>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 When you combine this trace with the newest one, it really makes me thing there is something of a bad interaction between the new drain cq API and the iser/isert implementation to use said API. Sagi, Christoph? -- Doug Ledford <dledf...@redhat.com> GPG Key ID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH net-next 4/5] treewide: replace dev->trans_start update with helper
On 05/03/2016 10:33 AM, Florian Westphal wrote: > Replace all trans_start updates with netif_trans_update helper. > change was done via spatch: > > struct net_device *d; > @@ > - d->trans_start = jiffies > + netif_trans_update(d) > > Compile tested only. > > Cc: user-mode-linux-de...@lists.sourceforge.net > Cc: linux-xte...@linux-xtensa.org > Cc: linux1394-de...@lists.sourceforge.net > Cc: linux-r...@vger.kernel.org > Cc: net...@vger.kernel.org > Cc: mpt-fusionlinux@broadcom.com > Cc: linux-scsi@vger.kernel.org > Cc: linux-...@vger.kernel.org > Cc: linux-par...@vger.kernel.org > Cc: linux-o...@vger.kernel.org > Cc: linux-h...@vger.kernel.org > Cc: linux-...@vger.kernel.org > Cc: linux-wirel...@vger.kernel.org > Cc: linux-s...@vger.kernel.org > Cc: de...@driverdev.osuosl.org > Cc: b.a.t.m@lists.open-mesh.org > Cc: linux-blueto...@vger.kernel.org > Signed-off-by: Florian Westphal <f...@strlen.de> > --- > Checkpatch complains about whitespace damage, but > this extra whitespace already exists before this patch. > > drivers/infiniband/hw/nes/nes_nic.c| 2 +- > drivers/infiniband/ulp/ipoib/ipoib_cm.c | 2 +- > drivers/infiniband/ulp/ipoib/ipoib_ib.c| 2 +- For InfiniBand bits, Acked-by: Doug Ledford <dledf...@redhat.com> -- Doug Ledford <dledf...@redhat.com> GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH 10/13] IB/srp: use the new CQ API
On 12/13/2015 05:26 AM, Sagi Grimberg wrote: > >> Allright. How do you want to proceed? The current rdma-cq branch >> has all kinds of dependencies, but I've also prepared a new rdma-cq.2 >> branch that could go straight on top of your current queue: >> >> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq.2 >> >> If you're ready to start the 4.5 tree I can send those out as a patch >> series. > > Will this get on top of iser-remote-inv? Or should I resend atop of this? I'm going through my inbox right now. I expect somewhere in there Or will make his case for why he doesn't like Christoph's patch to get rid of the attr struct. I'll listen, and if I'm not convinced, I'll take that patchset first and this one second. (I reviewed the patchset alreadyaside from the fact that I *like* having the attr struct elements in an organized sub-struct, it's fine and it definitely improves on all of those query calls). -- Doug Ledford <dledf...@redhat.com> GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH 10/13] IB/srp: use the new CQ API
On 12/11/2015 09:22 AM, Christoph Hellwig wrote: > Hi Bart, > > thanks for all the reviews. I've updated the git branch with your > suggestions and reviewed-by tags. I'm going to wait a little bit > longer for other reviews to come in before reposting the series. Indeed, thanks for all the catches Bart. This patchset, with Bart's fixups, looks good to me. -- Doug Ledford <dledf...@redhat.com> GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH 06/17] Update the infiniband uverbs driver to use idr helper functions.
On 09/16/2015 01:50 PM, Lee Duncan wrote: > Signed-off-by: Lee Duncan <ldun...@suse.com> Looks OK to me. The setting of uobj->id is no longer under the lock, but we won't succeed at an idr lookup until it is set, which means it won't be found and can't be used in idr_remove_uobj() until after the uobj->id is set regardless of the lock. Acked-by: Doug Ledford <dledf...@redhat.com> > --- > drivers/infiniband/core/uverbs_cmd.c | 12 ++-- > 1 file changed, 2 insertions(+), 10 deletions(-) > > diff --git a/drivers/infiniband/core/uverbs_cmd.c > b/drivers/infiniband/core/uverbs_cmd.c > index bbb02ffe87df..1e5b2a66a501 100644 > --- a/drivers/infiniband/core/uverbs_cmd.c > +++ b/drivers/infiniband/core/uverbs_cmd.c > @@ -120,24 +120,16 @@ static int idr_add_uobj(struct idr *idr, struct > ib_uobject *uobj) > { > int ret; > > - idr_preload(GFP_KERNEL); > - spin_lock(_uverbs_idr_lock); > - > - ret = idr_alloc(idr, uobj, 0, 0, GFP_NOWAIT); > + ret = idr_get_index(idr, _uverbs_idr_lock, uobj); > if (ret >= 0) > uobj->id = ret; > > - spin_unlock(_uverbs_idr_lock); > - idr_preload_end(); > - > return ret < 0 ? ret : 0; > } > > void idr_remove_uobj(struct idr *idr, struct ib_uobject *uobj) > { > - spin_lock(_uverbs_idr_lock); > - idr_remove(idr, uobj->id); > - spin_unlock(_uverbs_idr_lock); > + idr_put_index(idr, _uverbs_idr_lock, uobj->id); > } > > static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id, > -- Doug Ledford <dledf...@redhat.com> GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH 0/3] IB/srp patches for Linux kernel v4.3
On 07/31/2015 05:12 PM, Bart Van Assche wrote: Hello Doug, Please apply the following three patches at your earliest convenience: 0001-IB-srp-Constify-a-function-argument.patch 0002-IB-srp-Handle-partial-connection-success-correctly.patch 0003-IB-srp-Bump-driver-version-and-release-date.patch Got 'em, thanks! -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH 04/12] IB/srp: Fix connection state tracking
On Tue, 2015-05-05 at 17:27 +0200, Bart Van Assche wrote: On 05/05/15 17:10, Doug Ledford wrote: On Tue, 2015-05-05 at 16:26 +0200, Bart Van Assche wrote: On 05/05/15 16:10, Doug Ledford wrote: However, while looking through the driver to research this, I noticed something else that seems more important if you ask me. With this patch we now implement individual channel connection tracking. However, in srp_queuecommand() you pick the channel based on the tag, and the blk layer has no idea of these disconnects, so the blk layer is free to assign a tag/channel to a channel that's disconnected, and then as best I can tell, you will simply try to post a work request to a channel that's already disconnected, which I would expect to fail if we have already disconnected this particular qp and not brought up a new one yet. So it seems to me there is a race condition between new incoming SCSI commands and this disconnect/reconnect window, and that maybe we should be sending these commands back to the mid layer for requeueing when the channel the blk_mq tag points to is disconnected. Or am I missing something in there? Hello Doug, Around the time a cable disconnect or other link layer failure is detected by the SRP initiator or any other SCSI LLD it is unavoidable that one or more SCSI requests fail. It is up to a higher layer (e.g. dm-multipath + multipathd) to decide what to do with such requests, e.g. queue these requests and resend these over another path. Sure, but that wasn't my point. My point was that if you know the channel is disconnected, then why don't you go immediately to the correct action in queuecommand (where correct action could be requeue waiting on reconnect or return with error, whatever is appropriate)? Instead you attempt to post a command to a known disconnected queue pair. The SRP initiator driver has been tested thoroughly with the multipath queue_if_no_path policy, with a fio job with I/O verification enabled running on top of a dm device while concurrently repeatedly simulating link layer failures (via ibportstate). Part of my questions here are because I don't know how the blk_mq handles certain conditions. However, your testing above only handles one case: all channels get dropped. As unlikely it may be, what if resource constraints caused just one channel to be dropped out of the bunch and the others stayed alive? Then the blk_mq would see requests on just one queue come back errored, while the others finished successfully. Does it drop that one queue out of rotation, or does it fail over the entire connection? Hello Doug, Sorry but I don't think that a SCSI LLD is the appropriate layer to choose between requeuing or failing a request. Be that as it may, that doesn't change what I said about posting a command to a known disconnected QP. You could just fail immediately. Something like: if (!ch-connected) { scmnd-result = DID_NO_CONNECT; goto err; } right after getting the channel in queuecommand would work. That would save a couple spinlocks, several DMA mappings, a call into the low level driver, and a few other things. (And I only left requeue on the table because I wasn't sure how the blk_mq dealt with just a single channel being down versus all of them being down) If multiple paths are available between an initiator system and a SAN and if one path fails Who says the path failed? The path may be just fine. only the multipath layer knows whether there are other working paths available. If a working path is still available then the request should be resent as soon as possible over another path. The multipath layer can only take such a decision after a SCSI LLD has failed a request. Sure. I totally get failing fast and unilaterally for multipath managed devices. That's all assuming multipath though. There are uses without that. But my point in all of this is that if you have a single qp between yourself and the target, then any error including a qp resource error == path error since you only have one path. When you have a multi queue device, that's no longer true. A transient resource problem on one qp does not mean a path event (at least not necessarily, although your statement below converts a QP event into a path event by virtue disconnecting and reconnecting all of the QPs). My curiosity is now moot given what you wrote about tearing everything down and reconnecting (unless the error handling is modified to be more subtle in its workings), but the original question in my mind was what happens at the blk_mq level if you did have a single queue drop but not all of them and you weren't using multipath. If only one channel fails all other channels are disconnected and the transport layer error handling mechanism is started. I missed that. I assume it's done in srp_start_tl_fail_timers()? -- Doug Ledford dledf...@redhat.com
Re: [GIT PULL] delete decade+ obsolete aic7xxx_old driver
On 11/08/2013 11:04 AM, Paul Gortmaker wrote: Paul Gortmaker (1): scsi: delete decade+ obsolete aic7xxx_old driver Documentation/scsi/00-INDEX | 2 - Documentation/scsi/aic7xxx_old.txt | 511 -- MAINTAINERS | 1 - drivers/scsi/Kconfig|41 - drivers/scsi/Makefile | 1 - drivers/scsi/aic7xxx_old.c | 11149 -- drivers/scsi/aic7xxx_old/aic7xxx.h |28 - drivers/scsi/aic7xxx_old/aic7xxx.reg| 1401 drivers/scsi/aic7xxx_old/aic7xxx.seq| 1539 - drivers/scsi/aic7xxx_old/aic7xxx_proc.c | 270 - drivers/scsi/aic7xxx_old/aic7xxx_reg.h | 629 -- drivers/scsi/aic7xxx_old/aic7xxx_seq.c | 817 --- drivers/scsi/aic7xxx_old/scsi_message.h |49 - drivers/scsi/aic7xxx_old/sequencer.h| 135 - 14 files changed, 16573 deletions(-) delete mode 100644 Documentation/scsi/aic7xxx_old.txt delete mode 100644 drivers/scsi/aic7xxx_old.c delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.h delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.reg delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.seq delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_proc.c delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_reg.h delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_seq.c delete mode 100644 drivers/scsi/aic7xxx_old/scsi_message.h delete mode 100644 drivers/scsi/aic7xxx_old/sequencer.h Farewell old friend...don't let the door hit you in the ass on the way out ;-) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: delete decade+ obsolete aic7xxx_old driver
On 09/17/13 16:10, James Bottomley wrote: OK, so do we have any real evidence that no-one uses this driver? Does any distro actually compile it, for instance? Red Hat doesn't use it in any of their products (and it hasn't been the preferred driver since about the old Red Hat Linux 7.0 days). -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD http://people.redhat.com/dledford -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: delete decade+ obsolete aic7xxx_old driver
Yes, this driver is well past ready to be removed. Acked-by: Doug Ledford dledf...@redhat.com Sent from my ASUS Pad Paul Gortmaker paul.gortma...@windriver.com wrote: After getting warnings in an allyesconfig build[1] from this driver, I decided to remind myself just how old it was, and whether it warranted fixing. In the Kconfig help text, I found: This driver will eventually be phased out entirely Going back to the history archive, I see the line was added[2] in Feb 2002, when we moved from v2.4.2.1 --- v2.4.2.2 So, with over a decade of notification, and multiple major releases since then, I think we can justify removing this. Currently we have people wasting time building it during routine testing, and then wasting more time re-researching the known reported warnings, only to find that nobody really is willing to integrate the fixes[3] for it. A quick search didn't seem to indicate any active user base for it. If someone happens to have a quirky _old_ card that the eleven year old new driver doesn't work with, then it is entirely reasonable that they stick with a kernel version that predates this removal. [1] drivers/scsi/aic7xxx_old.c: In function ‘aic7xxx_register’: drivers/scsi/aic7xxx_old.c:7901:5: warning: case value ‘257’ not in enumerated type ‘ahc_chip’ [-Wswitch] drivers/scsi/aic7xxx_old.c:7898:5: warning: case value ‘513’ not in enumerated type ‘ahc_chip’ [-Wswitch] drivers/scsi/aic7xxx_old.c: In function ‘aic7xxx_load_seeprom’: drivers/scsi/aic7xxx_old.c:8517:5: warning: case value ‘257’ not in enumerated type ‘ahc_chip’ [-Wswitch] drivers/scsi/aic7xxx_old.c:8510:5: warning: case value ‘513’ not in enumerated type ‘ahc_chip’ [-Wswitch] [2] http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git commit 44e8778c [3] https://lkml.org/lkml/2012/10/29/215 Cc: Hannes Reinecke h...@suse.de Cc: Doug Ledford dledf...@redhat.com Cc: James E.J. Bottomley jbottom...@parallels.com Signed-off-by: Paul Gortmaker paul.gortma...@windriver.com --- [This is an --irreversible-delete pseudo-patch which doesn't show all the file content that was deleted wholesale. The full commit is at: git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git aic7xxx-delete ] Documentation/scsi/00-INDEX | 2 - Documentation/scsi/aic7xxx_old.txt | 511 -- MAINTAINERS | 1 - drivers/scsi/Kconfig|41 - drivers/scsi/Makefile | 1 - drivers/scsi/aic7xxx_old.c | 11149 -- drivers/scsi/aic7xxx_old/aic7xxx.h |28 - drivers/scsi/aic7xxx_old/aic7xxx.reg| 1401 drivers/scsi/aic7xxx_old/aic7xxx.seq| 1539 - drivers/scsi/aic7xxx_old/aic7xxx_proc.c | 270 - drivers/scsi/aic7xxx_old/aic7xxx_reg.h | 629 -- drivers/scsi/aic7xxx_old/aic7xxx_seq.c | 817 --- drivers/scsi/aic7xxx_old/scsi_message.h |49 - drivers/scsi/aic7xxx_old/sequencer.h| 135 - 14 files changed, 16573 deletions(-) delete mode 100644 Documentation/scsi/aic7xxx_old.txt delete mode 100644 drivers/scsi/aic7xxx_old.c delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.h delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.reg delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.seq delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_proc.c delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_reg.h delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_seq.c delete mode 100644 drivers/scsi/aic7xxx_old/scsi_message.h delete mode 100644 drivers/scsi/aic7xxx_old/sequencer.h diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX index 9b0787f..2044be5 100644 --- a/Documentation/scsi/00-INDEX +++ b/Documentation/scsi/00-INDEX @@ -42,8 +42,6 @@ aic79xx.txt - Adaptec Ultra320 SCSI host adapters aic7xxx.txt - info on driver for Adaptec controllers -aic7xxx_old.txt - - info on driver for Adaptec controllers, old generation arcmsr_spec.txt - ARECA FIRMWARE SPEC (for IOP331 adapter) dc395x.txt diff --git a/Documentation/scsi/aic7xxx_old.txt b/Documentation/scsi/aic7xxx_old.txt deleted file mode 100644 index ecfc474..000 diff --git a/MAINTAINERS b/MAINTAINERS index e61c2e8..c79be42 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -470,7 +470,6 @@ M: Hannes Reinecke h...@suse.de L:linux-scsi@vger.kernel.org S:Maintained F:drivers/scsi/aic7xxx/ -F:drivers/scsi/aic7xxx_old/ AIMSLAB FM RADIO RECEIVER DRIVER M:Hans Verkuil hverk...@xs4all.nl diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index fe25677..1f02003 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -499,47 +499,6 @@ config SCSI_AACRAID source drivers/scsi/aic7xxx/Kconfig.aic7xxx - -config SCSI_AIC7XXX_OLD - tristate Adaptec AIC7xxx support (old driver) - depends on (ISA || EISA || PCI ) SCSI - help -WARNING This driver is an older aic7xxx driver and is no longer
Re: [PATCH] drivers/scsi/aic7xxx_old: Convert to generic boolean-values
On Fri, 2007-02-16 at 10:50 -0800, Andrew Morton wrote: Me no understand. If you take the specific example of void ahd_set_syncrate(struct ahd_softc *ahd, struct ahd_devinfo *devinfo, u_int period, u_int offset, u_int ppr_options, u_int type, int paused) then if is crufty, inappropriate and wrong that `paused' is a scalar type. Although you picked a code segment out of the modern aic7xxx, there is a matching similar one in aic7xxx_old. Now, in all fairness, I was at one point playing with a much more preemptable model for that driver that allowed nested pauses, at which point the value of pause would have made sense to be scalar, but that was a *long* time ago. It's just not true or sensible that the code is written so that `paused' can take a value of seventy eight. It _is_ a boolean. It is a truth value. Declaring it as such in the source is all goodness. Passing the value `true' into calls to this function improve readability over passing 1. Hence the reason for the original upper case TRUE/FALSE. I have to admit, I don't really like the lower case true/false, it looks like a variable that can be assigned, thereby changing the implementation of the function call when in fact each calling location is hard coding a constant. But, that's just me and my crufty old C that differentiates between hard coded things and variables via case. So I don't agree with (or understand) your objections. But I can certainly understand reluctance to merge a large-but-minor, do-nothing-much patch into a large and not-very-maintained driver. Hehehe...and here I was thinking of factoring that thing into files and actually bringing it into the current century. -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband signature.asc Description: This is a digitally signed message part
Re: [Patch] QLogic qla2x00 driver fixes
On Fri, 2005-02-25 at 09:02 -0800, Patrick Mansfield wrote: On Fri, Feb 25, 2005 at 06:57:39AM -0500, Doug Ledford wrote: On Fri, 2005-02-25 at 03:38 -0500, Jeff Garzik wrote: Arjan van de Ven wrote: On Thu, 2005-02-24 at 23:21 -0500, Doug Ledford wrote: Don't use cmd-request-nr_hw_segments as it may not be initialized (SG_IO in particular bypasses anything that initializes this and just uses scsi_do_req to insert a scsi_request directly on the head of the queue) should we fix that in the SG_IO layer ? Possibly/probably. Doug, What kernel did you hit this with? Our RHEL4 kernel (2.6.9 plus bunches-o'-patches) And same question as Doug G: is it via sg (not the block SG_IO)? sg uses scsi_do_req(), block SG_IO doesn't. At the moment, I'm sorting that out. It's a bit of a black box. We have two different loops of commands running on my test machine. There are 10 instances of: while true; do scsiinfo -i /dev/sdf; done which sends an INQUIRY request to the target device. I'm running into a problem getting an strace from this command, so I'm tell you more when I can. The other 10 loops are: while true; do scsi_id -g -s /block/sdf -p 0x80; done Jens sent changes last August or so that fixed SG_IO (not sg) to always set nr_hw_segments, change should be in 2.6.10. It is not obvious that his change fixed this, I can't find the changeset or log. Finding out if this is fixed is only greatly complicated by the fact that there exists a drivers/block/scsi_ioctl.c and drivers/scsi/scsi_ioctl.c and they both are compiled into the kernel, so knowing which one is actually being used, codepath wise, can be confusing. Then on top of that there's drivers/scsi/sg.c with it's SG_IO path. It's not necessarily an easy issue to sort out without that strace and for some reason strace is locking on the test box I'm running. (insert delay as we get the strace issue resolved) OK, the program that's causing the problem, from what I can tell, is opening /dev/sdf and then issuing a SCSI_IOCTL_SEND_COMMAND request. That means we go to scsi/sd.c:sd_ioctl to block/scsi_ioctl.c:scsi_cmd_ioctl to block/scsi_ioctl.c:sg_scsi_ioctl and it's on that path that we aren't always getting an initialized nr_hw_segments. /me votes for consolidating some of this ioctl mess if at all possible. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch] QLogic qla2x00 driver fixes
On Fri, 2005-02-25 at 03:38 -0500, Jeff Garzik wrote: Arjan van de Ven wrote: On Thu, 2005-02-24 at 23:21 -0500, Doug Ledford wrote: Don't use cmd-request-nr_hw_segments as it may not be initialized (SG_IO in particular bypasses anything that initializes this and just uses scsi_do_req to insert a scsi_request directly on the head of the queue) should we fix that in the SG_IO layer ? Possibly/probably. I'm not concerned with it personally. The only reason that the scsi layer copies the block layer request struct into the scsi command/request is so that upon completion it has enough information to mark blocks as either up to date or not while at the same time allowing the scsi layer to free the original block request at queue time, not at completion time. It was never intended to be used by low level drivers. And since the scsi layer has several ways of bypassing the block layer to inject commands directly to devices, trying to catch all those injection points and make them emulate block layer activity when we already provide everything a low level driver needs in the scsi command struct is silly. Just use what's available, reliable, always initialized, and intended for you to use (cmd-use_sg) instead of trying to use a roughly duplicate item from a different abstraction layer that isn't always involved in commands sent to a low level driver (request- nr_hw_segments). For SCSI drivers specifically, using nr_hw_segments is pointless unless cmd-use_sg goes away. At which point tons of SCSI drivers want changing anyway. n_sg = dma_map_sg(..., cmd-use_sg, ...) will always do the right thing, when the cmd contains a scatterlist. Deviate from the norm and pay the price, really... Aye, yup. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[Patch] QLogic qla2x00 driver fixes
Don't use cmd-request-nr_hw_segments as it may not be initialized (SG_IO in particular bypasses anything that initializes this and just uses scsi_do_req to insert a scsi_request directly on the head of the queue) and a bogus value here can trip up the checks to make sure that the number of segments will fit in the queue ring buffer, resulting in commands that are never completed. Fix up several issues with PCI DMA mapping and failure to check return values on the mappings. Make the check for space in the ring buffer happen after the DMA mapping is done since any checks done before the mapping has taken place are bogus. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford = qla_iocb.c 1.15 vs edited = --- 1.15/drivers/scsi/qla2xxx/qla_iocb.c 2004-12-09 01:12:03 -05:00 +++ edited/qla_iocb.c 2005-02-24 16:25:55 -05:00 @@ -216,18 +216,7 @@ void qla2x00_build_scsi_iocbs_32(srb_t * cur_seg++; } } else { - dma_addr_t req_dma; - struct page *page; - unsigned long offset; - - page = virt_to_page(cmd-request_buffer); - offset = ((unsigned long)cmd-request_buffer ~PAGE_MASK); - req_dma = pci_map_page(ha-pdev, page, offset, - cmd-request_bufflen, cmd-sc_data_direction); - - sp-dma_handle = req_dma; - - *cur_dsd++ = cpu_to_le32(req_dma); + *cur_dsd++ = cpu_to_le32(sp-dma_handle); *cur_dsd++ = cpu_to_le32(cmd-request_bufflen); } } @@ -299,19 +288,8 @@ void qla2x00_build_scsi_iocbs_64(srb_t * cur_seg++; } } else { - dma_addr_t req_dma; - struct page *page; - unsigned long offset; - - page = virt_to_page(cmd-request_buffer); - offset = ((unsigned long)cmd-request_buffer ~PAGE_MASK); - req_dma = pci_map_page(ha-pdev, page, offset, - cmd-request_bufflen, cmd-sc_data_direction); - - sp-dma_handle = req_dma; - - *cur_dsd++ = cpu_to_le32(LSD(req_dma)); - *cur_dsd++ = cpu_to_le32(MSD(req_dma)); + *cur_dsd++ = cpu_to_le32(LSD(sp-dma_handle)); + *cur_dsd++ = cpu_to_le32(MSD(sp-dma_handle)); *cur_dsd++ = cpu_to_le32(cmd-request_bufflen); } } @@ -344,6 +322,8 @@ qla2x00_start_scsi(srb_t *sp) /* Setup device pointers. */ ret = 0; + /* So we know we haven't pci_map'ed anything yet */ + tot_dsds = 0; fclun = sp-lun_queue-fclun; ha = fclun-fcport-ha; reg = ha-iobase; @@ -372,8 +352,32 @@ qla2x00_start_scsi(srb_t *sp) if (index == MAX_OUTSTANDING_COMMANDS) goto queuing_error; + /* Map the sg table so we have an accurate count of sg entries needed */ + if (cmd-use_sg) { + sg = (struct scatterlist *) cmd-request_buffer; + tot_dsds = pci_map_sg(ha-pdev, sg, cmd-use_sg, + cmd-sc_data_direction); + if (tot_dsds == 0) + goto queuing_error; + } else if (cmd-request_bufflen) { + dma_addr_t req_dma; + struct page *page; + unsigned long offset; + + page = virt_to_page(cmd-request_buffer); + offset = ((unsigned long)cmd-request_buffer ~PAGE_MASK); + req_dma = pci_map_page(ha-pdev, page, offset, + cmd-request_bufflen, cmd-sc_data_direction); + + if (dma_mapping_error(req_dma)) + goto queuing_error; + + sp-dma_handle = req_dma; + tot_dsds = 1; + } + /* Calculate the number of request entries needed. */ - req_cnt = (ha-calc_request_entries)(cmd-request-nr_hw_segments); + req_cnt = (ha-calc_request_entries)(tot_dsds); if (ha-req_q_cnt (req_cnt + 2)) { cnt = RD_REG_WORD_RELAXED(ISP_REQ_Q_OUT(ha, reg)); if (ha-req_ring_index cnt) @@ -385,19 +389,6 @@ qla2x00_start_scsi(srb_t *sp) if (ha-req_q_cnt (req_cnt + 2)) goto queuing_error; - /* Finally, we have enough space, now perform mappings. */ - tot_dsds = 0; - if (cmd-use_sg) { - sg = (struct scatterlist *) cmd-request_buffer; - tot_dsds = pci_map_sg(ha-pdev, sg, cmd-use_sg, - cmd-sc_data_direction); - if (tot_dsds == 0) - goto queuing_error; - } else if (cmd-request_bufflen) { - tot_dsds++; - } - req_cnt = (ha-calc_request_entries)(tot_dsds); - /* Build command packet */ ha-current_outstanding_cmd = handle; ha-outstanding_cmds[handle] = sp; @@ -479,6 +470,14 @@ qla2x00_start_scsi(srb_t *sp) return (QLA_SUCCESS); queuing_error: + if (cmd-use_sg tot_dsds) { + sg = (struct scatterlist *) cmd-request_buffer; + pci_unmap_sg(ha-pdev, sg, cmd-use_sg, + cmd-sc_data_direction); + } else if (tot_dsds) { + pci_unmap_page(ha-pdev, sp-dma_handle, cmd-request_bufflen, + cmd-sc_data_direction); + } spin_unlock_irqrestore(ha-hardware_lock, flags); return (QLA_FUNCTION_FAILED);
Re: question on io_request_lock
MEHTA,HIREN (A-SanJose,ex1) wrote: Well, I guess the problem is that : scsi layer does not know whether the hba driver will do its own locking and release io_request_lock or not. Before calling any entry point, scsi layer is anyway acquires io_request_lock. Also, if you are depending on the io_request_lock, then in the interrupt handler you are supposed to acquired the io_request_lock and release it before returning. So, when the hba driver calls the done routine, the io_request_lock is already locked and doing it again in the done() will freeze the system. Exactly. However, if you use the new error handling code, then you *can* get away with not grabbing the io_request_lock for the done function (I believe, Eric will correct me if I'm wrong) because it uses a different spin lock to control the queue it puts the commands into for the bottom half handler. As such, it doesn't really do anything that needs the io_request_lock. The old error handling code is a whole different ball of wax. Call that done function without holding the io_request_lock and you are bound to blow the SCSI layer to hell. -hiren -Original Message- From: Martin Peschke3 [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 19, 2001 12:23 PM To: Doug Ledford Cc: MEHTA,HIREN (A-SanJose,ex1); '[EMAIL PROTECTED]' Subject: Re: question on io_request_lock Yes, the done() function needs to be wrapped (this isn't so much because it actually needs it as it is that you are calling mid layer code and you need to adhere to what it tells you locking semantics are, which in this case is always hold the io_request_lock because that's how I keep myself sane). Are you sure about that? I think, there are low-level drivers which don't follow this rule. Why does not the scsi_done function(s) itself get the io_request_lock first before doing any work if it really needs this? Mit freundlichen Grüßen / with kind regards Martin Peschke IBM Deutschland Entwicklung GmbH GNU/Linux for S/390 and zSeries Development Phone: +49-(0)7031-16-2349 Doug Ledford [EMAIL PROTECTED] on 07/19/2001 06:48:45 PM Please respond to Doug Ledford [EMAIL PROTECTED] To: MEHTA,HIREN (A-SanJose,ex1) [EMAIL PROTECTED] cc: '[EMAIL PROTECTED]' [EMAIL PROTECTED] Subject: Re: question on io_request_lock MEHTA,HIREN (A-SanJose,ex1) wrote: Hi List, If I decide that I do not want to depend on io_request_lock to maintain the consistency of the data structures in my driver, then do I ever need to acquire this lock and release it in the driver ? e.g. Do I need to acquire io_request_lock before I call the done() routine and release it after done() returns ? Yes, the done() function needs to be wrapped (this isn't so much because it actually needs it as it is that you are calling mid layer code and you need to adhere to what it tells you locking semantics are, which in this case is always hold the io_request_lock because that's how I keep myself sane). You are free to use your own internal spin locks in your code if you wish. For certain of your code entry points, you may wish to release the io_request_lock and then re-grab it before returning. For example, at the beginning of your queue routine you should release the io_request_lock and then regrab it before returning if you want to use your own internal locking and not use the io_request_lock. The queue, abort, and reset routines are called with the io_request_lock held when using the old error handling methods. I don't know which entry points are called with the lock held under the new error handling code, but I would suspect the answer is all of them. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED]
Re: Stop rewind in MT command?
Alan Dayley wrote: I am using mt-st v. 0.5b. At the end of EVERY MT command, a rewind is issued. What good it a seek to end of data (seod) if it is immeadiately rewound before another command can be issued? Is there an option or setting somewhere to stop the rewind command at the end of MT execution? man st FILES /dev/st* : the auto-rewind SCSI tape devices /dev/nst* : the non-rewind SCSI tape devices This is true of all tape devices. The ones that are preceeded with an 'n' are the non-rewind version, and the ones that aren't automatically rewind when the tape device is closed. So, the problem isn't with mt, it's doing exactly what it is suppossed to do. The problem is with the device you are accessing the tape through. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED]
Re: Linux Cluster using shared scsi
Eric Z. Ayers wrote: Doug Ledford writes: (James Bottomley commented about the need for SCSI reservation kernel patches) I agree. It's something that needs fixed in general, your software needs it as well, and I've written (about 80% done at this point) some open source software geared towards getting/holding reservations that also requires the same kernel patches (plus one more to be fully functional, an ioctl to allow a SCSI reservation to do a forced reboot of a machine). I'll be releasing that package in the short term (once I get back from my vacation anyway). Hello Doug, Does this package also tell the kernel to re-establish a reservation for all devices after a bus reset, or at least inform a user level program? Finding out when there has been a bus reset has been a stumbling block for me. It doesn't have to. The kernel changes are minimal (basically James' SCSI reset patch that he's been carrying around, the scsi reservation conflict patch, and I need to write a third patch that makes the system optionally reboot immediately on a reservation conflict and which is controlled by an ioctl, but I haven't done that patch yet). All of the rest is implemented in user space via the /dev/sg entries. As such, it doesn't have any more information about bus resets than you do. However, because of the policy enacted in the code, it doesn't need to. Furthermore, because there are so many ways to loose a reservation silently, it's foolhardy to try and keep reservation consistency any way other than something similar to what I outline below. The package is meant to be a sort of scsi reservation library. The application that uses the library is responsible for setting policy. I wrote a small, simple application that actually does a decent job of implementing policy on the system. The policy it does implement is simple: If told to get a reservation, then attempt to get it. If the attempt is blocked by an existing reservation and we aren't suppossed to reset the drive, then exit. If it's blocked and we are suppossed to reset the drive, then send a device reset, then wait 5 seconds, then try to get the reservation. If we again fail, then the other machine is still alive (as proven by the fact that it re-established its reservation after the reset) and we exit, else we now have the reservation. If told to forcefully get a reservation, then attempt to get it. If the attempt fails, then reset the device and try again immediately (no 5 second wait), if it fails again, then exit. If told to hold a reservation, then resend your reservation request once every 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a deal as requesting a reservation every 2 seconds might sound). The first time the reservation is refused, consider the reservation stolen by another machine and exit (or optionally, reboot). The package is meant to lock against itself (in other words, a malicious user with write access to the /dev/sg entries could confuse this locking mechanism, but it will work cooperatively with other copies of itself running on other machines), the requirements for the locking to be safe are as follows: 1) A machine is not allowed to mount or otherwise use a drive in any way shape or form until it has successfully acquired a reservation. 2) Once a machine has a reservation, it is not allowed to ever take any action to break another machines reservation, so that if the reservation is stolen, this machine is required to gracefully step away from the drive (rebooting is the best way to accomplish this since even the act of unmounting the drive will attempt to write to it). 3) The timeouts in the program must be honored (resend your reservation, when you hold it, every 2 seconds so that a passive attempt to steal the reservation will see you are still alive within the 5 second timeout and leave you be, which is a sort of heartbeat in and of itself). Anyway, as I said in my previous email, it's about 80% complete. It currently is up and running on SCSI-2 LUN based reservations. There is code to do SCSI-2 and SCSI-3 extent based reservations but it hasn't been tested due to lack of devices that support extent based reservations (my test bed is a multipath FC setup, so I'm doing all my testing on FC drives over two FC controllers in the same machine). I've still got to add the SCSI-3 Persistent Reservation code to the library (again, I'm lacking test drives for this scenario). The library itself requires that the program treat all reservations as extent/persistent reservations and it silently falls back to LUN reservations when neither of those two are available. My simple program that goes with the application just makes extent reservations of the whole disc, so it acts like a LUN reservation regardless, but there is considerably more flexibility in the library if a person wishes to program to it. -- Doug Ledford [EMAIL PROTECTED] http
Re: QUEUE_FULL + use_new_eh_code = bad_karma?
Eric Youngdale wrote: Well hosts that don't use the new EH code don't handle QUEUE_FULL. Not quite sure what would happen with it, actually. If everything goes according to plan, when the device returns QUEUE_FULL, the io should get stuck back in the queue to be handled later. In addition a flag should be set to prevent further I/O from being sent to that device - that flag should be cleared when one of the existing I/O requests actually completes. I am not completely sure how best to debug this. I guess the first thing to do is to see if in fact the QUEUE_FULL is being handled as I described and the flag is getting set. Then the next thing to look for is to see what happens when some other I/O finishes. Please note that while another command completing is enough to trigger the next command to be sent, there are cases when we can get a QUEUE_FULL with 0 commands active on the device, resulting in a hung drive if you don't implement a timeout for the paused queue. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
Re: 160MB/s drive detected as 80MB/s
Simon Garner wrote: Hi, I have a RH7 system which I have just compiled kernel-2.4.2 for. The box has an Asus CUV4X-D motherboard, and an Adaptec 29160 PCI SCSI controller. I'm using the AIC7xxx driver. Everything works, except the hard drive is a Quantum Atlas 10K2, 160MB/sec, but is only detected as 80MB/sec, as follows: # dmesg ... SCSI subsystem driver Revision: 1.00 (scsi0) Adaptec AIC-7892 Ultra 160/m SCSI host adapter found at PCI 0/11/0 (scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs (scsi0) Downloading sequencer code... 392 instructions downloaded scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.1/5.2.0 Adaptec AIC-7892 Ultra 160/m SCSI host adapter (scsi0:0:6:0) ***Synchronous at 80.0 Mbyte/sec***, offset 127. Vendor: QUANTUM Model: ATLAS10K2-TY184L Rev: DA40 Type: Direct-Access ANSI SCSI revision: 03 Detected scsi disk sda at scsi0, channel 0, id 6, lun 0 SCSI device sda: 35860910 512-byte hdwr sectors (18361 MB) ... (My asterisks.) Is this wrong, and can it be fixed, or am I just misunderstanding? The 5.2.4 aic7xxx driver fixes this (patches at my web site, also it is in the ac series kernels) or you can use Justin Gibbs' new aic7xxx driver (which never had this issue in the first place) which happens to be the new default aic7xxx driver in the 2.4.3 kernel. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
Re: scsi_scan problem.
Ishikawa wrote: Hi, I have an "old" Nakamichi CD changer. ("old" might be important consideration here. ) Should I test the patch submitted and report what I found ? (Or maybe I don't have to bother at this stage at all and simply wait for the 2.5 development and debugging cycle?) It would still be helpful because this problem has to be fixed before 2.5. The only question is whether to fix it with a simple patch such as I just submitted, or a more complex patch that uses REPORT LUNs. Part of that answer is how my simple patch works on your device. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
Re: scsi_scan problem.
Pete Zaitcev wrote: Date: Wed, 14 Mar 2001 21:28:14 -0500 From: Doug Ledford [EMAIL PROTECTED] A bug report I was charged with fixing (qla2x00 driver doesn't see all luns or sees multiple identical luns in different scenarios) was not a bug in the qla2x00 driver. [...] The bug is that we were detecting offline devices and linking them into the device list. Why is this a bug? What would happen when I telnet into the the RAID box and enable my volumes on those LUNs? Then they would be legitimate devices and you would do: echo "scsi-add-single-device a b c d" /proc/scsi/scsi to add them to the device chain. Before then, they aren't anything (at least not in the case of the Clariion array). But, some devices (at least the Clariion raid chassis) report luns that don't currently have any device bound to them as present but offline. This meant if we truly scanned all luns then we got something like 100+ devices on one ID from this chassis when only 1 might be valid:-( 16384 LUNs for Fibre Channel. As you see, scanning is out of the question. You must issue REPORT LUNs and fall back on scanning if the device reports a check condition. I did that when I worked in Sun Storage with A5000/A3500/T3 arrays couple of years ago. Patches welcomed. The one I sent already works on a fiber channel setup (the qla2x00 in question is fc and so is the Clariion array it's connected to, no detrimental side effects from scanning the box) and so I'm not inclined to add a REPORT LUNs section to the code unless absolutely necessary. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
Re: scsi_scan problem.
Doug Ledford wrote: Patches welcomed. The one I sent already works on a fiber channel setup (the qla2x00 in question is fc and so is the Clariion array it's connected to, no detrimental side effects from scanning the box) and so I'm not inclined to add a REPORT LUNs section to the code unless absolutely necessary. Clarification, I think REPORT LUNS support is a 2.5 thing, not a stick it in now thing. -- Doug Ledford [EMAIL PROTECTED] http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]