[PATCH] qla2xxx: Fix warnings reported by static checker.
From: Quinn TranFix following warning reported by static checker drivers/scsi/qla2xxx/qla_init.c:3910 qla2x00_alloc_fcport() warn: use 'flags' here instead of GFP_XXX? Fixes: b79414e ("qla2xxx: Add framework for async fabric discovery") Reported-by: Dan Carpenter Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Cc: Dan Carpenter --- drivers/scsi/qla2xxx/qla_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index ea6ddcf..1598e84 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -3878,7 +3878,7 @@ static void qla2xxx_nvram_wwn_from_ofw(scsi_qla_host_t *vha, nvram_t *nv) fcport->ct_desc.ct_sns = dma_alloc_coherent(>hw->pdev->dev, sizeof(struct ct_sns_pkt), >ct_desc.ct_sns_dma, - GFP_ATOMIC); + flags); fcport->disc_state = DSC_DELETED; fcport->fw_login_state = DSC_LS_PORT_UNAVAIL; fcport->deleted = QLA_SESS_DELETED; -- 1.8.3.1
RE: [PATCH] scsi: aacraid: fix information leak on hbainfo.driver_name
Hello Colin, > -Original Message- > From: Colin King [mailto:colin.k...@canonical.com] > Sent: Tuesday, February 7, 2017 5:55 AM > To: dl-esc-Aacraid Linux Driver; James E . J . > Bottomley ; Martin K . Petersen > ; linux-scsi@vger.kernel.org > Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: [PATCH] scsi: aacraid: fix information leak on hbainfo.driver_name > > EXTERNAL EMAIL > > > From: Colin Ian King > > The driver_name field is not initialized and hence information > on the stack is being leaked to userspace on the copy_to_user. > Fix this. I am curious, do you mean that the user will be able to retrieve garbage stack values from the variables that were not set (driver_name etc)? . If so how is it a security threat? Regards, Raghava Aditya > Signed-off-by: Colin Ian King > --- > drivers/scsi/aacraid/commctrl.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c > index 614842a..eb48d0a 100644 > --- a/drivers/scsi/aacraid/commctrl.c > +++ b/drivers/scsi/aacraid/commctrl.c > @@ -1015,7 +1015,7 @@ static int aac_get_pci_info(struct aac_dev* dev, > void __user *arg) > > static int aac_get_hba_info(struct aac_dev *dev, void __user *arg) > { > - struct aac_hba_info hbainfo; > + struct aac_hba_info hbainfo = { 0 }; > > hbainfo.adapter_number = (u8) dev->id; > hbainfo.system_io_bus_number= dev->pdev->bus->number; > -- > 2.10.2
Re: [PATCH] scsi: lpfc: Add shutdown method for kexec
On 02/12/2017 07:01 PM, Benjamin Herrenschmidt wrote: > On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote: >>> Good point, at the very least we should call remove if shutdown doesn't >>> exist. Eric: could we make the changes Ben suggests? >> >> Definitely. That was the original design of the kexec interface >> but people were worried about calling remove during reboot might >> introduce regressions on the reboot functionality. So shutdown was >> added to be remove without the cleaning up the kernel data structures. > > Right. Part of the problem was also that remove was dependent on > CONFIG_HOTPLUG > though that's no longer the case anymore. > > The problem is that a bunch of drivers either don't have a shutdown or > worse, have one that actually shuts the HW down rather than "idle" it > which puts it in a state that the new kernel can't deal with. If we do transition to use remove rather than shutdown, I think we want some way for a device driver to know whether we are doing kexec or not. A RAID adapter with a write cache is going to want to flush its write cache on a PCI hotplug remove, but for a kexec, its going to want to skip that so the kexec is faster. Today, since kexec looks like a reboot, rather than a shutdown, we can skip the flush on a reboot, since its technically not needed there either. -Brian -- Brian King Power Linux I/O IBM Linux Technology Center
Re: [PATCH] scsi: lpfc: Add shutdown method for kexec
On Tue, 2017-02-14 at 15:45 +1300, Eric W. Biederman wrote: > The only difference ever that should exist between shutdown and remove > is do you clean up kernel data structures. The shutdown method is > allowed to skip the cleanup up kernel data structures that the remove > method needs to make. > > Assuming the kernel does not have corrupted data structures I don't see > any practical reason for distinguishing between the two. There may be > some real world gotchas we have to deal with but semantically I don't > see any concerns. We had historical additions to shutdown actually shutting things down, like spinning platters down to park disk heads, switching devices into D3 on some systems, etc... that remove never had (and we don't want for kexec). Cheers, Ben.
[PATCH] qedf: fix ifnullfree.cocci warnings
drivers/scsi/qedf/qedf_main.c:2742:2-7: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci CC: Dupuis, ChadSigned-off-by: Fengguang Wu --- qedf_main.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/drivers/scsi/qedf/qedf_main.c +++ b/drivers/scsi/qedf/qedf_main.c @@ -2738,8 +2738,7 @@ static void qedf_free_fcoe_pf_param(stru qedf_free_global_queues(qedf); - if (qedf->global_queues) - kfree(qedf->global_queues); + kfree(qedf->global_queues); } /*
Re: [PATCH V4 2/2] qedf: Add QLogic FastLinQ offload FCoE driver framework.
Hi Chad, [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Dupuis-Chad/qed-Add-support-for-hardware-offloaded-FCoE/20170214-040339 coccinelle warnings: (new ones prefixed by >>) >> drivers/scsi/qedf/qedf_main.c:2742:2-7: WARNING: NULL check before freeing >> functions like kfree, debugfs_remove, debugfs_remove_recursive or >> usb_free_urb is not needed. Maybe consider reorganizing relevant code to >> avoid passing NULL values. Please review and possibly fold the followup patch. --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH] scsi: lpfc: Add shutdown method for kexec
On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote: > If we do transition to use remove rather than shutdown, I think we > want > some way for a device driver to know whether we are doing kexec or > not. > A RAID adapter with a write cache is going to want to flush its write > cache on a PCI hotplug remove, but for a kexec, its going to want to > skip > that so the kexec is faster. Today, since kexec looks like a reboot, > rather than a shutdown, we can skip the flush on a reboot, since its > technically not needed there either. What happens if a non-flushed adapter gets a PERST ? I wouldn't trust that 'don't have to flush' magic ... Cheers, Ben.
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
Bart, On 2/14/17 03:57, Bart Van Assche wrote: > On Mon, 2017-02-13 at 14:11 +0900, Damien Le Moal wrote: >> The ZBC_IN command (REPORT ZONES) reply length is always a multiple of >> 64B and thus may not be aligned on the device LBA size. >> For this command, retry due to the unaligned completion length is >> incorrect so do not check alignment of the reply length. >> >> Signed-off-by: Damien Le Moal>> --- >> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c >> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c >> index 0b5b423..f04b872 100644 >> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c >> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c >> @@ -4723,8 +4723,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, >> u8 msix_index, u32 reply) >> * then scsi-ml does not need to handle this misbehavior. >> */ >> sector_sz = scmd->device->sector_size; >> -if (unlikely(scmd->request->cmd_type == REQ_TYPE_FS && sector_sz && >> - xfer_cnt % sector_sz)) { >> +if (scmd->request->cmd_type == REQ_TYPE_FS && >> +scmd->cmnd[0] != ZBC_IN && >> +sector_sz && xfer_cnt % sector_sz) { >> sdev_printk(KERN_INFO, scmd->device, >> "unaligned partial completion avoided (xfer_cnt=%u, >> sector_sz=%u)\n", >> xfer_cnt, sector_sz); > > Hello Damien, > > What software generated a ZBC_IN request with type REQ_TYPE_FS? Shouldn't > all ZBC_IN requests be assigned type REQ_TYPE_BLOCK_PC? Any application that issues a BLKREPORTZONE ioctl (e.g. mkfs.f2fs or libzbc tools) or kernel component calling blkdev_report_zones (e.g. f2fs) will generate one or more REQ_OP_ZONE_REPORT BIO which translate into a REQ_TYPE_FS requests for the command ZBC_IN. That command does not operate on LBAs. The transfer length of the request can be any length larger than 64B and the reply length will be at least 64B and aligned on a multiple of 64. The patch "mpt3sas: Force request partial completion alignment" applied last week forgot this case, and frankly, I did too despite having looked at it. Martin, We really need this fix to go into 4.10. Without it, the REPORT ZONES/ZBC_IN command can be retried forever by the HBA. Best regards. -- Damien Le Moal, Ph.D. Sr. Manager, System Software Research Group, Western Digital Corporation damien.lem...@wdc.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.wdc.com, www.hgst.com
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
On Tue, 2017-02-14 at 12:45 +0900, Damien Le Moal wrote: > On 2/14/17 03:57, Bart Van Assche wrote: > > On Mon, 2017-02-13 at 14:11 +0900, Damien Le Moal wrote: > > > The ZBC_IN command (REPORT ZONES) reply length is always a multiple of > > > 64B and thus may not be aligned on the device LBA size. > > > For this command, retry due to the unaligned completion length is > > > incorrect so do not check alignment of the reply length. > > > > > > Signed-off-by: Damien Le Moal> > > --- > > > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +++-- > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > index 0b5b423..f04b872 100644 > > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > @@ -4723,8 +4723,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 > > > smid, u8 msix_index, u32 reply) > > >* then scsi-ml does not need to handle this misbehavior. > > >*/ > > > sector_sz = scmd->device->sector_size; > > > - if (unlikely(scmd->request->cmd_type == REQ_TYPE_FS && sector_sz && > > > - xfer_cnt % sector_sz)) { > > > + if (scmd->request->cmd_type == REQ_TYPE_FS && > > > + scmd->cmnd[0] != ZBC_IN && > > > + sector_sz && xfer_cnt % sector_sz) { > > > sdev_printk(KERN_INFO, scmd->device, > > > "unaligned partial completion avoided (xfer_cnt=%u, > > > sector_sz=%u)\n", > > > xfer_cnt, sector_sz); > > > > What software generated a ZBC_IN request with type REQ_TYPE_FS? Shouldn't > > all ZBC_IN requests be assigned type REQ_TYPE_BLOCK_PC? > > Any application that issues a BLKREPORTZONE ioctl (e.g. mkfs.f2fs or > libzbc tools) or kernel component calling blkdev_report_zones (e.g. > f2fs) will generate one or more REQ_OP_ZONE_REPORT BIO which translate > into a REQ_TYPE_FS requests for the command ZBC_IN. That command does > not operate on LBAs. The transfer length of the request can be any > length larger than 64B and the reply length will be at least 64B and > aligned on a multiple of 64. > > The patch "mpt3sas: Force request partial completion alignment" applied > last week forgot this case, and frankly, I did too despite having looked > at it. Hello Damien, How about modifying the block layer core such that it sets REQ_TYPE_BLOCK_PC for REQ_OP_ZONE_REPORT requests? Would that be a valid alternative to the above patch? Thanks, Bart.
Re: [PATCH] scsi: lpfc: Add shutdown method for kexec
Brian Kingwrites: > On 02/12/2017 07:01 PM, Benjamin Herrenschmidt wrote: >> On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote: Good point, at the very least we should call remove if shutdown doesn't exist. Eric: could we make the changes Ben suggests? >>> >>> Definitely. That was the original design of the kexec interface >>> but people were worried about calling remove during reboot might >>> introduce regressions on the reboot functionality. So shutdown was >>> added to be remove without the cleaning up the kernel data structures. >> >> Right. Part of the problem was also that remove was dependent on >> CONFIG_HOTPLUG >> though that's no longer the case anymore. >> >> The problem is that a bunch of drivers either don't have a shutdown or >> worse, have one that actually shuts the HW down rather than "idle" it >> which puts it in a state that the new kernel can't deal with. > > If we do transition to use remove rather than shutdown, I think we want > some way for a device driver to know whether we are doing kexec or not. > A RAID adapter with a write cache is going to want to flush its write > cache on a PCI hotplug remove, but for a kexec, its going to want to skip > that so the kexec is faster. Today, since kexec looks like a reboot, > rather than a shutdown, we can skip the flush on a reboot, since its > technically not needed there either. This doesn't make any sense. With a PCI hotplug remove you can't do anything because by the time you get the event the hardware is gone. I think there are optional interlocks but nothing as good as the old 3.5" floppy disk button, or the button on CD drives. The only difference ever that should exist between shutdown and remove is do you clean up kernel data structures. The shutdown method is allowed to skip the cleanup up kernel data structures that the remove method needs to make. Assuming the kernel does not have corrupted data structures I don't see any practical reason for distinguishing between the two. There may be some real world gotchas we have to deal with but semantically I don't see any concerns. Eric
Re: [PATCH] cdrom: Make device operations read-only
From: Kees CookDate: Mon, 13 Feb 2017 16:25:26 -0800 > diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c > index 9cbd217bc0c9..ab9232e1e16f 100644 > --- a/drivers/ide/ide-cd.c > +++ b/drivers/ide/ide-cd.c > @@ -1166,7 +1166,7 @@ void ide_cdrom_update_speed(ide_drive_t *drive, u8 *buf) >CDC_CD_RW | CDC_DVD | CDC_DVD_R | CDC_DVD_RAM | CDC_GENERIC_PACKET | \ >CDC_MO_DRIVE | CDC_MRW | CDC_MRW_W | CDC_RAM) > > -static struct cdrom_device_ops ide_cdrom_dops = { > +static const struct cdrom_device_ops ide_cdrom_dops = { > .open = ide_cdrom_open_real, > .release= ide_cdrom_release_real, > .drive_status = ide_cdrom_drive_status, Acked-by: David S. Miller
[PATCH] cdrom: Make device operations read-only
Since function tables are a common target for attackers, it's best to keep them in read-only memory. As such, this makes the CDROM device ops tables const. This drops additionally n_minors, since it isn't used meaningfully, and sets the only user of cdrom_dummy_generic_packet explicitly so the variables can all be const. Inspired by similar changes in grsecurity/PaX. Signed-off-by: Kees Cook--- Documentation/cdrom/cdrom-standard.tex | 9 +- drivers/block/paride/pcd.c | 2 +- drivers/cdrom/cdrom.c | 58 -- drivers/cdrom/gdrom.c | 4 +-- drivers/ide/ide-cd.c | 2 +- drivers/scsi/sr.c | 2 +- include/linux/cdrom.h | 5 +-- 7 files changed, 37 insertions(+), 45 deletions(-) diff --git a/Documentation/cdrom/cdrom-standard.tex b/Documentation/cdrom/cdrom-standard.tex index c06233fe52ac..8f85b0e41046 100644 --- a/Documentation/cdrom/cdrom-standard.tex +++ b/Documentation/cdrom/cdrom-standard.tex @@ -249,7 +249,6 @@ struct& cdrom_device_ops\ \{ \hidewidth\cr unsigned\ long);\cr \noalign{\medskip} \ int& capability;& capability flags \cr - & n_minors;& number of active minor devices \cr \};\cr } $$ @@ -258,13 +257,7 @@ it should add a function pointer to this $struct$. When a particular function is not implemented, however, this $struct$ should contain a NULL instead. The $capability$ flags specify the capabilities of the \cdrom\ hardware and/or low-level \cdrom\ driver when a \cdrom\ drive -is registered with the \UCD. The value $n_minors$ should be a positive -value indicating the number of minor devices that are supported by -the low-level device driver, normally~1. Although these two variables -are `informative' rather than `operational,' they are included in -$cdrom_device_ops$ because they describe the capability of the {\em -driver\/} rather than the {\em drive}. Nomenclature has always been -difficult in computer programming. +is registered with the \UCD. Note that most functions have fewer parameters than their $blkdev_fops$ counterparts. This is because very little of the diff --git a/drivers/block/paride/pcd.c b/drivers/block/paride/pcd.c index 5fd2d0e25567..10aed84244f5 100644 --- a/drivers/block/paride/pcd.c +++ b/drivers/block/paride/pcd.c @@ -273,7 +273,7 @@ static const struct block_device_operations pcd_bdops = { .check_events = pcd_block_check_events, }; -static struct cdrom_device_ops pcd_dops = { +static const struct cdrom_device_ops pcd_dops = { .open = pcd_open, .release= pcd_release, .drive_status = pcd_drive_status, diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 59cca72647a6..bbbd3caa927c 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -342,8 +342,8 @@ static void cdrom_sysctl_register(void); static LIST_HEAD(cdrom_list); -static int cdrom_dummy_generic_packet(struct cdrom_device_info *cdi, - struct packet_command *cgc) +int cdrom_dummy_generic_packet(struct cdrom_device_info *cdi, + struct packet_command *cgc) { if (cgc->sense) { cgc->sense->sense_key = 0x05; @@ -354,6 +354,7 @@ static int cdrom_dummy_generic_packet(struct cdrom_device_info *cdi, cgc->stat = -EIO; return -EIO; } +EXPORT_SYMBOL(cdrom_dummy_generic_packet); static int cdrom_flush_cache(struct cdrom_device_info *cdi) { @@ -371,7 +372,7 @@ static int cdrom_flush_cache(struct cdrom_device_info *cdi) static int cdrom_get_disc_info(struct cdrom_device_info *cdi, disc_information *di) { - struct cdrom_device_ops *cdo = cdi->ops; + const struct cdrom_device_ops *cdo = cdi->ops; struct packet_command cgc; int ret, buflen; @@ -586,7 +587,7 @@ static int cdrom_mrw_set_lba_space(struct cdrom_device_info *cdi, int space) int register_cdrom(struct cdrom_device_info *cdi) { static char banner_printed; - struct cdrom_device_ops *cdo = cdi->ops; + const struct cdrom_device_ops *cdo = cdi->ops; int *change_capability = (int *)>capability; /* hack */ cd_dbg(CD_OPEN, "entering register_cdrom\n"); @@ -610,7 +611,6 @@ int register_cdrom(struct cdrom_device_info *cdi) ENSURE(reset, CDC_RESET); ENSURE(generic_packet, CDC_GENERIC_PACKET); cdi->mc_flags = 0; - cdo->n_minors = 0; cdi->options = CDO_USE_FFLAGS; if (autoclose == 1 && CDROM_CAN(CDC_CLOSE_TRAY)) @@ -630,8 +630,7 @@ int register_cdrom(struct cdrom_device_info *cdi) else cdi->cdda_method = CDDA_OLD; - if (!cdo->generic_packet) - cdo->generic_packet = cdrom_dummy_generic_packet; + WARN_ON(!cdo->generic_packet); cd_dbg(CD_REG_UNREG, "drive \"/dev/%s\" registered\n", cdi->name);
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
On Tue, Feb 14, 2017 at 02:21:37PM +0900, Damien Le Moal wrote: > > I think we want to keep the knowledge of which requests have a request size > > that should be a multiple of the logical block size in the block layer core > > or in the SCSI core but not in the mpt3sas driver. But I'm not sure what the > > best approach is to do that. Should we use the request type, should we add a > > new request attribute or should we add a new function? > > I agree. But the mpt3sas patch that introduced the problem is to solve > problems with a buggy hardware in the first place... A quirck of some > sort. So do we really need such a big change just the report zones > exception ? (all other REQ_TYPE_FS commands either have no payload or > the payload size is LBA aligned) > > Martin, James, Hannes, > > Please advise ! 4.10 is introducing zoned block device support, and from > the first release that will be broken with mpt3sas HBAs if we do not > patch this. Any preferred approach ? I suspect we'll need to move the workaround to the SD driver. While it's an mpt bug, sd is the party that knows which kind of requests were generated, so it's where we can fix up the length in the done callback.
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
Bart, On 2/14/17 12:59, Bart Van Assche wrote: >>> What software generated a ZBC_IN request with type REQ_TYPE_FS? Shouldn't >>> all ZBC_IN requests be assigned type REQ_TYPE_BLOCK_PC? >> >> Any application that issues a BLKREPORTZONE ioctl (e.g. mkfs.f2fs or >> libzbc tools) or kernel component calling blkdev_report_zones (e.g. >> f2fs) will generate one or more REQ_OP_ZONE_REPORT BIO which translate >> into a REQ_TYPE_FS requests for the command ZBC_IN. That command does >> not operate on LBAs. The transfer length of the request can be any >> length larger than 64B and the reply length will be at least 64B and >> aligned on a multiple of 64. >> >> The patch "mpt3sas: Force request partial completion alignment" applied >> last week forgot this case, and frankly, I did too despite having looked >> at it. > > Hello Damien, > > How about modifying the block layer core such that it sets REQ_TYPE_BLOCK_PC > for REQ_OP_ZONE_REPORT requests? Would that be a valid alternative to the > above patch? I think so. But my understanding of REQ_TYPE_BLOCK_PC is that it is the type for requests issued internally (scsi_execute) of from things like the SG driver, so in essence, all requests not derived from a BIO... Is this correct ? If yes, then setting the BLOCK_TYPE_PC for REQ_OP_ZONE_REPORT (and REQ_OP_ZONE_RESET while at it) would break this model, wouldn't it ? Best regards. -- Damien Le Moal, Ph.D. Sr. Manager, System Software Research Group, Western Digital Corporation damien.lem...@wdc.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.wdc.com, www.hgst.com
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
Christoph, On 2/14/17 15:18, h...@lst.de wrote: > On Tue, Feb 14, 2017 at 02:21:37PM +0900, Damien Le Moal wrote: >>> I think we want to keep the knowledge of which requests have a request size >>> that should be a multiple of the logical block size in the block layer core >>> or in the SCSI core but not in the mpt3sas driver. But I'm not sure what the >>> best approach is to do that. Should we use the request type, should we add a >>> new request attribute or should we add a new function? >> >> I agree. But the mpt3sas patch that introduced the problem is to solve >> problems with a buggy hardware in the first place... A quirck of some >> sort. So do we really need such a big change just the report zones >> exception ? (all other REQ_TYPE_FS commands either have no payload or >> the payload size is LBA aligned) >> >> Martin, James, Hannes, >> >> Please advise ! 4.10 is introducing zoned block device support, and from >> the first release that will be broken with mpt3sas HBAs if we do not >> patch this. Any preferred approach ? > > I suspect we'll need to move the workaround to the SD driver. While it's > an mpt bug, sd is the party that knows which kind of requests were > generated, so it's where we can fix up the length in the done callback. This makes sense. sd_done already sets report zone command resid to 0, always. So it make sense to have the mpt3sas check after that, written in a generic manner. I will send a patch shortly. Best regards. -- Damien Le Moal, Ph.D. Sr. Manager, System Software Research Group, Western Digital Corporation damien.lem...@wdc.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.wdc.com, www.hgst.com
[bug report] scsi: ufs-qcom: dump additional testbus registers
Hello Venkat Gopalakrishnan, The patch 9c46b8676271: "scsi: ufs-qcom: dump additional testbus registers" from Feb 3, 2017, leads to the following static checker warning: drivers/scsi/ufs/ufs-qcom.c:1531 ufs_qcom_testbus_cfg_is_ok() warn: impossible condition '(host->testbus.select_minor > 255) => (0-255 > 255)' drivers/scsi/ufs/ufs-qcom.c 1517 static bool ufs_qcom_testbus_cfg_is_ok(struct ufs_qcom_host *host) 1518 { 1519 if (host->testbus.select_major >= TSTBUS_MAX) { 1520 dev_err(host->hba->dev, 1521 "%s: UFS_CFG1[TEST_BUS_SEL} may not equal 0x%05X\n", 1522 __func__, host->testbus.select_major); 1523 return false; 1524 } 1525 1526 /* 1527 * Not performing check for each individual select_major 1528 * mappings of select_minor, since there is no harm in 1529 * configuring a non-existent select_minor 1530 */ 1531 if (host->testbus.select_minor > 0xFF) { ^ It might make sense to keep this check. I don't know. But it's confusing that 0xFF is a magic number. Better to make it a define. 1532 dev_err(host->hba->dev, 1533 "%s: 0x%05X is not a legal testbus option\n", 1534 __func__, host->testbus.select_minor); 1535 return false; 1536 } 1537 1538 return true; 1539 } regards, dan carpenter
Re: [PATCH RFC] target/user: Add double ring buffers support.
On 2017年02月14日 01:42, Andy Grover wrote: On 02/12/2017 05:38 PM, Xiubo Li wrote: On 2017年02月11日 02:02, Andy Grover wrote: 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB to 1GB or larger 2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring portion, and dynamically alloc pages for the data area as needed and map them into the data area. Should the data area mapping is dynamical or add one new API to userspace to trigger it ? I would recommend having the kernel make the decision to allocate/map more memory, or not (i.e. wait for existing I/O to complete and free pages) because this avoids the need for a new API, and also the kernel may better decide how to allocate total pages used across multiple TCMU devices. Yes, agreed. 4. Implement an algorithm to keep allocated pages mapped into the data area for reuse, and maybe a heuristic to keep extreme burstiness from over-allocating pages Does a little like slab ? It's definitely becoming less of a storage issue and more of a page-management issue. It might make sense to look at mm code and ask mm experts for their advice, yep. Sure, i will have a deep investigate about this. Thanks very much. BRs Xiubo This should allow TCMU to allocate more data area as needed, not waste memory for slower devices, and avoid userspace ABI changes. Could we prototype this approach and see if it is workable? I will do more investigate about this. Thank you for working to improve this area! Regards -- Andy
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
On Tue, 2017-02-14 at 13:42 +0900, Damien Le Moal wrote: > I think so. But my understanding of REQ_TYPE_BLOCK_PC is that it is the > type for requests issued internally (scsi_execute) of from things like > the SG driver, so in essence, all requests not derived from a BIO... Is > this correct ? If yes, then setting the BLOCK_TYPE_PC for > REQ_OP_ZONE_REPORT (and REQ_OP_ZONE_RESET while at it) would break this > model, wouldn't it ? Hello Damien, I think we want to keep the knowledge of which requests have a request size that should be a multiple of the logical block size in the block layer core or in the SCSI core but not in the mpt3sas driver. But I'm not sure what the best approach is to do that. Should we use the request type, should we add a new request attribute or should we add a new function? Bart.
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
Bart, On 2/14/17 14:11, Bart Van Assche wrote: > On Tue, 2017-02-14 at 13:42 +0900, Damien Le Moal wrote: >> I think so. But my understanding of REQ_TYPE_BLOCK_PC is that it is the >> type for requests issued internally (scsi_execute) of from things like >> the SG driver, so in essence, all requests not derived from a BIO... Is >> this correct ? If yes, then setting the BLOCK_TYPE_PC for >> REQ_OP_ZONE_REPORT (and REQ_OP_ZONE_RESET while at it) would break this >> model, wouldn't it ? > > Hello Damien, > > I think we want to keep the knowledge of which requests have a request size > that should be a multiple of the logical block size in the block layer core > or in the SCSI core but not in the mpt3sas driver. But I'm not sure what the > best approach is to do that. Should we use the request type, should we add a > new request attribute or should we add a new function? I agree. But the mpt3sas patch that introduced the problem is to solve problems with a buggy hardware in the first place... A quirck of some sort. So do we really need such a big change just the report zones exception ? (all other REQ_TYPE_FS commands either have no payload or the payload size is LBA aligned) Martin, James, Hannes, Please advise ! 4.10 is introducing zoned block device support, and from the first release that will be broken with mpt3sas HBAs if we do not patch this. Any preferred approach ? Best regards. -- Damien Le Moal, Ph.D. Sr. Manager, System Software Research Group, Western Digital Corporation damien.lem...@wdc.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.wdc.com, www.hgst.com
Re: [PATCH RFC] target/user: Add double ring buffers support.
Yes I think it's now clear we need more buffer space to avoid bottlenecks for high iops. The initial design kept it simple with the 1MB vmalloc'd space but anticipated greater would be needed. It should not be necessary to change userspace or the TCMU ABI to handle growing the buffer for fast devices: 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB to 1GB or larger For the cmd area, set the size to fixed 512M, and data area's size to fixed 1G, is that okay ? 512M seems a little big for the cmd area... use of the cmd ring should be much smaller than the data area. Each cmd has a minimum size, but then to describe an additional page in the iovec[] should be just one struct iovec (16 bytes?) for each 4K page. The struct tcmu_cmd_entry {} size is fixed 44 bytes without iovec[], and the size of struct iovec[N] is about 16 bytes * N. The cmd entry size will be [44B, N *16 + 44B], and the data size will be [0, N * 4096], so the ratio of sizeof(cmd entry): sizeof(entry datas) == (N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) + 44/(N*4096) == 1/256 + 11/(N * 1024). When N is bigger, the ratio will be smaller. If N >= 1, the ratio will be [15/1024, 4/1024). So, that's right, 512M is a little bigger than actually needed, for the safe ratio i think 16/1024 is enough, if the data area is fixed 1G, so the cmd area could be 16M+. Or cmd area(64M) + data area(960M) == 1G ? 3. Upgrade the current fixed-size bitmap-based tracking of data area to handle the new scheme The Radix tree will be used to keep the block's index(0 ~ 1G/DATA_BLOCK_SIZE) and physical page mapping relations. Each leaf is one data block(the size is DATA_BLOCK_SIZE). For non-leaf nodes, use the radix tags[0][SLOTs] to indicate wether slot[SLOTs]'s branch has free(reused the old one or NULL leafs) block leafs or not. This could speed the search of the free blocks in data area. Yes I think radix tree is a good place to start. Okay. Thanks, BRs Xiubo Regards -- Andy
Re: [bug report] scsi: aacraid: Added support for hotplug
On Mon, Feb 13, 2017 at 07:39:15PM +, Raghava Aditya Renukunta wrote: > Hi Don, > > > -Original Message- > > From: Dan Carpenter [mailto:dan.carpen...@oracle.com] > > Sent: Monday, February 13, 2017 10:47 AM > > To: Raghava Aditya Renukunta > >> > Cc: linux-scsi@vger.kernel.org > > Subject: [bug report] scsi: aacraid: Added support for hotplug > > > > EXTERNAL EMAIL > > > > > > Hello Raghava Aditya Renukunta, > > > > The patch 6223a39fe6fb: "scsi: aacraid: Added support for hotplug" > > from Feb 2, 2017, leads to the following static checker warning: > > > > drivers/scsi/aacraid/commsup.c:2243 aac_process_events() > > error: double unlock 'spin_lock:t_lock' > > > > drivers/scsi/aacraid/commsup.c > > 2130 spin_lock_irqsave(t_lock, flags); > > > > 2131 > > 2132 while (!list_empty(&(dev->queues- > > >queue[HostNormCmdQueue].cmdq))) { > > 2133 struct list_head *entry; > > 2134 struct aac_aifcmd *aifcmd; > > 2135 unsigned int num; > > 2136 struct hw_fib **hw_fib_pool, **hw_fib_p; > > 2137 struct fib **fib_pool, **fib_p; > > 2138 > > 2139 set_current_state(TASK_RUNNING); > > 2140 > > 2141 entry = dev->queues- > > >queue[HostNormCmdQueue].cmdq.next; > > 2142 list_del(entry); > > 2143 > > 2144 t_lock = dev->queues->queue[HostNormCmdQueue].lock; > > 2145 spin_unlock_irqrestore(t_lock, flags); > > ^ > > 2146 > > 2147 fib = list_entry(entry, struct fib, fiblink); > > 2148 hw_fib = fib->hw_fib_va; > > 2149 if (dev->sa_firmware) { > > 2150 /* Thor AIF */ > > 2151 aac_handle_sa_aif(dev, fib); > > 2152 aac_fib_adapter_complete(fib, > > (u16)sizeof(u32)); > > 2153 continue; > > > > The locking isn't right here. We should re-take the spinlock before > > continuing. > > The intention here is to protect the retrieval of entry from the queues. > Or do you mean that we should just protect the whole while loop with one > spin_lock (t_lock)? > This is a static checker warning that says we call spin_unlock_irqrestore(t_lock, flags); at the end of the loop but sometimes we're not holding the lock. This is a Smatch warning and it doesn't handle loops correctly. It should also warn that on line 2145 we might not be holding the lock either but it misses that bug. There is no way this continue is correct with regards to locking. regards, dan carpenter > Thank you, > Raghava Aditya > > > 2154 } > > 2155 /* > > 2156 * We will process the FIB here or pass it to a > > 2157 * worker thread that is TBD. We Really can't > > 2158 * do anything at this point since we don't > > have > > 2159 * anything defined for this thread to do. > > 2160 */ > > > > [ snip ] > > > > 2221 free_mem: > > /* Free up the remaining resources */ > > 2223 hw_fib_p = hw_fib_pool; > > 2224 fib_p = fib_pool; > > 2225 while (hw_fib_p < _fib_pool[num]) { > > 2226 kfree(*hw_fib_p); > > 2227 kfree(*fib_p); > > 2228 ++fib_p; > > 2229 ++hw_fib_p; > > 2230 } > > 2231 kfree(fib_pool); > > 2232 free_hw_fib_pool: > > 2233 kfree(hw_fib_pool); > > 2234 free_fib: > > 2235 kfree(fib); > > 2236 t_lock = dev->queues->queue[HostNormCmdQueue].lock; > > 2237 spin_lock_irqsave(t_lock, flags); > > 2238 } > > 2239 /* > > 2240 * There are no more AIF's > > 2241 */ > > 2242 t_lock = dev->queues->queue[HostNormCmdQueue].lock; > > 2243 spin_unlock_irqrestore(t_lock, flags); > > > > Otherwise it is a double unlock bug. > > > > 2244 } > > > > > > regards, > > dan carpenter
[PATCH] return valid data buffer length in scsi_bufflen() API using RQF_SPECIAL_PAYLOAD
Regression due to commit f9d03f96b988002027d4b28ea1b7a24729a4c9b5 block: improve handling of the magic discard payload and HBA FW encounter FW fault in DMA operation while creating File system on SSDs. Below CDB cause FW fault. CDB: Write same(16) 93 08 00 00 00 00 00 00 00 00 00 00 80 00 00 00 Root cause is SCSI buffer length and DMA buffer length miss match for WRITE SAME command. Fix - return valid data buffer length in scsi_bufflen() API using RQF_SPECIAL_PAYLOAD Signed-off-by: Kashyap Desai--- diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index 9fc1aec..1f796fc 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -180,7 +180,8 @@ static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd) static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd) { - return cmd->sdb.length; + return (cmd->request->rq_flags & RQF_SPECIAL_PAYLOAD) ? + cmd->request->special_vec.bv_len : cmd->sdb.length; } static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
Re: [PATCH v3 10/16] lpfc: NVME Target: Base modifications
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Target: Base modifications > > This set of patches adds the base modifications for NVME target support > > The base modifications consist of: > - Additional module parameters or configuration tuning > - Enablement of configuration mode for NVME target. Ties into the > queueing model put into place by the initiator basemods patches. > - Target-specific buffer pools, dma pools, sgl pools > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 11/16] lpfc: NVME Target: Receive buffer updates
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Target: Receive buffer updates > > Allocates buffer pools and configures adapter interfaces to handle > receive buffer (asynchronous FCP CMD ius, first burst data) > from the adapter. Splits by protocol, etc. > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart I probably do _not_ want to know why 32G FC-NVMe needs to know about FCFs ... Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 03/16] lpfc: minor code cleanups
On 02/12/2017 10:52 PM, James Smart wrote: > > This contains code cleanups that were in the prior patch set. > This allows better review of real changes later. > > minor code cleanups: > fix indentation, punctuation, line length > addition/reduction of whitespace > remove unneeded parens, braces > lpfc_debugfs_nodelist_data: print as u64 rather than byte by byte > covert printk(KERN_ERR to pr_err > small print string deltas > use num_present_cpus() rather than count them > comment updates > rctl/type names moved to module variable, not on stack > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v3 05/16] lpfc: refactor debugfs queue dump routines
On 02/12/2017 10:52 PM, James Smart wrote: > > Create common wq, cq, eq, rq dump functions > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v3 15/16] lpfc: Update copyrights
On 02/12/2017 10:52 PM, James Smart wrote: > > Update copyrights to 2017 for all files touched in this patch set > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 04/16] lpfc: refactor debugfs queue prints
On 02/12/2017 10:52 PM, James Smart wrote: > > Create common wq, cq, eq, rq print functions > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- > drivers/scsi/lpfc/lpfc.h | 1 + > drivers/scsi/lpfc/lpfc_debugfs.c | 564 > --- > 2 files changed, 228 insertions(+), 337 deletions(-) > One should look at converting it to seq_file(), seeing that there are provisions for overflow already. But that can be done in a later patch. Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 05/16] lpfc: refactor debugfs queue dump routines
On 02/12/2017 10:52 PM, James Smart wrote: > > Create common wq, cq, eq, rq dump functions > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 16/16] lpfc: Update lpfc version to 11.2.0.7
On 02/12/2017 10:52 PM, James Smart wrote: > > Update lpfc version to 11.2.0.7 > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v3 15/16] lpfc: Update copyrights
On 02/12/2017 10:52 PM, James Smart wrote: > > Update copyrights to 2017 for all files touched in this patch set > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v3 04/16] lpfc: refactor debugfs queue prints
On 02/12/2017 10:52 PM, James Smart wrote: > > Create common wq, cq, eq, rq print functions > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v3 02/16] lpfc: use pci_irq_alloc_vectors and pci_irq_free_vectors
On 02/12/2017 10:52 PM, James Smart wrote: > > From: Christoph Hellwig> > This avoids having to store the msix_entries array and simpliefies the > shutdown and cleanup path a lot. > > Signed-off-by: Christoph Hellwig > Signed-off-by: James Smart > Reviewed-by: Johannes Thumshirn Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 03/16] lpfc: minor code cleanups
On 02/12/2017 10:52 PM, James Smart wrote: > > This contains code cleanups that were in the prior patch set. > This allows better review of real changes later. > > minor code cleanups: > fix indentation, punctuation, line length > addition/reduction of whitespace > remove unneeded parens, braces > lpfc_debugfs_nodelist_data: print as u64 rather than byte by byte > covert printk(KERN_ERR to pr_err > small print string deltas > use num_present_cpus() rather than count them > comment updates > rctl/type names moved to module variable, not on stack > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 07/16] lpfc: NVME Initiator: Merge into FC discovery
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Initiator: Merge into FC discovery > > Adds NVME PRLI support and Nameserver registrations and Queries for NVME > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 06/16] lpfc: NVME Initiator: Base modifications
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Initiator: Base modifications > > This patch adds base modifications for NVME initiator support. > > The base modifications consist of: > - Formal split of SLI3 rings from SLI-4 WQs (sometimes referred to as > rings as well) as implementation now widely varies between the two. > - Addition of configuration modes: >SCSI initiator only; NVME initiator only; NVME target only; and >SCSI and NVME initiator. >The configuration mode drives overall adapter configuration, >offloads enabled, and resource splits. >NVME support is only available on SLI-4 devices and newer fw. > - Implements the following based on configuration mode: > - Exchange resources are split by protocol; Obviously, if only > 1 mode, then no split occurs. Default is 50/50. module attribute > allows tuning. > - Pools and config parameters are separated per-protocol > - Each protocol has it's own set of queues, but share interrupt > vectors. > SCSI: >SLI3 devices have few queues and the original style of queue > allocation remains. >SLI4 devices piggy back on an "io-channel" concept that > eventually needs to merge with scsi-mq/blk-mq support (it is >underway). For now, the paradigm continues as it existed >prior. io channel allocates N msix and N WQs (N=4 default) >and either round robins or uses cpu # modulo N for scheduling. >A bunch of module parameters allow the configuration to be >tuned. > NVME (initiator): >Allocates an msix per cpu (or whatever pci_alloc_irq_vectors > gets) >Allocates a WQ per cpu, and maps the WQs to msix on a WQ # > modulo msix vector count basis. >Module parameters exist to cap/control the config if desired. > - Each protocol has its own buffer and dma pools. > > I apologize for the size of the patch. > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > Apologies accepted :-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 08/16] lpfc: NVME Initiator: bind to nvme_fc api
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Initiator: Tie in to NVME Fabrics nvme_fc LLDD initiator api > > Adds the routines to: > - register and deregister the FC port as a nvme-fc initiator localport > - register and deregister remote FC ports as a nvme-fc remoteport > - binding of nvme queues to adapter WQs > - send/perform NVME LS's > - send/perform NVME FCP initiator io operations > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 09/16] lpfc: NVME Initiator: Add debugfs support
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Initiator: Add debugfs support > > Adds debugfs snippets to cover the new NVME initiator functionality > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 13/16] lpfc: NVME Target: bind to nvmet_fc api
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Target: Tie in to NVME Fabrics nvmet_fc LLDD target api > > Adds the routines to: > - register and deregister the FC port as a nvmet-fc targetport > - binding of nvme queues to adapter WQs > - receipt and passing of NVME LS's to transport, sending transport response > - receipt of NVME FCP CMD IUs, processing FCP target io data transmission > commands; transmission of FCP io response > - Abort operations for tgt io exchanges > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 14/16] lpfc: NVME Target: Add debugfs support
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Target: Add debugfs support > > Adds debugfs snippets to cover the new NVME target functionality > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 12/16] lpfc: NVME Target: Merge into FC discovery
On 02/12/2017 10:52 PM, James Smart wrote: > > NVME Target: Merge into FC discovery > > Adds NVME PRLI handling and Nameserver registrations for NVME > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v3 16/16] lpfc: Update lpfc version to 11.2.0.7
On 02/12/2017 10:52 PM, James Smart wrote: > > Update lpfc version to 11.2.0.7 > > Signed-off-by: Dick Kennedy> Signed-off-by: James Smart > --- > drivers/scsi/lpfc/lpfc_version.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/lpfc/lpfc_version.h > b/drivers/scsi/lpfc/lpfc_version.h > index 0d93535..86c6c9b 100644 > --- a/drivers/scsi/lpfc/lpfc_version.h > +++ b/drivers/scsi/lpfc/lpfc_version.h > @@ -20,7 +20,7 @@ > * included with this package. * > ***/ > > -#define LPFC_DRIVER_VERSION "11.2.0.4" > +#define LPFC_DRIVER_VERSION "11.2.0.7" > #define LPFC_DRIVER_NAME "lpfc" > > /* Used for SLI 2/3 */ > Hmm. One would have thought that a major rework like this would warrant an increased major number. But that's probably just me being totally ignorant of any company policies :-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH RFC] target/user: Add double ring buffers support.
On 02/13/2017 01:50 AM, Xiubo Li wrote: There is one new scheme in my mind: Yes I think it's now clear we need more buffer space to avoid bottlenecks for high iops. The initial design kept it simple with the 1MB vmalloc'd space but anticipated greater would be needed. It should not be necessary to change userspace or the TCMU ABI to handle growing the buffer for fast devices: 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB to 1GB or larger For the cmd area, set the size to fixed 512M, and data area's size to fixed 1G, is that okay ? 512M seems a little big for the cmd area... use of the cmd ring should be much smaller than the data area. Each cmd has a minimum size, but then to describe an additional page in the iovec[] should be just one struct iovec (16 bytes?) for each 4K page. 3. Upgrade the current fixed-size bitmap-based tracking of data area to handle the new scheme The Radix tree will be used to keep the block's index(0 ~ 1G/DATA_BLOCK_SIZE) and physical page mapping relations. Each leaf is one data block(the size is DATA_BLOCK_SIZE). For non-leaf nodes, use the radix tags[0][SLOTs] to indicate wether slot[SLOTs]'s branch has free(reused the old one or NULL leafs) block leafs or not. This could speed the search of the free blocks in data area. Yes I think radix tree is a good place to start. Regards -- Andy
Re: [PATCH RFC] target/user: Add double ring buffers support.
On 02/12/2017 05:38 PM, Xiubo Li wrote: On 2017年02月11日 02:02, Andy Grover wrote: 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB to 1GB or larger 2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring portion, and dynamically alloc pages for the data area as needed and map them into the data area. Should the data area mapping is dynamical or add one new API to userspace to trigger it ? I would recommend having the kernel make the decision to allocate/map more memory, or not (i.e. wait for existing I/O to complete and free pages) because this avoids the need for a new API, and also the kernel may better decide how to allocate total pages used across multiple TCMU devices. 4. Implement an algorithm to keep allocated pages mapped into the data area for reuse, and maybe a heuristic to keep extreme burstiness from over-allocating pages Does a little like slab ? It's definitely becoming less of a storage issue and more of a page-management issue. It might make sense to look at mm code and ask mm experts for their advice, yep. This should allow TCMU to allocate more data area as needed, not waste memory for slower devices, and avoid userspace ABI changes. Could we prototype this approach and see if it is workable? I will do more investigate about this. Thank you for working to improve this area! Regards -- Andy
Re: [PATCH RFC] target/user: Add double ring buffers support.
Hi Andy, There is one new scheme in my mind: Yes I think it's now clear we need more buffer space to avoid bottlenecks for high iops. The initial design kept it simple with the 1MB vmalloc'd space but anticipated greater would be needed. It should not be necessary to change userspace or the TCMU ABI to handle growing the buffer for fast devices: 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB to 1GB or larger For the cmd area, set the size to fixed 512M, and data area's size to fixed 1G, is that okay ? 2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring portion, and dynamically alloc pages for the data area as needed and map them into the data area. TCMU will just vmalloc() the 512M cmd area, and let the data area memory allocate and map later when needed. The userspace runner will mmap() all the (512M + 1G) when initialising, and will return 1.5G virtual address space. The cmd area will be mapped to actual physical addresses here, while the data area will be mapped in page fault hook when using 3. Upgrade the current fixed-size bitmap-based tracking of data area to handle the new scheme The Radix tree will be used to keep the block's index(0 ~ 1G/DATA_BLOCK_SIZE) and physical page mapping relations. Each leaf is one data block(the size is DATA_BLOCK_SIZE). For non-leaf nodes, use the radix tags[0][SLOTs] to indicate wether slot[SLOTs]'s branch has free(reused the old one or NULL leafs) block leafs or not. This could speed the search of the free blocks in data area. 4. Implement an algorithm to keep allocated pages mapped into the data area for reuse, and maybe a heuristic to keep extreme burstiness from over-allocating pages For leaf nodes: if one leaf node is exist and its tags[0][SLOTs] = 0 meaning that some older cmds have already touched the block leafs and then "freed" it, if so, we will reuse it. if the leaf node is exist and its tag[0][SLOTs] = 1 meaning that this is still used. if the leaf node is non-exist, this is the first time to touch this leaf, will allocate memory and then insert it here setting the tag[0][SLOTs] to 1. This should allow TCMU to allocate more data area as needed, not waste memory for slower devices, and avoid userspace ABI changes. Could we prototype this approach and see if it is workable? For slower devices, it will save memory. Could also avoid changing the userspace ABI. Thanks, BRs Xiubo Thanks -- Regards -- Andy
RE: [bug report] scsi: aacraid: Added support for hotplug
Hi Don, > -Original Message- > From: Dan Carpenter [mailto:dan.carpen...@oracle.com] > Sent: Monday, February 13, 2017 10:47 AM > To: Raghava Aditya Renukunta >> Cc: linux-scsi@vger.kernel.org > Subject: [bug report] scsi: aacraid: Added support for hotplug > > EXTERNAL EMAIL > > > Hello Raghava Aditya Renukunta, > > The patch 6223a39fe6fb: "scsi: aacraid: Added support for hotplug" > from Feb 2, 2017, leads to the following static checker warning: > > drivers/scsi/aacraid/commsup.c:2243 aac_process_events() > error: double unlock 'spin_lock:t_lock' > > drivers/scsi/aacraid/commsup.c > 2130 spin_lock_irqsave(t_lock, flags); > > 2131 > 2132 while (!list_empty(&(dev->queues- > >queue[HostNormCmdQueue].cmdq))) { > 2133 struct list_head *entry; > 2134 struct aac_aifcmd *aifcmd; > 2135 unsigned int num; > 2136 struct hw_fib **hw_fib_pool, **hw_fib_p; > 2137 struct fib **fib_pool, **fib_p; > 2138 > 2139 set_current_state(TASK_RUNNING); > 2140 > 2141 entry = dev->queues- > >queue[HostNormCmdQueue].cmdq.next; > 2142 list_del(entry); > 2143 > 2144 t_lock = dev->queues->queue[HostNormCmdQueue].lock; > 2145 spin_unlock_irqrestore(t_lock, flags); > ^ > 2146 > 2147 fib = list_entry(entry, struct fib, fiblink); > 2148 hw_fib = fib->hw_fib_va; > 2149 if (dev->sa_firmware) { > 2150 /* Thor AIF */ > 2151 aac_handle_sa_aif(dev, fib); > 2152 aac_fib_adapter_complete(fib, > (u16)sizeof(u32)); > 2153 continue; > > The locking isn't right here. We should re-take the spinlock before > continuing. The intention here is to protect the retrieval of entry from the queues. Or do you mean that we should just protect the whole while loop with one spin_lock (t_lock)? Thank you, Raghava Aditya > 2154 } > 2155 /* > 2156 * We will process the FIB here or pass it to a > 2157 * worker thread that is TBD. We Really can't > 2158 * do anything at this point since we don't have > 2159 * anything defined for this thread to do. > 2160 */ > > [ snip ] > > 2221 free_mem: > /* Free up the remaining resources */ > 2223 hw_fib_p = hw_fib_pool; > 2224 fib_p = fib_pool; > 2225 while (hw_fib_p < _fib_pool[num]) { > 2226 kfree(*hw_fib_p); > 2227 kfree(*fib_p); > 2228 ++fib_p; > 2229 ++hw_fib_p; > 2230 } > 2231 kfree(fib_pool); > 2232 free_hw_fib_pool: > 2233 kfree(hw_fib_pool); > 2234 free_fib: > 2235 kfree(fib); > 2236 t_lock = dev->queues->queue[HostNormCmdQueue].lock; > 2237 spin_lock_irqsave(t_lock, flags); > 2238 } > 2239 /* > 2240 * There are no more AIF's > 2241 */ > 2242 t_lock = dev->queues->queue[HostNormCmdQueue].lock; > 2243 spin_unlock_irqrestore(t_lock, flags); > > Otherwise it is a double unlock bug. > > 2244 } > > > regards, > dan carpenter
[bug report] scsi: aacraid: Include HBA direct interface
Hello Raghava Aditya Renukunta, The patch 423400e64d37: "scsi: aacraid: Include HBA direct interface" from Feb 2, 2017, leads to the following static checker warning: drivers/scsi/aacraid/commsup.c:762 aac_hba_send() error: double unlock 'sem:>event_wait' drivers/scsi/aacraid/commsup.c 757 if (wait) { 758 spin_unlock_irqrestore(>event_lock, flags); 759 /* Only set for first known interruptable command */ This comment just confuses me... What is a "known interruptable command?" 760 if (down_interruptible(>event_wait)) { down_interruptible() return -EINTR on failure so this test looks reversed. It should probably be: if (!down_interruptible(>event_wait)) { 761 fibptr->done = 2; 762 up(>event_wait); 763 } 764 spin_lock_irqsave(>event_lock, flags); 765 if ((fibptr->done == 0) || (fibptr->done == 2)) { 766 fibptr->done = 2; /* Tell interrupt we aborted */ 767 spin_unlock_irqrestore(>event_lock, flags); 768 return -ERESTARTSYS; 769 } 770 spin_unlock_irqrestore(>event_lock, flags); 771 WARN_ON(fibptr->done == 0); 772 773 if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT)) 774 return -ETIMEDOUT; 775 776 return 0; 777 } regards, dan carpenter
Re: [PATCH 1/1] mpt3sas: Ignore unaligned completion length for ZBC_IN
On Mon, 2017-02-13 at 14:11 +0900, Damien Le Moal wrote: > The ZBC_IN command (REPORT ZONES) reply length is always a multiple of > 64B and thus may not be aligned on the device LBA size. > For this command, retry due to the unaligned completion length is > incorrect so do not check alignment of the reply length. > > Signed-off-by: Damien Le Moal> --- > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index 0b5b423..f04b872 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -4723,8 +4723,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, > u8 msix_index, u32 reply) >* then scsi-ml does not need to handle this misbehavior. >*/ > sector_sz = scmd->device->sector_size; > - if (unlikely(scmd->request->cmd_type == REQ_TYPE_FS && sector_sz && > - xfer_cnt % sector_sz)) { > + if (scmd->request->cmd_type == REQ_TYPE_FS && > + scmd->cmnd[0] != ZBC_IN && > + sector_sz && xfer_cnt % sector_sz) { > sdev_printk(KERN_INFO, scmd->device, > "unaligned partial completion avoided (xfer_cnt=%u, > sector_sz=%u)\n", > xfer_cnt, sector_sz); Hello Damien, What software generated a ZBC_IN request with type REQ_TYPE_FS? Shouldn't all ZBC_IN requests be assigned type REQ_TYPE_BLOCK_PC? Thanks, Bart.
[bug report] scsi: aacraid: Added support for hotplug
Hello Raghava Aditya Renukunta, The patch 6223a39fe6fb: "scsi: aacraid: Added support for hotplug" from Feb 2, 2017, leads to the following static checker warning: drivers/scsi/aacraid/commsup.c:2243 aac_process_events() error: double unlock 'spin_lock:t_lock' drivers/scsi/aacraid/commsup.c 2130 spin_lock_irqsave(t_lock, flags); 2131 2132 while (!list_empty(&(dev->queues->queue[HostNormCmdQueue].cmdq))) { 2133 struct list_head *entry; 2134 struct aac_aifcmd *aifcmd; 2135 unsigned int num; 2136 struct hw_fib **hw_fib_pool, **hw_fib_p; 2137 struct fib **fib_pool, **fib_p; 2138 2139 set_current_state(TASK_RUNNING); 2140 2141 entry = dev->queues->queue[HostNormCmdQueue].cmdq.next; 2142 list_del(entry); 2143 2144 t_lock = dev->queues->queue[HostNormCmdQueue].lock; 2145 spin_unlock_irqrestore(t_lock, flags); ^ 2146 2147 fib = list_entry(entry, struct fib, fiblink); 2148 hw_fib = fib->hw_fib_va; 2149 if (dev->sa_firmware) { 2150 /* Thor AIF */ 2151 aac_handle_sa_aif(dev, fib); 2152 aac_fib_adapter_complete(fib, (u16)sizeof(u32)); 2153 continue; The locking isn't right here. We should re-take the spinlock before continuing. 2154 } 2155 /* 2156 * We will process the FIB here or pass it to a 2157 * worker thread that is TBD. We Really can't 2158 * do anything at this point since we don't have 2159 * anything defined for this thread to do. 2160 */ [ snip ] 2221 free_mem: /* Free up the remaining resources */ 2223 hw_fib_p = hw_fib_pool; 2224 fib_p = fib_pool; 2225 while (hw_fib_p < _fib_pool[num]) { 2226 kfree(*hw_fib_p); 2227 kfree(*fib_p); 2228 ++fib_p; 2229 ++hw_fib_p; 2230 } 2231 kfree(fib_pool); 2232 free_hw_fib_pool: 2233 kfree(hw_fib_pool); 2234 free_fib: 2235 kfree(fib); 2236 t_lock = dev->queues->queue[HostNormCmdQueue].lock; 2237 spin_lock_irqsave(t_lock, flags); 2238 } 2239 /* 2240 * There are no more AIF's 2241 */ 2242 t_lock = dev->queues->queue[HostNormCmdQueue].lock; 2243 spin_unlock_irqrestore(t_lock, flags); Otherwise it is a double unlock bug. 2244 } regards, dan carpenter
[PATCH V4 net-next 1/2] qed: Add support for hardware offloaded FCoE.
From: Arun EasiThis adds the backbone required for the various HW initalizations which are necessary for the FCoE driver (qedf) for QLogic FastLinQ 4 line of adapters - FW notification, resource initializations, etc. Signed-off-by: Arun Easi Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/Kconfig |3 + drivers/net/ethernet/qlogic/qed/Makefile |1 + drivers/net/ethernet/qlogic/qed/qed.h | 11 + drivers/net/ethernet/qlogic/qed/qed_cxt.c | 98 +- drivers/net/ethernet/qlogic/qed/qed_cxt.h |3 + drivers/net/ethernet/qlogic/qed/qed_dcbx.c| 13 +- drivers/net/ethernet/qlogic/qed/qed_dcbx.h|5 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 205 - drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 42 + drivers/net/ethernet/qlogic/qed/qed_fcoe.c| 1014 + drivers/net/ethernet/qlogic/qed/qed_fcoe.h| 76 ++ drivers/net/ethernet/qlogic/qed/qed_hsi.h | 781 +++- drivers/net/ethernet/qlogic/qed/qed_hw.c |3 + drivers/net/ethernet/qlogic/qed/qed_ll2.c | 25 + drivers/net/ethernet/qlogic/qed/qed_ll2.h |2 +- drivers/net/ethernet/qlogic/qed/qed_main.c|7 + drivers/net/ethernet/qlogic/qed/qed_mcp.c |3 + drivers/net/ethernet/qlogic/qed/qed_mcp.h |1 + drivers/net/ethernet/qlogic/qed/qed_reg_addr.h|8 + drivers/net/ethernet/qlogic/qed/qed_sp.h |4 + drivers/net/ethernet/qlogic/qed/qed_sp_commands.c |3 + include/linux/qed/common_hsi.h| 10 +- include/linux/qed/fcoe_common.h | 715 +++ include/linux/qed/qed_fcoe_if.h | 145 +++ include/linux/qed/qed_if.h| 41 +- 25 files changed, 3200 insertions(+), 19 deletions(-) create mode 100644 drivers/net/ethernet/qlogic/qed/qed_fcoe.c create mode 100644 drivers/net/ethernet/qlogic/qed/qed_fcoe.h create mode 100644 include/linux/qed/fcoe_common.h create mode 100644 include/linux/qed/qed_fcoe_if.h diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig index 3cfd105..737b303 100644 --- a/drivers/net/ethernet/qlogic/Kconfig +++ b/drivers/net/ethernet/qlogic/Kconfig @@ -113,4 +113,7 @@ config QED_RDMA config QED_ISCSI bool +config QED_FCOE + bool + endif # NET_VENDOR_QLOGIC diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile index 729e437..e234083 100644 --- a/drivers/net/ethernet/qlogic/qed/Makefile +++ b/drivers/net/ethernet/qlogic/qed/Makefile @@ -7,3 +7,4 @@ qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o qed-$(CONFIG_QED_LL2) += qed_ll2.o qed-$(CONFIG_QED_RDMA) += qed_roce.o qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o qed_ooo.o +qed-$(CONFIG_QED_FCOE) += qed_fcoe.o diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 1f61cf3..08f2885 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -60,6 +60,7 @@ #define QED_WFQ_UNIT 100 #define ISCSI_BDQ_ID(_port_id) (_port_id) +#define FCOE_BDQ_ID(_port_id) ((_port_id) + 2) #define QED_WID_SIZE(1024) #define QED_PF_DEMS_SIZE(4) @@ -167,6 +168,7 @@ struct qed_tunn_update_params { */ enum qed_pci_personality { QED_PCI_ETH, + QED_PCI_FCOE, QED_PCI_ISCSI, QED_PCI_ETH_ROCE, QED_PCI_DEFAULT /* default in shmem */ @@ -204,6 +206,7 @@ enum QED_FEATURE { QED_VF, QED_RDMA_CNQ, QED_VF_L2_QUE, + QED_FCOE_CQ, QED_MAX_FEATURES, }; @@ -221,6 +224,7 @@ enum QED_PORT_MODE { enum qed_dev_cap { QED_DEV_CAP_ETH, + QED_DEV_CAP_FCOE, QED_DEV_CAP_ISCSI, QED_DEV_CAP_ROCE, }; @@ -255,6 +259,10 @@ struct qed_hw_info { u32 part_num[4]; unsigned char hw_mac_addr[ETH_ALEN]; + u64 node_wwn; + u64 port_wwn; + + u16 num_fcoe_conns; struct qed_igu_info *p_igu_info; @@ -410,6 +418,7 @@ struct qed_hwfn { struct qed_ooo_info *p_ooo_info; struct qed_rdma_info*p_rdma_info; struct qed_iscsi_info *p_iscsi_info; + struct qed_fcoe_info*p_fcoe_info; struct qed_pf_paramspf_params; bool b_rdma_enabled_in_prs; @@ -618,11 +627,13 @@ struct qed_dev { u8 protocol; #define IS_QED_ETH_IF(cdev) ((cdev)->protocol == QED_PROTOCOL_ETH) +#define IS_QED_FCOE_IF(cdev)((cdev)->protocol == QED_PROTOCOL_FCOE) /* Callbacks to protocol driver */ union { struct qed_common_cb_ops*common;
[PATCH V4 0/2] Add QLogic FastLinQ FCoE (qedf) driver
From: "Dupuis, Chad"Dave, please apply the qed patch to net-next at your earliest convenience. Martin, the qed patch needs to be applied first as the qedf patch is dependent on the FCoE bits in the first qed driver patch. This series introduces the hardware offload FCoE initiator driver for the 41000 Series Converged Network Adapters (579xx chip) by Cavium. The overall driver design includes a common module ('qed') and protocol specific dependent modules ('qedf' for FCoE). This driver uses the kernel components of libfc and libfcoe as is and does not make use of the open-fcoe user space components. Therefore, no changes will need to be made to any open-fcoe components. The 'qed' common module, under drivers/net/ethernet/qlogic/qed/, is enhanced with functionality required for FCoE support. Changes from V3 -> V4 - Minor update to banner text in qed_fcoe.c|h files - Fix kbuild robot error on 32-bit systems in qedf - Remove unneeded double memcpy for offloaded ELS commands in qedf Changes from V2 -> V3 - Fix uninitialized variables reported by kbuild robot in qedf - Remove superfluous comments from qedf.h - Introduce new qedf_ctx flag to different stopping I/O for debug purposes. - Don't take lport->disc.disc_mutex when restarting an rport. - Remove extra whitespace in qedf_hsi.h Changes from V1 -> V2 Changes in qed: - Fix compiler warning when CONFIG_DCB is not set. Fixes in qedf: - Add qedf to scsi directory Makefile. - Updates to convert LightL2 and I/O processing kthreads to workqueues. Changes from RFC -> V1 - Squash qedf patches to one patch now that the initial review has taken place - Convert qedf to use hotplug state machine - Return via va_end to match corresponding va_start in logging functions - Convert qedf_ctx offloaded port list to a RCU list so searches do not need to make use of spinlocks. Also eliminates the need to fcport conn_id's. - Use IS_ERR(fp) in qedf_flogi_resp() instead of checking individual FC_EX_* errors. - Remove scsi_block_target when executing TMF request. - Checkpatch fixes in the qed and qedf patches Arun Easi (1): qed: Add support for hardware offloaded FCoE. Dupuis, Chad (1): qedf: Add QLogic FastLinQ offload FCoE driver framework. MAINTAINERS |6 + drivers/net/ethernet/qlogic/Kconfig |3 + drivers/net/ethernet/qlogic/qed/Makefile |1 + drivers/net/ethernet/qlogic/qed/qed.h | 11 + drivers/net/ethernet/qlogic/qed/qed_cxt.c | 98 +- drivers/net/ethernet/qlogic/qed/qed_cxt.h |3 + drivers/net/ethernet/qlogic/qed/qed_dcbx.c| 13 +- drivers/net/ethernet/qlogic/qed/qed_dcbx.h|5 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 205 +- drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 42 + drivers/net/ethernet/qlogic/qed/qed_fcoe.c| 1014 +++ drivers/net/ethernet/qlogic/qed/qed_fcoe.h| 76 + drivers/net/ethernet/qlogic/qed/qed_hsi.h | 781 - drivers/net/ethernet/qlogic/qed/qed_hw.c |3 + drivers/net/ethernet/qlogic/qed/qed_ll2.c | 25 + drivers/net/ethernet/qlogic/qed/qed_ll2.h |2 +- drivers/net/ethernet/qlogic/qed/qed_main.c|7 + drivers/net/ethernet/qlogic/qed/qed_mcp.c |3 + drivers/net/ethernet/qlogic/qed/qed_mcp.h |1 + drivers/net/ethernet/qlogic/qed/qed_reg_addr.h|8 + drivers/net/ethernet/qlogic/qed/qed_sp.h |4 + drivers/net/ethernet/qlogic/qed/qed_sp_commands.c |3 + drivers/scsi/Kconfig |1 + drivers/scsi/Makefile |1 + drivers/scsi/qedf/Kconfig | 11 + drivers/scsi/qedf/Makefile|5 + drivers/scsi/qedf/qedf.h | 545 drivers/scsi/qedf/qedf_attr.c | 165 + drivers/scsi/qedf/qedf_dbg.c | 195 ++ drivers/scsi/qedf/qedf_dbg.h | 154 + drivers/scsi/qedf/qedf_debugfs.c | 460 +++ drivers/scsi/qedf/qedf_els.c | 949 ++ drivers/scsi/qedf/qedf_fip.c | 269 ++ drivers/scsi/qedf/qedf_hsi.h | 422 +++ drivers/scsi/qedf/qedf_io.c | 2282 ++ drivers/scsi/qedf/qedf_main.c | 3336 + drivers/scsi/qedf/qedf_version.h | 15 + include/linux/qed/common_hsi.h| 10 +- include/linux/qed/fcoe_common.h | 715 + include/linux/qed/qed_fcoe_if.h | 145 + include/linux/qed/qed_if.h| 41 +- 41 files changed, 12016 insertions(+), 19 deletions(-) create mode 100644 drivers/net/ethernet/qlogic/qed/qed_fcoe.c create mode 100644 drivers/net/ethernet/qlogic/qed/qed_fcoe.h create mode 100644
Re: [PATCH v3 00/39] megaraid_sas: Updates for scsi-next
> "Shivasharan" == Shivasharan S> writes: Shivasharan> Changes in v3: Patch 21: Fix to pass rctx_g35 pointer to Shivasharan> set/get_num_sge() Move all v2 changelog descriptions beyond Shivasharan> actual commit message body Applied to 4.11/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/13/2017 07:15 AM, Sreekanth Reddy wrote: > On Fri, Feb 10, 2017 at 12:29 PM, Hannes Reineckewrote: >> On 02/10/2017 05:43 AM, Sreekanth Reddy wrote: >>> On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reinecke wrote: On 02/09/2017 02:03 PM, Sreekanth Reddy wrote: >> [ .. ] > > > Hannes, > > I have created a md raid0 with 4 SAS SSD drives using below command, > #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh > /dev/sdi /dev/sdj > > And here is 'mdadm --detail /dev/md0' command output, > -- > /dev/md0: > Version : 1.2 > Creation Time : Thu Feb 9 14:38:47 2017 > Raid Level : raid0 > Array Size : 780918784 (744.74 GiB 799.66 GB) >Raid Devices : 4 > Total Devices : 4 > Persistence : Superblock is persistent > > Update Time : Thu Feb 9 14:38:47 2017 > State : clean > Active Devices : 4 > Working Devices : 4 > Failed Devices : 0 > Spare Devices : 0 > > Chunk Size : 512K > >Name : host_name >UUID : b63f9da7:b7de9a25:6a46ca00:42214e22 > Events : 0 > > Number Major Minor RaidDevice State >0 8 960 active sync /dev/sdg >1 8 1121 active sync /dev/sdh >2 8 1442 active sync /dev/sdj >3 8 1283 active sync /dev/sdi > -- > > Then I have used below fio profile to run 4K sequence read operations > with nr_hw_queues=1 driver and with nr_hw_queues=24 driver (as my > system has two numa node and each with 12 cpus). > - > global] > ioengine=libaio > group_reporting > direct=1 > rw=read > bs=4k > allow_mounted_write=0 > iodepth=128 > runtime=150s > > [job1] > filename=/dev/md0 > - > > Here are the fio results when nr_hw_queues=1 (i.e. single request > queue) with various number of job counts > 1JOB 4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec > 2JOBs 4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec > 4JOBs 4k read : io=281001MB, bw=1873.4MB/s, iops=479569, runt=150002msec > 8JOBs 4k read : io=236297MB, bw=1575.2MB/s, iops=403236, runt=150016msec > > Here are the fio results when nr_hw_queues=24 (i.e. multiple request > queue) with various number of job counts > 1JOB 4k read : io=95194MB, bw=649852KB/s, iops=162463, runt=150001msec > 2JOBs 4k read : io=189343MB, bw=1262.3MB/s, iops=323142, runt=150001msec > 4JOBs 4k read : io=314832MB, bw=2098.9MB/s, iops=537309, runt=150001msec > 8JOBs 4k read : io=277015MB, bw=1846.8MB/s, iops=472769, runt=150001msec > > Here we can see that on less number of jobs count, single request > queue (nr_hw_queues=1) is giving more IOPs than multi request > queues(nr_hw_queues=24). > > Can you please share your fio profile, so that I can try same thing on > my system. > Have you tried with the latest git update from Jens for-4.11/block (or for-4.11/next) branch? >>> >>> I am using below git repo, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.11/scsi-queue >>> >>> Today I will try with Jens for-4.11/block. >>> >> By all means, do. >> I've found that using the mq-deadline scheduler has a noticeable performance boost. The fio job I'm using is essentially the same; you just should make sure to specify a 'numjob=' statement in there. Otherwise fio will just use a single CPU, which of course leads to averse effects in the multiqueue case. >>> >>> Yes I am providing 'numjob=' on fio command line as shown below, >>> >>> # fio md_fio_profile --numjobs=8 --output=fio_results.txt >>> >> Still, it looks as if you'd be using less jobs than you have CPUs. >> Which means you'll be running into a tag starvation scenario on those >> CPUs, especially for the small blocksizes. >> What are the results if you set 'numjobs' to the number of CPUs? >> > > Hannes, > > Tried on Jens for-4.11/block kernel repo and also set each block PD's > scheduler as 'mq-deadline', and here is my results for 4K SR on md0 > (raid0 with 4 drives). I have 24 CPUs and so tried even with setting > numjobs=24. > > fio results when nr_hw_queues=1 (i.e. single request queue) with > various number of job counts > > 4k read when numjobs=1 : io=215553MB, bw=1437.9MB/s, iops=367874, > runt=150001msec >
Re: [PATCH] return valid data buffer length in scsi_bufflen() API using RQF_SPECIAL_PAYLOAD
Hi Kashyap, can you try 4.10-rc8? It has "scsi: use blk_rq_payload_bytes" which initializeѕ sdb->length to blk_rq_payload_bytes(), which should fix your issue.