[PATCH v2] scsi: ufshcd: fix possible unclocked register access

2016-10-05 Thread Subhash Jadavani
The vendor-specific setup_clocks callback may require the clocks managed
by the ufshcd driver to be ON. If the callback is called while those
clocks are turned off, it could result in an unclocked register access.

To prevent this, make sure that the required clocks are enabled whenever
the vendor-specific setup_clocks callback is called: call it after
enabling the clocks and before disabling them.
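
As an illustration only (not part of the patch), the ordering rule can be modelled in user space. Every name below (`struct fake_hba`, `fake_vops_setup_clocks()`, `fake_setup_clocks()`, `fake_setup_clocks_old()`) is hypothetical; the vendor hook simply counts how often it is called while the core clocks are off.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the host controller state. */
struct fake_hba {
    bool core_clocks_on;     /* clocks managed by the core driver */
    int unclocked_accesses;  /* hook calls made while the clocks were off */
};

/* Hypothetical vendor hook: it touches registers, so it needs the clocks. */
static int fake_vops_setup_clocks(struct fake_hba *hba, bool on)
{
    (void)on;
    if (!hba->core_clocks_on)
        hba->unclocked_accesses++;
    return 0;
}

/* Old ordering: the hook always runs after the clocks have been changed. */
static int fake_setup_clocks_old(struct fake_hba *hba, bool on)
{
    hba->core_clocks_on = on;
    return fake_vops_setup_clocks(hba, on);
}

/* Patched ordering: hook before disabling, hook after enabling. */
static int fake_setup_clocks(struct fake_hba *hba, bool on)
{
    int ret;

    if (!on) {
        ret = fake_vops_setup_clocks(hba, on);
        if (ret)
            return ret;
    }

    hba->core_clocks_on = on;

    if (on) {
        ret = fake_vops_setup_clocks(hba, on);
        if (ret)
            return ret;
    }
    return 0;
}
```

With the old ordering, an on/off cycle leaves `unclocked_accesses` at 1 (the hook runs after the clocks are gated); with the patched ordering it stays at 0.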

Signed-off-by: Subhash Jadavani 
---
Changes from v1:
* Don't call ufshcd_vops_setup_clocks() again for clock off
---
 drivers/scsi/ufs/ufshcd.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 05c7456..c1a77d3 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5389,6 +5389,17 @@ static int __ufshcd_setup_clocks(struct ufs_hba *hba, bool on,
if (!head || list_empty(head))
goto out;
 
+   /*
+* vendor specific setup_clocks ops may depend on clocks managed by
+* this standard driver hence call the vendor specific setup_clocks
+* before disabling the clocks managed here.
+*/
+   if (!on) {
+   ret = ufshcd_vops_setup_clocks(hba, on);
+   if (ret)
+   return ret;
+   }
+
list_for_each_entry(clki, head, list) {
if (!IS_ERR_OR_NULL(clki->clk)) {
if (skip_ref_clk && !strcmp(clki->name, "ref_clk"))
@@ -5410,7 +5421,16 @@ static int __ufshcd_setup_clocks(struct ufs_hba *hba, bool on,
}
}
 
-   ret = ufshcd_vops_setup_clocks(hba, on);
+   /*
+* vendor specific setup_clocks ops may depend on clocks managed by
+* this standard driver hence call the vendor specific setup_clocks
+* after enabling the clocks managed here.
+*/
+   if (on) {
+   ret = ufshcd_vops_setup_clocks(hba, on);
+   if (ret)
+   return ret;
+   }
 out:
if (ret) {
list_for_each_entry(clki, head, list) {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 1/7] block: Add 'zoned' queue limit

2016-10-05 Thread Damien Le Moal
Add the zoned queue limit to indicate the zoning model of a block device.
Defined values are 0 (BLK_ZONED_NONE) for regular block devices,
1 (BLK_ZONED_HA) for host-aware zoned block devices and 2 (BLK_ZONED_HM)
for host-managed zoned block devices. The drive-managed model defined by
the standards is not represented here since these block devices do not
provide any command for accessing zone information. Drive-managed devices
will be reported as BLK_ZONED_NONE.

The helper functions blk_queue_zoned_model and bdev_zoned_model return
the zoned limit and the functions blk_queue_is_zoned and bdev_is_zoned
return a boolean for callers to test if a block device is zoned.

The zoned attribute is also exported as a string to applications via
sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as "host-aware" and
BLK_ZONED_HM as "host-managed".
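
For illustration, an application could map the sysfs string back to the model values listed above. This is a user-space sketch only: the enum is a local copy of the values named in this description, and `attr_equals()`, `parse_zoned_attr()` and `model_is_zoned()` are hypothetical helpers, not kernel or library functions.

```c
#include <string.h>

/* Local copy of the values named in the patch description. */
enum blk_zoned_model {
    BLK_ZONED_NONE = 0, /* regular (or drive-managed) block device */
    BLK_ZONED_HA = 1,   /* host-aware zoned block device */
    BLK_ZONED_HM = 2,   /* host-managed zoned block device */
};

/* Compare the attribute text, ignoring a trailing newline, with a name. */
static int attr_equals(const char *text, const char *name)
{
    size_t len = strcspn(text, "\n");

    return len == strlen(name) && strncmp(text, name, len) == 0;
}

/* Map the text read from queue/zoned to a model; unknown text maps to NONE. */
static enum blk_zoned_model parse_zoned_attr(const char *text)
{
    if (attr_equals(text, "host-aware"))
        return BLK_ZONED_HA;
    if (attr_equals(text, "host-managed"))
        return BLK_ZONED_HM;
    return BLK_ZONED_NONE;
}

/* Boolean test in the spirit of blk_queue_is_zoned/bdev_is_zoned. */
static int model_is_zoned(enum blk_zoned_model model)
{
    return model == BLK_ZONED_HA || model == BLK_ZONED_HM;
}
```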

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 
---
 Documentation/ABI/testing/sysfs-block | 16 
 block/blk-settings.c  |  1 +
 block/blk-sysfs.c | 18 ++
 include/linux/blkdev.h| 47 +++
 4 files changed, 82 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 71d184d..75a5055 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -235,3 +235,19 @@ Description:
write_same_max_bytes is 0, write same is not supported
by the device.
 
+What:  /sys/block/<disk>/queue/zoned
+Date:  September 2016
+Contact:   Damien Le Moal 
+Description:
+   zoned indicates if the device is a zoned block device
+   and the zone model of the device if it is indeed zoned.
+   The possible values indicated by zoned are "none" for
+   regular block devices and "host-aware" or "host-managed"
+   for zoned block devices. The characteristics of
+   host-aware and host-managed zoned block devices are
+   described in the ZBC (Zoned Block Commands) and ZAC
+   (Zoned Device ATA Command Set) standards. These standards
+   also define the "drive-managed" zone model. However,
+   since drive-managed zoned block devices do not support
+   zone commands, they will be treated as regular block
+   devices and zoned will report "none".
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f679ae1..b1d5b7f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -107,6 +107,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->io_opt = 0;
lim->misaligned = 0;
lim->cluster = 1;
+   lim->zoned = BLK_ZONED_NONE;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..ff9cd9c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -257,6 +257,18 @@ QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
+static ssize_t queue_zoned_show(struct request_queue *q, char *page)
+{
+   switch (blk_queue_zoned_model(q)) {
+   case BLK_ZONED_HA:
+   return sprintf(page, "host-aware\n");
+   case BLK_ZONED_HM:
+   return sprintf(page, "host-managed\n");
+   default:
+   return sprintf(page, "none\n");
+   }
+}
+
 static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
 {
return queue_var_show((blk_queue_nomerges(q) << 1) |
@@ -485,6 +497,11 @@ static struct queue_sysfs_entry queue_nonrot_entry = {
.store = queue_store_nonrot,
 };
 
+static struct queue_sysfs_entry queue_zoned_entry = {
+   .attr = {.name = "zoned", .mode = S_IRUGO },
+   .show = queue_zoned_show,
+};
+
 static struct queue_sysfs_entry queue_nomerges_entry = {
.attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
.show = queue_nomerges_show,
@@ -546,6 +563,7 @@ static struct attribute *default_attrs[] = {
&queue_discard_zeroes_data_entry.attr,
&queue_write_same_max_entry.attr,
&queue_nonrot_entry.attr,
+   &queue_zoned_entry.attr,
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..f19e16b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -261,6 +261,15 @@ struct blk_queue_tag {
 #define BLK_SCSI_MAX_CMDS  (256)
 #define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
+/*
+ * Zoned block device models (zoned limit).
+ */
+enum blk_zoned_model {
+   BLK_ZONED_NONE, /* Regular block device 

[PATCH v7 5/7] block: Implement support for zoned block devices

2016-10-05 Thread Damien Le Moal
From: Hannes Reinecke 

Implement zoned block device zone information reporting and reset.
Zone information is reported as struct blk_zone. This implementation
does not differentiate between host-aware and host-managed device
models and is valid for both. Two functions are provided:
blkdev_report_zones for discovering the zone configuration of a
zoned block device, and blkdev_reset_zones for resetting the write
pointer of sequential zones. The helper functions blk_queue_zone_size
and bdev_zone_size are also provided for, as their names suggest,
obtaining the zone size (in 512B sectors) of the zones of the device.
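
As a rough user-space sketch of two calculations used by this patch: finding the start of the zone containing a sector (which assumes a power-of-two zone size, as the mask arithmetic in blk_zone_start requires), and remapping a reported zone to partition-relative sectors. The type and function names here (`demo_sector_t`, `struct demo_zone`, `demo_zone_start()`, `demo_zone_to_partition()`) are hypothetical, and unlike the kernel helper this version leaves the zone untouched when it rejects it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_sector_t; /* 512B sector units */

/* Hypothetical, reduced zone descriptor. */
struct demo_zone {
    demo_sector_t start; /* first sector of the zone */
    demo_sector_t len;   /* zone length in sectors */
    demo_sector_t wp;    /* write pointer position */
    bool conventional;   /* true for zones without a write pointer */
};

/* Start of the zone containing 'sector'; zone_size must be a power of two. */
static demo_sector_t demo_zone_start(demo_sector_t zone_size,
                                     demo_sector_t sector)
{
    return sector & ~(zone_size - 1);
}

/*
 * Convert a zone reported in whole-device sectors to partition-relative
 * sectors. Returns false if the zone is not fully inside the partition.
 */
static bool demo_zone_to_partition(struct demo_zone *z,
                                   demo_sector_t part_start,
                                   demo_sector_t part_len)
{
    if (z->start < part_start)
        return false;
    if (z->start - part_start + z->len > part_len)
        return false;

    z->start -= part_start;
    if (z->conventional)
        z->wp = z->start + z->len; /* no write pointer: report zone end */
    else
        z->wp -= part_start;
    return true;
}
```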

Signed-off-by: Hannes Reinecke 

[Damien: * Removed the zone cache
 * Implement report zones operation based on earlier proposal
   by Shaun Tancheff ]
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 
---
 block/Kconfig |   8 ++
 block/Makefile|   2 +-
 block/blk-zoned.c | 257 ++
 include/linux/blkdev.h|  31 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned.h | 103 +
 6 files changed, 401 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 include/uapi/linux/blkzoned.h

diff --git a/block/Kconfig b/block/Kconfig
index 5136ad4..7bb9bf8 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
T10/SCSI Data Integrity Field or the T13/ATA External Path
Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+   bool "Zoned block device support"
+   ---help---
+   Block layer zoned block device support. This option enables
+   support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+   Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
bool "Block layer bio throttling support"
depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 9eda232..4676969 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,4 @@ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)   += cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
-
+obj-$(CONFIG_BLK_DEV_ZONED)+= blk-zoned.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 0000000..1603573
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,257 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+ sector_t sector)
+{
+   sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+   return sector & ~zone_mask;
+}
+
+/*
+ * Check that a zone report belongs to the partition.
+ * If yes, fix its start sector and write pointer, copy it in the
+ * zone information array and return true. Return false otherwise.
+ */
+static bool blkdev_report_zone(struct block_device *bdev,
+  struct blk_zone *rep,
+  struct blk_zone *zone)
+{
+   sector_t offset = get_start_sect(bdev);
+
+   if (rep->start < offset)
+   return false;
+
+   rep->start -= offset;
+   if (rep->start + rep->len > bdev->bd_part->nr_sects)
+   return false;
+
+   if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
+   rep->wp = rep->start + rep->len;
+   else
+   rep->wp -= offset;
+   memcpy(zone, rep, sizeof(struct blk_zone));
+
+   return true;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:  Target block device
+ * @sector:Sector from which to report zones
+ * @zones: Array of zone structures where to return the zones information
+ * @nr_zones:  Number of zone structures in the zone array
+ * @gfp_mask:  Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Get zone information starting from the zone containing @sector.
+ *The number of zone information reported may be less than the number
+ *requested by @nr_zones. The number of zones actually reported is
+ *returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+   sector_t sector,
+   struct blk_zone *zones,
+   unsigned int *nr_zones,
+   gfp_t gfp_mask)
+{
+   struct request_queue *q = 

[PATCH v7 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes

2016-10-05 Thread Damien Le Moal
From: Hannes Reinecke 

The queue limits already have a 'chunk_sectors' setting, so
we should be presenting it via sysfs.
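
For illustration, a user-space consumer might convert the attribute text to a size in bytes. This is a sketch under the assumption, stated in the sysfs-block documentation added by this patch, that the value is expressed in 512B sectors; `chunk_bytes_from_attr()` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert the text read from queue/chunk_sectors into bytes.
 * Returns 0 when the text holds no number or the limit is not set.
 */
static uint64_t chunk_bytes_from_attr(const char *text)
{
    char *end;
    unsigned long long sectors = strtoull(text, &end, 10);

    if (end == text)
        return 0;
    return (uint64_t)sectors << 9; /* 512B sectors to bytes */
}
```

For example, a value of 524288 corresponds to 256 MiB chunks (or zones).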

Signed-off-by: Hannes Reinecke 

[Damien: Updated Documentation/ABI/testing/sysfs-block]

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 
---
 Documentation/ABI/testing/sysfs-block | 13 +
 block/blk-sysfs.c | 11 +++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 75a5055..ee2d5cd 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -251,3 +251,16 @@ Description:
since drive-managed zoned block devices do not support
zone commands, they will be treated as regular block
devices and zoned will report "none".
+
+What:  /sys/block/<disk>/queue/chunk_sectors
+Date:  September 2016
+Contact:   Hannes Reinecke 
+Description:
+   chunk_sectors has a different meaning depending on the type
+   of the disk. For a RAID device (dm-raid), chunk_sectors
+   indicates the size in 512B sectors of the RAID volume
+   stripe segment. For a zoned block device, either
+   host-aware or host-managed, chunk_sectors indicates the
+   size in 512B sectors of the zones of the device, with
+   the possible exception of the last zone of the device
+   which may be smaller.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff9cd9c..488c2e2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct request_queue *q, char *pag
return queue_var_show(queue_physical_block_size(q), page);
 }
 
+static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
+{
+   return queue_var_show(q->limits.chunk_sectors, page);
+}
+
 static ssize_t queue_io_min_show(struct request_queue *q, char *page)
 {
return queue_var_show(queue_io_min(q), page);
@@ -455,6 +460,11 @@ static struct queue_sysfs_entry queue_physical_block_size_entry = {
.show = queue_physical_block_size_show,
 };
 
+static struct queue_sysfs_entry queue_chunk_sectors_entry = {
+   .attr = {.name = "chunk_sectors", .mode = S_IRUGO },
+   .show = queue_chunk_sectors_show,
+};
+
 static struct queue_sysfs_entry queue_io_min_entry = {
.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
.show = queue_io_min_show,
@@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
&queue_hw_sector_size_entry.attr,
&queue_logical_block_size_entry.attr,
&queue_physical_block_size_entry.attr,
+   &queue_chunk_sectors_entry.attr,
&queue_io_min_entry.attr,
&queue_io_opt_entry.attr,
&queue_discard_granularity_entry.attr,
-- 
2.7.4



[PATCH v7 3/7] block: update chunk_sectors in blk_stack_limits()

2016-10-05 Thread Damien Le Moal
From: Hannes Reinecke 

Signed-off-by: Hannes Reinecke 
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 
---
 block/blk-settings.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b1d5b7f..55369a6 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->discard_granularity;
}
 
+   if (b->chunk_sectors)
+   t->chunk_sectors = min_not_zero(t->chunk_sectors,
+   b->chunk_sectors);
+
return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
-- 
2.7.4
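
The patch carries no description, so as one reading of the hunk above: when a bottom device defines chunk_sectors, the stacked (top) limit becomes the smaller non-zero value of the two. The user-space sketch below mirrors that rule; `demo_min_not_zero()` and `demo_stack_chunk_sectors()` are hypothetical names written for this note, not the kernel macro or function.

```c
/* Smaller of the two values, treating 0 as "not set". */
static unsigned int demo_min_not_zero(unsigned int a, unsigned int b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/* Stack the bottom device's chunk_sectors onto the top device's limit. */
static unsigned int demo_stack_chunk_sectors(unsigned int top,
                                             unsigned int bottom)
{
    if (bottom)
        top = demo_min_not_zero(top, bottom);
    return top;
}
```

A bottom device without chunk_sectors leaves the top limit unchanged, and a top device without one inherits the bottom value.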



[PATCH v7 6/7] sd: Implement support for ZBC devices

2016-10-05 Thread Damien Le Moal
From: Hannes Reinecke 

Implement ZBC support functions to set up zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the possible exception of a
   smaller last runt zone.
2) For host-managed disks, reads are unrestricted (reads are not
   failed due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are set up with
a capacity of 0 to prevent their use.

The function sd_zbc_read_zones, called from sd_revalidate_disk,
checks that the device satisfies the above two constraints. This
function may also change the disk capacity previously set by
sd_read_capacity for devices reporting only the capacity of
conventional zones at the beginning of the LBA range (i.e. devices
reporting rc_basis set to 0).

The capacity message output was moved out of sd_read_capacity into
a new function sd_print_capacity so that it reflects any capacity
change made by sd_zbc_read_zones. This new function also calls
sd_zbc_print_zones to display the number of zones and the zone size
of the device.
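
The first condition can be sketched as a simple scan over the reported zone lengths. This is an illustration in user space, not the code in sd_zbc.c: `demo_common_zone_size()` is a hypothetical helper that returns the common zone size, or 0 for a layout the driver would refuse.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the zone size shared by all zones, allowing only the last zone
 * to be smaller (a runt zone). Return 0 if the layout does not qualify.
 */
static uint64_t demo_common_zone_size(const uint64_t *zone_len,
                                      size_t nr_zones)
{
    uint64_t size;
    size_t i;

    if (nr_zones == 0 || zone_len[0] == 0)
        return 0;

    size = zone_len[0];
    for (i = 1; i < nr_zones; i++) {
        if (zone_len[i] == size)
            continue;
        /* Only the last zone may differ, and only by being smaller. */
        if (i == nr_zones - 1 && zone_len[i] > 0 && zone_len[i] < size)
            continue;
        return 0;
    }
    return size;
}
```

A caller following the description above would treat a return value of 0 as "unsupported" and set the disk capacity to 0.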

Signed-off-by: Hannes Reinecke 

[Damien: * Removed zone cache support
 * Removed mapping of discard to reset write pointer command
 * Modified sd_zbc_read_zones to include checks that the
   device satisfies the kernel constraints
 * Implemented REPORT ZONES setup and post-processing based
   on code from Shaun Tancheff 
 * Removed confusing use of 512B sector units in functions
   interface]
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 
---
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c | 148 ---
 drivers/scsi/sd.h |  70 +
 drivers/scsi/sd_zbc.c | 638 ++
 include/scsi/scsi_proto.h |  17 ++
 5 files changed, 839 insertions(+), 35 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index d539798..fabcb6d 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -179,6 +179,7 @@ hv_storvsc-y:= storvsc_drv.o
 
 sd_mod-objs:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
 
 sr_mod-objs:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..e53d958 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -92,6 +92,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS  16
@@ -162,7 +163,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
static const char temp[] = "temporary ";
int len;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
/* no cache control on RBC devices; theoretically they
 * can do it, but there's probably so many exceptions
 * it's not worth the risk */
@@ -261,7 +262,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
return -EINVAL;
 
sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -391,6 +392,11 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
+   if (sd_is_zoned(sdkp)) {
+   sd_config_discard(sdkp, SD_LBP_DISABLE);
+   return count;
+   }
+
if (sdp->type != TYPE_DISK)
return -EINVAL;
 
@@ -458,7 +464,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
return -EINVAL;
 
err = kstrtoul(buf, 10, &max);
@@ -843,6 +849,12 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+   if (sd_is_zoned(sdkp)) {
+   ret = sd_zbc_setup_write_cmnd(cmd);
+   if (ret != BLKPREP_OK)
+   return ret;
+   }
+
sector >>= ilog2(sdp->sector_size) - 9;
nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
@@ -900,19 

[PATCH v7 7/7] blk-zoned: implement ioctls

2016-10-05 Thread Damien Le Moal
From: Shaun Tancheff 

Add the new BLKREPORTZONE and BLKRESETZONE ioctls for, respectively,
obtaining the zone configuration of a zoned block device and resetting
the write pointer of sequential zones of a zoned block device.

The BLKREPORTZONE ioctl maps directly to a single call of the function
blkdev_report_zones. The zone information result is passed as an array
of struct blk_zone identical to the structure used internally for
processing the REQ_OP_ZONE_REPORT operation.  The BLKRESETZONE ioctl
maps to a single call of the blkdev_reset_zones function.
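
From the caller's side, the report ioctl works on one buffer: a report header followed directly by the array of zone descriptors, which is how the handler in this patch copies results back (header first, then the zones at the header's end). The sketch below shows only that buffer arithmetic. `struct demo_zone` and `struct demo_zone_report` are hypothetical stand-ins with assumed fields (only `sector` and `nr_zones` are taken from the code in this patch); real users should include the uapi header linux/blkzoned.h added by this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct blk_zone (fields assumed). */
struct demo_zone {
    uint64_t start;
    uint64_t len;
    uint64_t wp;
};

/* Hypothetical stand-in for struct blk_zone_report. */
struct demo_zone_report {
    uint64_t sector;   /* in: first sector to report from */
    uint32_t nr_zones; /* in: array capacity, out: zones reported */
    uint32_t reserved;
};

/* Bytes needed for the header plus nr_zones zone descriptors. */
static size_t demo_report_buf_size(uint32_t nr_zones)
{
    return sizeof(struct demo_zone_report) +
           (size_t)nr_zones * sizeof(struct demo_zone);
}

/* Allocate a zeroed report buffer and fill in the request fields. */
static struct demo_zone_report *demo_report_alloc(uint64_t sector,
                                                  uint32_t nr_zones)
{
    struct demo_zone_report *rep = calloc(1, demo_report_buf_size(nr_zones));

    if (!rep)
        return NULL;
    rep->sector = sector;
    rep->nr_zones = nr_zones;
    return rep;
}

/* The zone array starts right after the header. */
static struct demo_zone *demo_report_zones(struct demo_zone_report *rep)
{
    return (struct demo_zone *)((char *)rep + sizeof(*rep));
}
```

After a successful call, a caller would read back `nr_zones` to learn how many descriptors were actually filled in, since fewer zones than requested may be reported.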

Signed-off-by: Shaun Tancheff 
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Hannes Reinecke 

---
 block/blk-zoned.c | 93 +++
 block/ioctl.c |  4 ++
 include/linux/blkdev.h| 21 ++
 include/uapi/linux/blkzoned.h | 40 +++
 include/uapi/linux/fs.h   |  4 ++
 5 files changed, 162 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 1603573..667f95d 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -255,3 +255,96 @@ int blkdev_reset_zones(struct block_device *bdev,
return 0;
 }
 EXPORT_SYMBOL_GPL(blkdev_reset_zones);
+
+/**
+ * BLKREPORTZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_report rep;
+   struct blk_zone *zones;
+   int ret;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (copy_from_user(&rep, argp, sizeof(struct blk_zone_report)))
+   return -EFAULT;
+
+   if (!rep.nr_zones)
+   return -EINVAL;
+
+   zones = kcalloc(rep.nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+   if (!zones)
+   return -ENOMEM;
+
+   ret = blkdev_report_zones(bdev, rep.sector,
+ zones, &rep.nr_zones,
+ GFP_KERNEL);
+   if (ret)
+   goto out;
+
+   if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   if (rep.nr_zones) {
+   if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
+sizeof(struct blk_zone) * rep.nr_zones))
+   ret = -EFAULT;
+   }
+
+ out:
+   kfree(zones);
+
+   return ret;
+}
+
+/**
+ * BLKRESETZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
+unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_range zrange;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
+   return -EFAULT;
+
+   return blkdev_reset_zones(bdev, zrange.sector, zrange.nr_sectors,
+ GFP_KERNEL);
+}
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..448f78a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -513,6 +513,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
BLKDEV_DISCARD_SECURE);
case BLKZEROOUT:
return blk_ioctl_zeroout(bdev, mode, arg);
+   case BLKREPORTZONE:
+   return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
+   case BLKRESETZONE:
+   return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
case HDIO_GETGEO:
return blkdev_getgeo(bdev, argp);
case BLKRAGET:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 252043f..90097dd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,27 @@ extern int blkdev_report_zones(struct block_device *bdev,
 extern int blkdev_reset_zones(struct block_device *bdev, sector_t sectors,
  sector_t nr_sectors, gfp_t gfp_mask);
 
+extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+

[PATCH v7 4/7] block: Define zoned block device operations

2016-10-05 Thread Damien Le Moal
From: Shaun Tancheff 

Define REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET for handling zones of
host-managed and host-aware zoned block devices. With these two
new operations, the total number of operations defined reaches 8 and
still fits within the 3-bit definition of REQ_OP_BITS.
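
The bit budget can be checked at compile time. The sketch below uses a local enum with DEMO_ prefixed names; the last five entries follow the hunk in this patch, while the first three (read, write, discard) are an assumption about the entries that precede the context shown.

```c
/* Local illustration of the operation list; not the kernel enum. */
enum demo_req_op {
    DEMO_OP_READ,         /* assumed */
    DEMO_OP_WRITE,        /* assumed */
    DEMO_OP_DISCARD,      /* assumed */
    DEMO_OP_SECURE_ERASE, /* request to securely erase sectors */
    DEMO_OP_WRITE_SAME,   /* write same block many times */
    DEMO_OP_FLUSH,        /* request for cache flush */
    DEMO_OP_ZONE_REPORT,  /* get zone information */
    DEMO_OP_ZONE_RESET,   /* reset a zone write pointer */
    DEMO_OP_COUNT         /* number of operations, not an operation */
};

#define DEMO_REQ_OP_BITS 3

/* Eight operations need exactly three bits; a ninth would not fit. */
_Static_assert(DEMO_OP_COUNT <= (1 << DEMO_REQ_OP_BITS),
               "operation codes must fit in DEMO_REQ_OP_BITS");

/* True if 'op' can be encoded in DEMO_REQ_OP_BITS bits. */
static int demo_op_fits(int op)
{
    return op >= 0 && op < (1 << DEMO_REQ_OP_BITS);
}
```

With this layout the field is full: adding one more operation would require widening the bit field.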

Signed-off-by: Shaun Tancheff 
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Hannes Reinecke 

---
 block/blk-core.c  | 4 
 include/linux/blk_types.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..e4eda5d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1941,6 +1941,10 @@ generic_make_request_checks(struct bio *bio)
case REQ_OP_WRITE_SAME:
if (!bdev_write_same(bio->bi_bdev))
goto not_supported;
+   case REQ_OP_ZONE_REPORT:
+   case REQ_OP_ZONE_RESET:
+   if (!bdev_is_zoned(bio->bi_bdev))
+   goto not_supported;
break;
default:
break;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ec..dd50dce 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -243,6 +243,8 @@ enum req_op {
REQ_OP_SECURE_ERASE,/* request to securely erase sectors */
REQ_OP_WRITE_SAME,  /* write same block many times */
REQ_OP_FLUSH,   /* request for cache flush */
+   REQ_OP_ZONE_REPORT, /* Get zone information */
+   REQ_OP_ZONE_RESET,  /* Reset a zone write pointer */
 };
 
 #define REQ_OP_BITS 3
-- 
2.7.4



[PATCH v7 0/7] ZBC / Zoned block device support

2016-10-05 Thread Damien Le Moal
This series introduces support for zoned block devices. It integrates
earlier submissions by Hannes Reinecke and Shaun Tancheff. Compared to the
previous series version, the code was significantly simplified by limiting
support to zoned devices satisfying the following conditions:
1) All zones of the device are the same size, with the possible exception
   of a smaller last runt zone.
2) For host-managed disks, reads must be unrestricted (read commands do not
   fail due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are ignored.

These 2 conditions allowed dropping the zone information cache implemented
in the previous version. This simplifies the code and also reduces the memory
consumption at run time. Support for zoned devices now only requires one bit
per zone (less than 8KB in total). This bit field is used to write-lock
zones and prevent the concurrent execution of multiple write commands in
the same zone. This avoids write ordering problems at dispatch time, for
both the simple queue and scsi-mq settings.
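
The one-bit-per-zone write lock can be modelled in user space with C11 atomics. This is a sketch of the idea, not the sd_zbc.c code: `struct demo_zone_wlock` and the `demo_zone_*` functions are hypothetical, and `atomic_fetch_or()` stands in for the kernel's test_and_set_bit().

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of a per-zone write-lock bitmap. */
struct demo_zone_wlock {
    atomic_ulong *words; /* one bit per zone */
    size_t nr_zones;
};

/* Words needed to hold one bit per zone, rounded up. */
static size_t demo_zone_wlock_words(size_t nr_zones)
{
    return (nr_zones + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS;
}

static bool demo_zone_wlock_init(struct demo_zone_wlock *zl, size_t nr_zones)
{
    size_t i, n = demo_zone_wlock_words(nr_zones);

    zl->words = malloc(n * sizeof(*zl->words));
    if (!zl->words)
        return false;
    for (i = 0; i < n; i++)
        atomic_init(&zl->words[i], 0UL);
    zl->nr_zones = nr_zones;
    return true;
}

/* Returns true only if the bit was clear, i.e. the caller took the lock. */
static bool demo_zone_write_trylock(struct demo_zone_wlock *zl, size_t zone)
{
    unsigned long mask = 1UL << (zone % DEMO_WORD_BITS);
    unsigned long old = atomic_fetch_or(&zl->words[zone / DEMO_WORD_BITS],
                                        mask);

    return (old & mask) == 0;
}

static void demo_zone_write_unlock(struct demo_zone_wlock *zl, size_t zone)
{
    unsigned long mask = 1UL << (zone % DEMO_WORD_BITS);

    atomic_fetch_and(&zl->words[zone / DEMO_WORD_BITS], ~mask);
}
```

A dispatcher following this scheme would hold back a write whose zone fails the trylock and release the bit when the write command completes, so at most one write per zone is in flight.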

The new operations introduced to support zone manipulation were reduced to
only the two main ZBC/ZAC defined commands: REPORT ZONES (REQ_OP_ZONE_REPORT)
and RESET WRITE POINTER (REQ_OP_ZONE_RESET). This brings the total number of
operations defined to 8, which fits in the 3 bits (REQ_OP_BITS) reserved for
operation code in bio->bi_opf and req->cmd_flags.

Most of the ZBC specific code is kept out of sd.c and implemented in the
new file sd_zbc.c. Similarly, at the block layer, most of the zoned block
device code is implemented in the new blk-zoned.c.

For host-managed zoned block devices, the sequential write constraint of
write pointer zones is exposed to the user. Users of the disk (applications,
file systems or device mappers) must sequentially write to zones. This means
that for raw block device accesses from applications, buffered writes are
unreliable and direct I/Os must be used (or buffered writes with O_SYNC).

Access to zone manipulation operations is also provided to applications
through a set of new ioctls. This allows applications operating on raw
block devices (e.g. mkfs.xxx) to discover a device zone layout and
manipulate zone state.

Changes from v6:
* Fixed problems with zone write locking:
  - Wrong sdkp->zone_wlock bitmap allocation size
  - Incorrect (reversed condition) test of lock state with test_and_set_bit
  - Potential error in sd_setup_read_write_cmnd could leave a zone locked
without the locking write command being executed

Changes from v5:
* Rebased on Jens' for-4.9/block branch (v5 is based on next-20160928)

Changes from v4:
* Changed interface of sd_zbc_setup_read_write

Changes from v3:
* Fixed several typos and tabs/spaces
* Added description of zoned and chunk_sectors queue attributes in
  Documentation/ABI/testing/sysfs-block
* Fixed sd_read_capacity call in sd.c to avoid missing information on
  the first pass of a disk scan
* Fixed scsi_disk zone related field to use logical block size unit instead
  of 512B sector unit.

Changes from v2:
* Use kcalloc to allocate zone information array for ioctl
* Export GPL the functions blkdev_report_zones and blkdev_reset_zones
* Shuffled uapi definitions from patch 7 into patch 5

Damien Le Moal (1):
  block: Add 'zoned' queue limit

Hannes Reinecke (4):
  blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  block: update chunk_sectors in blk_stack_limits()
  block: Implement support for zoned block devices
  sd: Implement support for ZBC devices

Shaun Tancheff (2):
  block: Define zoned block device operations
  blk-zoned: implement ioctls

 Documentation/ABI/testing/sysfs-block |  29 ++
 block/Kconfig |   8 +
 block/Makefile|   2 +-
 block/blk-core.c  |   4 +
 block/blk-settings.c  |   5 +
 block/blk-sysfs.c |  29 ++
 block/blk-zoned.c | 350 +++
 block/ioctl.c |   4 +
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c | 148 ++--
 drivers/scsi/sd.h |  70 
 drivers/scsi/sd_zbc.c | 638 ++
 include/linux/blk_types.h |   2 +
 include/linux/blkdev.h|  99 ++
 include/scsi/scsi_proto.h |  17 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned.h | 143 
 include/uapi/linux/fs.h   |   4 +
 18 files changed, 1518 insertions(+), 36 deletions(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 drivers/scsi/sd_zbc.c
 create mode 100644 include/uapi/linux/blkzoned.h

-- 
2.7.4


[PATCH] scsi: ufshcd: fix possible unclocked register access

2016-10-05 Thread Subhash Jadavani
The vendor-specific setup_clocks callback may require the clocks managed
by the ufshcd driver to be ON. If the callback is called while those
clocks are turned off, it could result in an unclocked register access.

To prevent this, make sure that the required clocks are enabled whenever
the vendor-specific setup_clocks callback is called.

Signed-off-by: Subhash Jadavani 
---
 drivers/scsi/ufs/ufshcd.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 05c7456..acee5a3 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5389,6 +5389,17 @@ static int __ufshcd_setup_clocks(struct ufs_hba *hba, bool on,
if (!head || list_empty(head))
goto out;
 
+   /*
+* vendor specific setup_clocks ops may depend on clocks managed by
+* this standard driver hence call the vendor specific setup_clocks
+* before disabling the clocks managed here.
+*/
+   if (!on) {
+   ret = ufshcd_vops_setup_clocks(hba, on);
+   if (ret)
+   return ret;
+   }
+
list_for_each_entry(clki, head, list) {
if (!IS_ERR_OR_NULL(clki->clk)) {
if (skip_ref_clk && !strcmp(clki->name, "ref_clk"))
@@ -5410,6 +5421,11 @@ static int __ufshcd_setup_clocks(struct ufs_hba *hba, bool on,
}
}
 
+   /*
+* vendor specific setup_clocks ops may depend on clocks managed by
+* this standard driver hence call the vendor specific setup_clocks
+* after enabling the clocks managed here.
+*/
ret = ufshcd_vops_setup_clocks(hba, on);
 out:
if (ret) {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Bart Van Assche

On 10/05/2016 03:49 PM, Ming Lei wrote:

We can use srcu read lock for BLOCKING and rcu read lock for non-BLOCKING,
by putting *_read_lock() and *_read_unlock() into two wrappers, which
should minimize the cost of srcu read lock & unlock and the code is still easy
to read & verify.


Hello Ming,

The lock checking algorithms in the sparse and smatch static checkers 
are unable to deal with code of the type "if (condition) (un)lock()". So 
unless someone has a better proposal my preference is to use the 
approach from the patch at the start of this e-mail thread.


Thanks,

Bart.
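
For readers following the thread, the wrapper idea quoted above can be
sketched as a user-space model. All names (struct model_queue,
model_read_lock(), model_read_unlock()) are hypothetical, and the counters
merely stand in for rcu_read_lock()/srcu_read_lock(); this is not the
blk-mq code. Both wrappers have the "if (condition) (un)lock()" shape that
the reply above says sparse and smatch cannot follow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a queue; not struct request_queue. */
struct model_queue {
	bool blocking;   /* stands in for BLK_MQ_F_BLOCKING */
	int rcu_depth;   /* counts modeled rcu_read_lock() nesting */
	int srcu_depth;  /* counts modeled srcu_read_lock() nesting */
};

/* Picks the read-side primitive based on the blocking flag. */
static void model_read_lock(struct model_queue *q)
{
	if (q->blocking)
		q->srcu_depth++;   /* sleepable read side */
	else
		q->rcu_depth++;    /* non-sleepable read side */
}

/* Must mirror the choice made in model_read_lock(). */
static void model_read_unlock(struct model_queue *q)
{
	if (q->blocking) {
		assert(q->srcu_depth > 0);
		q->srcu_depth--;
	} else {
		assert(q->rcu_depth > 0);
		q->rcu_depth--;
	}
}
```

The model only shows where the conditional sits; it says nothing about the
relative cost of the two primitives.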



Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Ming Lei
On Thu, Oct 6, 2016 at 5:08 AM, Bart Van Assche
 wrote:
> On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
>>
>> I was referring to whether we can take srcu in the submission path
>> conditional on the hctx being STOPPED?
>
>
> Hello Sagi,
>
> Regarding run-time overhead:
> * rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
>   translated into preempt_disable() with preemption enabled. The latter
>   function modifies a per-cpu variable.
> * Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
>   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
>   compiler is told to reread the hctx flags (READ_ONCE()) and if the
>   compiler and CPU are told not to reorder test_bit() with the
>   memory accesses in (s)rcu_read_lock(). To avoid races
>   BLK_MQ_S_STOPPED will have to be tested a second time after the lock
>   has been obtained, similar to the double-checked-locking pattern.
> * srcu_read_lock() reads a word from the srcu structure, disables
>   preemption, calls __srcu_read_lock() and re-enables preemption. The
>   latter function increments two CPU-local variables and triggers a
>   memory barrier (smp_mb()).

We can use srcu read lock for BLOCKING and rcu read lock for non-BLOCKING,
by putting *_read_lock() and *_read_unlock() into two wrappers, which
should minimize the cost of srcu read lock & unlock and the code is still easy
to read & verify.

>
> Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make the
> code more complicated. Going back to the implementation that calls
> rcu_read_lock() if .queue_rq() won't sleep will result in an implementation
> that is easier to read and to verify.

Yeah, I agree.

Thanks,
Ming Lei


Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq

2016-10-05 Thread Bart Van Assche

On 10/05/2016 10:38 AM, Sagi Grimberg wrote:

+static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		blk_mq_quiesce_queue(q);
+		blk_mq_resume_queue(q);
+	}
+}
+


This *should* live in scsi_lib.c. I suspect that
various drivers would really want this functionality.


Hello Sagi,

There are multiple direct blk_*() calls in other SCSI transport drivers.
So my proposal is to hold off on moving this code into scsi_lib.c until
there is a second user of this code.


Bart.


Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Bart Van Assche

On 10/05/2016 12:11 PM, Sagi Grimberg wrote:

I was referring to whether we can take srcu in the submission path
conditional on the hctx being STOPPED?


Hello Sagi,

Regarding run-time overhead:
* rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
  translated into preempt_disable() with preemption enabled. The latter
  function modifies a per-cpu variable.
* Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
  safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
  compiler is told to reread the hctx flags (READ_ONCE()) and if the
  compiler and CPU are told not to reorder test_bit() with the
  memory accesses in (s)rcu_read_lock(). To avoid races
  BLK_MQ_S_STOPPED will have to be tested a second time after the lock
  has been obtained, similar to the double-checked-locking pattern.
* srcu_read_lock() reads a word from the srcu structure, disables
  preemption, calls __srcu_read_lock() and re-enables preemption. The
  latter function increments two CPU-local variables and triggers a
  memory barrier (smp_mb()).

Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make 
the code more complicated. Going back to the implementation that calls 
rcu_read_lock() if .queue_rq() won't sleep will result in an 
implementation that is easier to read and to verify. If I overlooked 
something, please let me know.


Bart.
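
The double-checked test described in the second bullet above can be
sketched with C11 atomics. This is a user-space model under stated
assumptions: struct model_hctx, the reader counter and all model_* names are
hypothetical, and the counter is only a crude stand-in for an (s)rcu
read-side section, not an RCU implementation. The default sequentially
consistent atomics play the role of the READ_ONCE() and ordering
requirements mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a hardware queue; not struct blk_mq_hw_ctx. */
struct model_hctx {
	atomic_bool stopped;    /* stands in for BLK_MQ_S_STOPPED */
	atomic_int readers;     /* crude stand-in for an (s)rcu read-side section */
	atomic_int dispatched;  /* number of modeled .queue_rq() calls */
};

static void model_hctx_init(struct model_hctx *h)
{
	atomic_init(&h->stopped, false);
	atomic_init(&h->readers, 0);
	atomic_init(&h->dispatched, 0);
}

/* Returns true if a request was dispatched, false if the queue was stopped. */
static bool model_dispatch(struct model_hctx *h)
{
	bool run;

	/* First test: cheap, done before entering the read-side section. */
	if (atomic_load(&h->stopped))
		return false;

	atomic_fetch_add(&h->readers, 1);           /* "read lock" */

	/*
	 * Second test: the flag may have been set between the first test
	 * and the lock, so it is re-read inside the section.
	 */
	run = !atomic_load(&h->stopped);
	if (run)
		atomic_fetch_add(&h->dispatched, 1);    /* modeled .queue_rq() */

	atomic_fetch_sub(&h->readers, 1);           /* "read unlock" */
	return run;
}

/* Sets the flag, then waits until no reader is inside the section. */
static void model_quiesce(struct model_hctx *h)
{
	atomic_store(&h->stopped, true);
	while (atomic_load(&h->readers) != 0)
		;                                       /* stands in for synchronize_(s)rcu() */
}
```

In this model a dispatcher either observes the flag on its second test or
is observed by model_quiesce() as an active reader, which is the property
the second test exists to provide. The extra test and ordering are also the
added complexity that the message above argues against.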




Re: [PATCH 00/12] Fixes, cleanup and g_NCR5380_mmio/g_NCR5380 merger

2016-10-05 Thread Ondrej Zary
On Tuesday 04 October 2016 07:40:50 Finn Thain wrote:
> This patch series has fixes for compatibility, reliability and
> performance issues and some cleanup. It also includes a new version
> of Ondrej Zary's patch that merges g_NCR5380_mmio into g_NCR5380.
>
> I've tested this patch series on a Powerbook 180. If someone would
> test some of the other platforms that would be very helpful. All
> drivers were compile-tested.
>
> (Apologies for any duplicate messages.)

The patches won't apply against:
4.9/scsi-queue
Linus' master with my patches applied

What tree should I try?

> Finn Thain (12):
>   scsi/g_NCR5380: Merge g_NCR5380 and g_NCR5380_mmio drivers
>   scsi/cumana_1: Remove unused cumanascsi_setup() function
>   scsi/atari_scsi: Make device register accessors re-enterant
>   scsi/ncr5380: Simplify register polling limit
>   scsi/ncr5380: Increase register polling limit
>   scsi/ncr5380: Improve hostdata struct member alignment and
> cache-ability
>   scsi/ncr5380: Store IO ports and addresses in host private data
>   scsi/ncr5380: Use correct types for device register accessors
>   scsi/ncr5380: Pass hostdata pointer to register polling routines
>   scsi/ncr5380: Expedite register polling
>   scsi/ncr5380: Use correct types for DMA routines
>   scsi/ncr5380: Suppress unhelpful "interrupt without IRQ bit" message
>
>  MAINTAINERS   |   1 -
>  drivers/scsi/Kconfig  |  32 +
>  drivers/scsi/Makefile |   1 -
>  drivers/scsi/NCR5380.c| 137 +++-
>  drivers/scsi/NCR5380.h|  87 +
>  drivers/scsi/arm/cumana_1.c   |  98 +++---
>  drivers/scsi/arm/oak.c|  34 +++--
>  drivers/scsi/atari_scsi.c |  77 ++-
>  drivers/scsi/dmx3191d.c   |  20 +--
>  drivers/scsi/g_NCR5380.c  | 290 --
>  drivers/scsi/g_NCR5380.h  |  32 +
>  drivers/scsi/g_NCR5380_mmio.c |  10 --
>  drivers/scsi/mac_scsi.c   |  83 +---
>  drivers/scsi/sun3_scsi.c  |  80 ++--
>  14 files changed, 495 insertions(+), 487 deletions(-)
>  delete mode 100644 drivers/scsi/g_NCR5380_mmio.c


-- 
Ondrej Zary


Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Sagi Grimberg



Hello Ming,

Can you have a look at the attached patch? That patch uses an srcu read
lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
Just like previous versions, this patch has been tested.


Hey Bart,

Do we care about the synchronization of queue_rq and/or
blk_mq_run_hw_queue if the hctx is not stopped?

I'm wondering if we can avoid introducing new barriers in the
submission path if it's not absolutely needed.


Hello Sagi,


Hey Bart,



I'm not sure whether the new blk_quiesce_queue() function is useful
without stopping all hardware contexts first. In other words, in my view
setting BLK_MQ_F_BLOCKING flag before calling blk_quiesce_queue() is
sufficient and I don't think that a new QUEUE_FLAG_QUIESCING flag is
necessary.


I was referring to whether we can take srcu in the submission path
conditional on the hctx being STOPPED?


Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Bart Van Assche

On 10/05/2016 11:14 AM, Sagi Grimberg wrote:

Hello Ming,

Can you have a look at the attached patch? That patch uses an srcu read
lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
Just like previous versions, this patch has been tested.


Hey Bart,

Do we care about the synchronization of queue_rq and/or
blk_mq_run_hw_queue if the hctx is not stopped?

I'm wondering if we can avoid introducing new barriers in the
submission path if it's not absolutely needed.


Hello Sagi,

I'm not sure whether the new blk_quiesce_queue() function is useful 
without stopping all hardware contexts first. In other words, in my view 
setting BLK_MQ_F_BLOCKING flag before calling blk_quiesce_queue() is 
sufficient and I don't think that a new QUEUE_FLAG_QUIESCING flag is 
necessary.


Bart.


Re: iscsi_trx going into D state

2016-10-05 Thread Robert LeBlanc
Thanks, we will apply that too. We'd like to get this stable. We'll
report back on what we find with these patches.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Oct 5, 2016 at 12:03 PM, Christoph Hellwig  wrote:
> Hi Robert,
>
> I actually got the name wrong, the patch wasn't from Lee, but from Zhu,
> another SuSE engineer.  This is the one:
>
> http://www.spinics.net/lists/target-devel/msg13463.html


Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Sagi Grimberg



Hello Ming,

Can you have a look at the attached patch? That patch uses an srcu read
lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
Just like previous versions, this patch has been tested.


Hey Bart,

Do we care about the synchronization of queue_rq and/or
blk_mq_run_hw_queue if the hctx is not stopped?

I'm wondering if we can avoid introducing new barriers in the
submission path if it's not absolutely needed.


Re: iscsi_trx going into D state

2016-10-05 Thread Christoph Hellwig
Hi Robert,

I actually got the name wrong, the patch wasn't from Lee, but from Zhu,
another SuSE engineer.  This is the one:

http://www.spinics.net/lists/target-devel/msg13463.html


Re: [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped()

2016-10-05 Thread Sagi Grimberg

Looks good,

Reviewed-by: Sagi Grimberg 


Re: [PATCH v2 3/7] [RFC] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code

2016-10-05 Thread Sagi Grimberg

Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. This patch fixes
a race condition: using queue_flag_clear_unlocked() is not safe
if any other function that manipulates the queue flags can be
called concurrently, e.g. blk_cleanup_queue(). Untested.


This looks good to me, but I know Keith had all sorts of
creative ways to challenge this area, so I'd wait for his
input...


Re: [PATCH] scsi: ufs: add support for BLKSECDISCARD

2016-10-05 Thread Subhash Jadavani

Hi SzymonX,

On 2016-10-04 05:55, Mielczarek, SzymonX wrote:

Hi Jadavani,

> Did you mean sending purge when bProvisioningType is set to 02h
> (TPRZ = 0)? Why do we want to send the purge if TPRZ is 1?

By doing Purge we want to protect against die-level attacks (JESD220B
12.2.3.3). Once Erase is enabled on a partition, a Read will return
zeros; however, data can still reside in unmapped memory on flash
(behind the mapping/translation table) (12.2.2.2). We expose BLKSECDISCARD
on "Erase enabled" partitions just to remove the possibility of die-level
attacks.


Now it makes sense; I wasn't expecting that this patch is meant to prevent
die-level attacks. Do you want to make that explicit in the commit text?




Are you suggesting that this check is not required, and that for any TPRZ
value (thus 02h and 03h) BLKSECDISCARD (and thus Purge) shall be enabled?
That's also possible.


Yes, for BLKSECDISCARD, isn't it good to issue purge for TPRZ=0
(bProvisioningType = 02h) to make sure we can't read back data?




> We had seen purge taking a few minutes to complete with some of the UFS
> device vendors.

> Did you run any experiments to measure the time taken for purge to
> complete?

Yes, we did several experiments around Dec 2015, and the time of the Purge
operation with software overhead varied between 100 and 500 seconds
(!), with a typical time of approx. 350 seconds! We also consulted one
vendor on this observation, and got the response that Purge times over 1
min are possible, depending on flash state.


That's true.
Purge time depends on flash state and it also varies a lot from vendor
to vendor.
Anything over a minute may not be good for the user experience (especially
on mobile), and the user may simply abort (phone restart) thinking that
the device is stuck.




BR,

Szymon

-----Original Message-----
From: Pielaszkiewicz, Tomasz
Sent: Tuesday, October 4, 2016 1:41 PM
To: subha...@codeaurora.org; Wodkowski, PawelX; Mielczarek, SzymonX
Cc: linux-scsi@vger.kernel.org; hun...@vger.kernel.org; Hunter, Adrian;
pielaszkiew...@vger.kernel.org; ja...@vger.kernel.org; Janca, Grzegorz;
linux-scsi-ow...@vger.kernel.org
Subject: RE: [PATCH] scsi: ufs: add support for BLKSECDISCARD

Hi,

Adding Szymon, who took over Pawel's work.

Tomek

-----Original Message-----
From: subha...@codeaurora.org [mailto:subha...@codeaurora.org]
Sent: Tuesday, September 27, 2016 10:18 PM
To: Wodkowski, PawelX
Cc: linux-scsi@vger.kernel.org; hun...@vger.kernel.org; Hunter,
Adrian; pielaszkiew...@vger.kernel.org; Pielaszkiewicz, Tomasz;
ja...@vger.kernel.org; Janca, Grzegorz;
linux-scsi-ow...@vger.kernel.org
Subject: Re: [PATCH] scsi: ufs: add support for BLKSECDISCARD

Hi Pawel,

Please find some comments inline.

On 2016-07-26 04:56, Pawel Wodkowski wrote:

> Add BLKSECDISCARD feature support if LU is provisioned for TPRZ
> (bProvisioningType = 3).

Did you mean sending purge when bProvisioningType is set to 02h (TPRZ
= 0)? Why do we want to send the purge if TPRZ is 1?

> To perform BLKSECDISCARD, the driver issues a purge operation after each
> discard SCSI command with the REQ_SECURE flag set, and delays calling
> scsi_done() until the purge finishes. This operation might take long, so
> block requests from the SCSI layer in ufshcd_queuecommand() and then
> unblock them after the purge finishes.

We had seen purge taking a few minutes to complete with some of the UFS
device vendors.

Did you run any experiments to measure the time taken for purge to
complete?

> Signed-off-by: Pawel Wodkowski
> ---
>  drivers/scsi/ufs/ufs.h    |  19 +
>  drivers/scsi/ufs/ufshcd.c | 187 +-
>  drivers/scsi/ufs/ufshcd.h |   6 ++
>  3 files changed, 208 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/ufs/ufs.h b/drivers/scsi/ufs/ufs.h
> index b291fa6ed2ad..2f769974fda1 100644
> --- a/drivers/scsi/ufs/ufs.h
> +++ b/drivers/scsi/ufs/ufs.h
> @@ -132,12 +132,14 @@ enum flag_idn {
>  	QUERY_FLAG_IDN_FDEVICEINIT	= 0x01,
>  	QUERY_FLAG_IDN_PWR_ON_WPE	= 0x03,
>  	QUERY_FLAG_IDN_BKOPS_EN		= 0x04,
> +	QUERY_FLAG_IDN_PURGE_EN		= 0x06,
>  };
>
>  /* Attribute idn for Query requests */
>  enum attr_idn {
>  	QUERY_ATTR_IDN_ACTIVE_ICC_LVL	= 0x03,
>  	QUERY_ATTR_IDN_BKOPS_STATUS	= 0x05,
> +	QUERY_ATTR_IDN_PURGE_STATUS	= 0x06,
>  	QUERY_ATTR_IDN_EE_CONTROL	= 0x0D,
>  	QUERY_ATTR_IDN_EE_STATUS	= 0x0E,
>  };
> @@ -247,6 +249,13 @@ enum {
>  	UFSHCD_AMP		= 3,
>  };
>
> +/* Provisioning type */
> +enum unit_desc_param_provisioning_type {
> +	THIN_PROVISIONING_DISABLED	= 0x00,
> +	THIN_PROVISIONING_ENABLED_TPRZ_0 = 0x02,
> +	THIN_PROVISIONING_ENABLED_TPRZ_1 = 0x03,
> +};
> +
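
The two policies discussed in this message can be written down as a short
sketch. The provisioning-type values are the ones listed in the quoted
patch; the MODEL_ and model_ names are hypothetical, and neither function is
taken from the patch itself. The first policy follows the quoted commit
message (TPRZ only, bProvisioningType = 03h); the second is the alternative
raised in the replies (either thin-provisioned type).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as listed in the quoted unit_desc_param_provisioning_type enum. */
enum model_provisioning_type {
	MODEL_THIN_PROVISIONING_DISABLED       = 0x00,
	MODEL_THIN_PROVISIONING_ENABLED_TPRZ_0 = 0x02,
	MODEL_THIN_PROVISIONING_ENABLED_TPRZ_1 = 0x03,
};

/* Policy A: expose secure discard only when TPRZ = 1 (03h). */
static bool model_secdiscard_tprz1_only(uint8_t type)
{
	return type == MODEL_THIN_PROVISIONING_ENABLED_TPRZ_1;
}

/* Policy B: expose secure discard for either thin-provisioned type. */
static bool model_secdiscard_any_thin(uint8_t type)
{
	return type == MODEL_THIN_PROVISIONING_ENABLED_TPRZ_0 ||
	       type == MODEL_THIN_PROVISIONING_ENABLED_TPRZ_1;
}

/* The policies differ only for TPRZ = 0 (02h). */
static void model_policy_check(void)
{
	assert(!model_secdiscard_tprz1_only(MODEL_THIN_PROVISIONING_ENABLED_TPRZ_0));
	assert(model_secdiscard_any_thin(MODEL_THIN_PROVISIONING_ENABLED_TPRZ_0));
}
```

Which policy is preferable is exactly the open question in the exchange
above; the sketch only makes the difference explicit.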



Re: iscsi_trx going into D state

2016-10-05 Thread Robert LeBlanc
We are not able to identify the patch that you mentioned from Lee, can
you give us a commit or a link to the patch?

Thanks,

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Oct 4, 2016 at 5:46 AM, Christoph Hellwig  wrote:
> On Tue, Oct 04, 2016 at 11:11:18AM +0200, Hannes Reinecke wrote:
>> Hmm. Looking at the code it looks as if we might miss some calls to
>> 'complete'. Can you try with the attached patch?
>
> That only looks slightly better than the original.  What this really
> needs is a waitqueue and a wait_event on sess->nconn.  Although
> that will need a bit more refactoring around that code.  There also
> are a few more obvious issues around it, e.g. iscsit_close_connection
> needs to use atomic_dec_and_test on sess->nconn instead of having
> separate atomic_dec and atomic_read calls, and a lot of the 0 or 1
> atomic_ts in this code should be replaced with atomic bitops.
>
> Btw, there also was a fix from Lee in this area that added a missing
> wakeup, make sure your tree already has that.
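
The point about atomic_dec_and_test in the quoted text can be illustrated
with C11 atomics in user space. This is a sketch under stated assumptions:
the model_* functions are hypothetical, and atomic_fetch_sub() is used as a
user-space analogue of the kernel primitive, not the iSCSI target code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Two-step shape: the decrement and the read are separate atomic
 * operations, so a concurrent decrement can land between them and two
 * callers can both (or neither) conclude that they were last.
 */
static bool model_last_conn_two_step(atomic_int *nconn)
{
	atomic_fetch_sub(nconn, 1);
	return atomic_load(nconn) == 0;
}

/*
 * One-step shape, analogous to the kernel's atomic_dec_and_test():
 * atomic_fetch_sub() returns the value held before the decrement, so
 * exactly one caller sees the transition from 1 to 0.
 */
static bool model_last_conn_dec_and_test(atomic_int *nconn)
{
	return atomic_fetch_sub(nconn, 1) == 1;
}

/* Single-threaded usage: only the final close reports "last connection". */
static void model_close_two_connections(void)
{
	atomic_int nconn;

	atomic_init(&nconn, 2);
	assert(!model_last_conn_dec_and_test(&nconn));
	assert(model_last_conn_dec_and_test(&nconn));
	assert(atomic_load(&nconn) == 0);
}
```

Single-threaded, both shapes give the same answers; the difference only
shows up when two connections close concurrently.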


Re: [PATCH v2 7/7] [RFC] nvme: Fix a race condition

2016-10-05 Thread Sagi Grimberg



Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
returns. Untested.

Signed-off-by: Bart Van Assche 
Cc: Keith Busch 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 


Bart, this looks really good, and it possibly fixes an issue
I was chasing with fabrics a while ago. I'll take it
for testing but you can add my:

Reviewed-by: Sagi Grimberg 


Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq

2016-10-05 Thread Sagi Grimberg



+static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost)
+{
+   struct scsi_device *sdev;
+   struct request_queue *q;
+
+   shost_for_each_device(sdev, shost) {
+   q = sdev->request_queue;
+
+   blk_mq_quiesce_queue(q);
+   blk_mq_resume_queue(q);
+   }
+}
+


This *should* live in scsi_lib.c. I suspect that
various drivers would really want this functionality.

Thoughts?


Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Ming Lei
On Wed, Oct 5, 2016 at 10:46 PM, Bart Van Assche
 wrote:
> On 10/04/16 21:32, Ming Lei wrote:
>>
>> On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche
>>  wrote:
>>>
>>> On 10/01/16 15:56, Ming Lei wrote:


>>>> If we just call the rcu/srcu read lock (or the mutex) around .queue_rq(),
>>>> the above code needn't be duplicated any more.
>>>
>>>
>>> Can you have a look at the attached patch? That patch uses an srcu read
>>> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
>>> been set.
>>
>>
>> That is much cleaner now.
>>
>>> Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
>>> previous versions, this patch has been tested.
>>
>>
>> I think the flag of QUEUE_FLAG_QUIESCING is still needed because we
>> have to set this flag to prevent new coming .queue_rq() from being run,
>> and synchronize_srcu() won't wait for completion of that at all (see
>> section of 'Update-Side Primitives' in [1]).
>>
>> [1] https://lwn.net/Articles/202847/
>
>
> Hello Ming,
>
> How about using the existing flag BLK_MQ_S_STOPPED instead of introducing a
> new QUEUE_FLAG_QUIESCING flag? From the comment above blk_mq_quiesce_queue()

That looks fine, and we need to stop direct issue first after hw queue
becomes BLK_MQ_S_STOPPED.

> in the patch that was attached to my previous e-mail: "Additionally, it is
> not prevented that new queue_rq() calls occur unless the queue has been
> stopped first."
>
> Thanks,
>
> Bart.



-- 
Ming Lei


Re: BUG and Oops while trying to issue a discard to LVM on RAID1 md

2016-10-05 Thread Sitsofe Wheeler
On 5 October 2016 at 16:04, Sitsofe Wheeler  wrote:
> On 4 October 2016 at 07:20, Sitsofe Wheeler  wrote:
>> On 4 October 2016 at 07:17, Sitsofe Wheeler  wrote:
>>> While trying to do a discard inside an ESXi 6 VM to an LVM device atop
>>> an md RAID1 device composed of two SATA SSDs passed up as a raw disk
>>> mappings through a PVSCSI controller, this BUG followed by an Oops was
>>> hit:
>>>
>>> [   86.902888] [ cut here ]
>>> [   86.904600] kernel BUG at arch/x86/kernel/pci-nommu.c:66!

(sent that a bit too soon)

On a 4.8.0 kernel the problem seems to have shifted a bit but still
results in a lock up:

[   26.208152] [ cut here ]
[   26.208935] kernel BUG at ./include/linux/scatterlist.h:90!
[   26.209799] invalid opcode:  [#1] SMP
[   26.210454] Modules linked in: vmw_vsock_vmci_transport vsock
sb_edac edac_core intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel raid1 intel_rapl_perf ppdev
vmw_balloon pcspkr joydev vmxnet3 acpi_cpufreq tpm_tis tpm_tis_core
tpm vmw_vmci fjes shpchp parport_pc parport i2c_piix4 dm_multipath
vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmw_pvscsi
ata_generic pata_acpi
[   26.216797] CPU: 0 PID: 220 Comm: kworker/0:1H Not tainted
4.8.0-1.vanilla.knurd.1.fc24.x86_64 #1
[   26.218191] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[   26.219861] Workqueue: kblockd blk_delay_work
[   26.220570] task: 9608bf30 task.stack: 9608b9d9
[   26.221505] RIP: 0010:[]  []
blk_rq_map_sg+0x317/0x560
[   26.222812] RSP: 0018:9608b9d93b78  EFLAGS: 00010002
[   26.223650] RAX: 002e RBX: 0200 RCX: 9608bb71bd00
[   26.224766] RDX: 0007fc01 RSI: 0002 RDI: 0400
[   26.225867] RBP: 9608b9d93c00 R08: 9608bec1ca00 R09: 
[   26.226992] R10: 9608bb71bd00 R11: 9608bb74d900 R12: 0200
[   26.228085] R13: 0400 R14:  R15: 9608bb71b800
[   26.229195] FS:  () GS:9608bec0()
knlGS:
[   26.230509] CS:  0010 DS:  ES:  CR0: 80050033
[   26.231442] CR2: 7fe4bc4ea000 CR3: 39cab000 CR4: 001406f0
[   26.232620] Stack:
[   26.232967]  9608b9d93bd0 9d3f2f1d 9608bb71bd00
01080020
[   26.234269]  9608bfaade60 9608bf162380 
002e
[   26.235558]  04000200  
80a6fe96
[   26.236854] Call Trace:
[   26.237263]  [] ? __sg_alloc_table+0x7d/0x160
[   26.238217]  [] scsi_init_sgtable+0x3d/0x70
[   26.239148]  [] scsi_init_io+0x44/0x1c0
[   26.240013]  [] sd_init_command+0x2b2/0xde0
[   26.240970]  [] ? scsi_host_alloc_command+0x4b/0xc0
[   26.242015]  [] scsi_setup_cmnd+0x101/0x160
[   26.242962]  [] scsi_prep_fn+0xf4/0x180
[   26.243869]  [] blk_peek_request+0x16e/0x2b0
[   26.244836]  [] scsi_request_fn+0x3f/0x5f0
[   26.245756]  [] __blk_run_queue+0x33/0x40
[   26.246636]  [] blk_delay_work+0x25/0x40
[   26.247506]  [] process_one_work+0x184/0x430
[   26.248433]  [] worker_thread+0x4e/0x480
[   26.249311]  [] ? process_one_work+0x430/0x430
[   26.250265]  [] ? process_one_work+0x430/0x430
[   26.251210]  [] kthread+0xd8/0xf0
[   26.251993]  [] ret_from_fork+0x1f/0x40
[   26.252845]  [] ? kthread_worker_fn+0x180/0x180
[   26.253801] Code: c6 41 01 c5 41 29 c0 41 29 c4 44 39 ea 75 c9 41
83 c6 01 45 31 ed eb c0 48 8b 4c 24 10 48 8b 31 83 e6 03 a8 03 0f 84
38 ff ff ff <0f> 0b 48 8b 5c 24 20 4c 89 54 24 30 48 89 df ff 90 c0 00
00 00
[   26.258363] RIP  [] blk_rq_map_sg+0x317/0x560
[   26.259345]  RSP 
[   26.259890] ---[ end trace bb376bf807673a6f ]---
[   26.260678] BUG: unable to handle kernel paging request at 80a6fe96
[   26.261828] IP: [] __wake_up_common+0x2b/0x80
[   26.262785] PGD 0
[   26.263141] Oops:  [#2] SMP
[   26.263644] Modules linked in: vmw_vsock_vmci_transport vsock
sb_edac edac_core intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel raid1 intel_rapl_perf ppdev
vmw_balloon pcspkr joydev vmxnet3 acpi_cpufreq tpm_tis tpm_tis_core
tpm vmw_vmci fjes shpchp parport_pc parport i2c_piix4 dm_multipath
vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmw_pvscsi
ata_generic pata_acpi
[   26.270080] CPU: 0 PID: 220 Comm: kworker/0:1H Tainted: G  D
 4.8.0-1.vanilla.knurd.1.fc24.x86_64 #1
[   26.271661] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[   26.273349] task: 9608bf30 task.stack: 9608b9d9
[   26.274273] RIP: 0010:[]  []
__wake_up_common+0x2b/0x80
[   26.275621] RSP: 0018:9608b9d93e38  EFLAGS: 00010086
[   26.276454] RAX: 0282 RBX: 9608b9d93f10 RCX: 
[   26.277593] RDX: 80a6fe96 RSI: 0003 RDI: 

Re: BUG and Oops while trying to issue a discard to LVM on RAID1 md

2016-10-05 Thread Sitsofe Wheeler
On 4 October 2016 at 07:20, Sitsofe Wheeler  wrote:
> On 4 October 2016 at 07:17, Sitsofe Wheeler  wrote:
>> While trying to do a discard inside an ESXi 6 VM to an LVM device atop
>> an md RAID1 device composed of two SATA SSDs passed up as a raw disk
>> mappings through a PVSCSI controller, this BUG followed by an Oops was
>> hit:
>>
>> [   86.902888] [ cut here ]
>> [   86.904600] kernel BUG at arch/x86/kernel/pci-nommu.c:66!

On a 4.8.0 kernel the problem seems to have shifted a bit:


-- 
Sitsofe | http://sucs.org/~sits/


Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-05 Thread Bart Van Assche

On 10/04/16 21:32, Ming Lei wrote:

On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche
 wrote:

On 10/01/16 15:56, Ming Lei wrote:


If we just call the rcu/srcu read lock (or the mutex) around .queue_rq(),
the above code needn't be duplicated any more.


Can you have a look at the attached patch? That patch uses an srcu read lock
for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has been set.


That is much cleaner now.


Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
previous versions, this patch has been tested.


I think the flag of QUEUE_FLAG_QUIESCING is still needed because we
have to set this flag to prevent new coming .queue_rq() from being run,
and synchronize_srcu() won't wait for completion of that at all (see
section of 'Update-Side Primitives' in [1]).

[1] https://lwn.net/Articles/202847/


Hello Ming,

How about using the existing flag BLK_MQ_S_STOPPED instead of 
introducing a new QUEUE_FLAG_QUIESCING flag? From the comment above 
blk_mq_quiesce_queue() in the patch that was attached to my previous 
e-mail: "Additionally, it is not prevented that new queue_rq() calls 
occur unless the queue has been stopped first."


Thanks,

Bart.