Re: [PATCHv3 0/5] Dynamic growing data area support

2017-03-20 Thread Xiubo Li

On 2017-03-21 13:24, Nicholas A. Bellinger wrote:

Hi Xiubo !

On Mon, 2017-03-20 at 17:09 +0800, lixi...@cmss.chinamobile.com wrote:

From: Xiubo Li 

Changed for V3:
- [PATCHv2 2/5] fix double usage of blocks and possible page fault
   call trace.
- [PATCHv2 5/5] fix a mistake.

Changed for V2:
- [PATCHv2 1/5] just fixes some small spelling and other mistakes.
   And as the initial patch, here sets cmd area to 8M and data area to
   1G (1M fixed and 1023M growing)
- [PATCHv2 2/5] is a new one, adding global data block pool support.
   The max total size of the pool is 2G and all the targets will get
   growing blocks from here.
   Test this using multi-targets at the same time.
- [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
- [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.


Xiubo Li (5):
   tcmu: Add dynamic growing data area feature support
   tcmu: Add global data block pool support
   target/user: Fix possible overwrite of t_data_sg's last iov[]
   target/user: Fix wrongly calculating of the base_command_size
   target/user: Clean up tcmu_queue_cmd_ring

  drivers/target/target_core_user.c | 621 +++---
  1 file changed, 514 insertions(+), 107 deletions(-)


I've not been following the details of your TCMU efforts, so will have
to defer to MNC + AGrover for Acked-by + Reviewed-bys here.  ;)

One comment on the series ordering though..

Patches #1 + #2 introduce new features, while patches #3 + #4 are
bug-fixes to (existing..?) code.  AFAICT, patches #3 + #4 are
stand-alone and don't depend on the new features.  Is that correct..?


Yes, that's right.


If so, I'd prefer to apply #3 + #4 to target-pending/master for
v4.11-rcX (eg: bug-fixes only), and include new features in #1 + #2 and
cleanup in #5 to target-pending/for-next. (eg: next merge window for
v4.12-rc1).

Usually the preferred way when submitting patches is to always put
bug-fixes first in the series, followed by new features, further
cleanups, etc.

That way it's easy for a maintainer to split out / cherry-pick bug-fixes
from the series as necessary, without needing to worry about
dependencies in the earlier patches.

That said, if patch #3 + #4 are stand-alone bug-fixes, would you be so
kind to re-order them at the head of the series, and re-submit..?


Sure.

I will split out #3 and #4 separately.

Thanks,

BRs
Xiubo



Re: [PATCHv3 0/5] Dynamic growing data area support

2017-03-20 Thread Nicholas A. Bellinger
Hi Xiubo !

On Mon, 2017-03-20 at 17:09 +0800, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> Changed for V3:
> - [PATCHv2 2/5] fix double usage of blocks and possible page fault
>   call trace.
> - [PATCHv2 5/5] fix a mistake.
> 
> Changed for V2:
> - [PATCHv2 1/5] just fixes some small spelling and other mistakes.
>   And as the initial patch, here sets cmd area to 8M and data area to
>   1G (1M fixed and 1023M growing)
> - [PATCHv2 2/5] is a new one, adding global data block pool support.
>   The max total size of the pool is 2G and all the targets will get
>   growing blocks from here.
>   Test this using multi-targets at the same time.
> - [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
> - [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.
> 
> 
> Xiubo Li (5):
>   tcmu: Add dynamic growing data area feature support
>   tcmu: Add global data block pool support
>   target/user: Fix possible overwrite of t_data_sg's last iov[]
>   target/user: Fix wrongly calculating of the base_command_size
>   target/user: Clean up tcmu_queue_cmd_ring
> 
>  drivers/target/target_core_user.c | 621 
> +++---
>  1 file changed, 514 insertions(+), 107 deletions(-)
> 

I've not been following the details of your TCMU efforts, so will have
to defer to MNC + AGrover for Acked-by + Reviewed-bys here.  ;)

One comment on the series ordering though..

Patches #1 + #2 introduce new features, while patches #3 + #4 are
bug-fixes to (existing..?) code.  AFAICT, patches #3 + #4 are
stand-alone and don't depend on the new features.  Is that correct..?

If so, I'd prefer to apply #3 + #4 to target-pending/master for
v4.11-rcX (eg: bug-fixes only), and include new features in #1 + #2 and
cleanup in #5 to target-pending/for-next. (eg: next merge window for
v4.12-rc1).

Usually the preferred way when submitting patches is to always put
bug-fixes first in the series, followed by new features, further
cleanups, etc.

That way it's easy for a maintainer to split out / cherry-pick bug-fixes
from the series as necessary, without needing to worry about
dependencies in the earlier patches.

That said, if patch #3 + #4 are stand-alone bug-fixes, would you be so
kind to re-order them at the head of the series, and re-submit..?



Re: [PATCH] scsi: sr: fix oob access in get_capabilities

2017-03-20 Thread Kefeng Wang


On 2017/3/20 22:29, Martin K. Petersen wrote:
> Kefeng Wang  writes:
> 
> Kefeng,
> 
>> The issue still exists: the patch returns zero in scsi_mode_sense(), but zero
>> means SAM_STAT_GOOD in scsi_status_is_good(), so n will still be bigger than 512;
> 
> OK, I checked the other users of scsi_mode_sense(). So let's keep this
> fix local to sr.c for now.
> 
> How about the following?
> 
> 
> scsi: sr: Sanity check returned mode data
> 
> Kefeng Wang discovered that old versions of the QEMU CD driver would
> return mangled mode data causing us to walk off the end of the buffer in
> an attempt to parse it. Sanity check the returned mode sense data.
> 
> Cc: 
> Reported-by: Kefeng Wang 
> Signed-off-by: Martin K. Petersen 
> 
> diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
> index 0b29b9329b1c..a8f630213a1a 100644
> --- a/drivers/scsi/sr.c
> +++ b/drivers/scsi/sr.c
> @@ -836,6 +836,7 @@ static void get_capabilities(struct scsi_cd *cd)
>   unsigned char *buffer;
>   struct scsi_mode_data data;
>   struct scsi_sense_hdr sshdr;
> + unsigned int ms_len = 128;
>   int rc, n;
>  
>   static const char *loadmech[] =
> @@ -862,10 +863,11 @@ static void get_capabilities(struct scsi_cd *cd)
>   scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
>  
>   /* ask for mode page 0x2a */
> - rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, 128,
> + rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, ms_len,
>SR_TIMEOUT, 3, &data, NULL);
>  

Move "n = data.header_length + data.block_descriptor_length;" up to here,

> - if (!scsi_status_is_good(rc)) {
> + if (!scsi_status_is_good(rc) || data.length > ms_len ||
> + data.header_length + data.block_descriptor_length > data.length) {

and the second condition above then reduces to "n > data.length".
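
Putting the two together, the checked path would then look roughly like
this (a sketch of the proposed shape only, not the final patch):

	/* ask for mode page 0x2a */
	rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, ms_len,
			     SR_TIMEOUT, 3, &data, NULL);

	n = data.header_length + data.block_descriptor_length;
	if (!scsi_status_is_good(rc) || data.length > ms_len ||
	    n > data.length) {
		/* failed, drive doesn't have capabilities mode page */
		/* ... fall back as in the hunk below ... */
	}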

Tested-by: Kefeng Wang 

Thanks,
Kefeng

>   /* failed, drive doesn't have capabilities mode page */
>   cd->cdi.speed = 1;
>   cd->cdi.mask |= (CDC_CD_R | CDC_CD_RW | CDC_DVD_R |
> 
> 
> 
> .
> 



[PATCH 7/7] sd: use ZERO_PAGE for WRITE_SAME payloads

2017-03-20 Thread Christoph Hellwig
We're never touching the contents of the page, so save a memory
allocation for these cases.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c18fe9ff1f8f..af632e350ab4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -756,7 +756,7 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
u32 data_len = sdp->sector_size;
 
-   rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+   rq->special_vec.bv_page = ZERO_PAGE(0);
if (!rq->special_vec.bv_page)
return BLKPREP_DEFER;
rq->special_vec.bv_offset = 0;
@@ -785,7 +785,7 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd 
*cmd, bool unmap)
u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
u32 data_len = sdp->sector_size;
 
-   rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+   rq->special_vec.bv_page = ZERO_PAGE(0);
if (!rq->special_vec.bv_page)
return BLKPREP_DEFER;
rq->special_vec.bv_offset = 0;
@@ -1256,7 +1256,8 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 {
struct request *rq = SCpnt->request;
 
-   if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+   if ((rq->rq_flags & RQF_SPECIAL_PAYLOAD) &&
+   rq->special_vec.bv_page != ZERO_PAGE(0))
__free_page(rq->special_vec.bv_page);
 
if (SCpnt->cmnd != scsi_req(rq)->cmd) {
-- 
2.11.0



[PATCH 4/7] libata: simplify the trim implementation

2017-03-20 Thread Christoph Hellwig
Don't try to fake up basic SCSI logical block provisioning and WRITE SAME
support, but offer support for the Linux Vendor Specific TRIM command
instead.  This simplifies the implementation a lot, and avoids rewriting
the data out buffer in the I/O path.   Note that this new command is only
offered to the block layer and will fail for pass through commands.
While this is theoretically a regression in the functionality offered
through SG_IO, the previous support was buggy and corrupted user memory
by rewriting the data out buffer in place.

Last but not least this removes the global ata_scsi_rbuf_lock from
the trim path.

Signed-off-by: Christoph Hellwig 
---
 drivers/ata/libata-scsi.c | 179 --
 1 file changed, 28 insertions(+), 151 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b93d7e33789a..965b9e7dbb7d 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1322,6 +1322,16 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
 
blk_queue_flush_queueable(q, false);
 
+   if (ata_id_has_trim(dev->id) &&
+   !(dev->horkage & ATA_HORKAGE_NOTRIM)) {
+   sdev->ata_trim = 1;
+   if (ata_id_has_zero_after_trim(dev->id) &&
+   (dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM)) {
+   ata_dev_info(dev, "Enabling discard_zeroes_data\n");
+   sdev->ata_trim_zeroes_data = 1;
+   }
+   }
+
dev->sdev = sdev;
return 0;
 }
@@ -2383,21 +2393,6 @@ static unsigned int ata_scsiop_inq_b0(struct 
ata_scsi_args *args, u8 *rbuf)
 */
min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id);
put_unaligned_be16(min_io_sectors, &rbuf[6]);
-
-   /*
-* Optimal unmap granularity.
-*
-* The ATA spec doesn't even know about a granularity or alignment
-* for the TRIM command.  We can leave away most of the unmap related
-* VPD page entries, but we have specifify a granularity to signal
-* that we support some form of unmap - in thise case via WRITE SAME
-* with the unmap bit set.
-*/
-   if (ata_id_has_trim(args->id)) {
-   put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM, &rbuf[36]);
-   put_unaligned_be32(1, &rbuf[28]);
-   }
-
return 0;
 }
 
@@ -2746,16 +2741,6 @@ static unsigned int ata_scsiop_read_cap(struct 
ata_scsi_args *args, u8 *rbuf)
rbuf[14] = (lowest_aligned >> 8) & 0x3f;
rbuf[15] = lowest_aligned;
 
-   if (ata_id_has_trim(args->id) &&
-   !(dev->horkage & ATA_HORKAGE_NOTRIM)) {
-   rbuf[14] |= 0x80; /* LBPME */
-
-   if (ata_id_has_zero_after_trim(args->id) &&
-   dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) {
-   ata_dev_info(dev, "Enabling discard_zeroes_data\n");
-   rbuf[14] |= 0x40; /* LBPRZ */
-   }
-   }
if (ata_id_zoned_cap(args->id) ||
args->dev->class == ATA_DEV_ZAC)
rbuf[12] = (1 << 4); /* RC_BASIS */
@@ -3339,141 +3324,45 @@ static unsigned int ata_scsi_pass_thru(struct 
ata_queued_cmd *qc)
 }
 
 /**
- * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
- * @cmd: SCSI command being translated
- * @trmax: Maximum number of entries that will fit in sector_size bytes.
- * @sector: Starting sector
- * @count: Total Range of request in logical sectors
- *
- * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
- * descriptor.
- *
- * Upto 64 entries of the format:
- *   63:48 Range Length
- *   47:0  LBA
- *
- *  Range Length of 0 is ignored.
- *  LBA's should be sorted order and not overlap.
- *
- * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
- *
- * Return: Number of bytes copied into sglist.
- */
-static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
-   u64 sector, u32 count)
-{
-   struct scsi_device *sdp = cmd->device;
-   size_t len = sdp->sector_size;
-   size_t r;
-   __le64 *buf;
-   u32 i = 0;
-   unsigned long flags;
-
-   WARN_ON(len > ATA_SCSI_RBUF_SIZE);
-
-   if (len > ATA_SCSI_RBUF_SIZE)
-   len = ATA_SCSI_RBUF_SIZE;
-
-   spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-   buf = ((void *)ata_scsi_rbuf);
-   memset(buf, 0, len);
-   while (i < trmax) {
-   u64 entry = sector |
-   ((u64)(count > 0xffff ? 0xffff : count) << 48);
-   buf[i++] = __cpu_to_le64(entry);
-   if (count <= 0xffff)
-   break;
-   count -= 0xffff;
-   sector += 0xffff;
-   }
-   r = sg_copy_from_buffer(scsi_sglist(cmd), 

[PATCH 5/7] block: add a max_discard_segment_size queue limit

2017-03-20 Thread Christoph Hellwig
ATA only allows 16 bits, so we need a limit.
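
For example, a driver whose wire format can express at most 65535 512-byte
sectors per range might cap each discard segment like this (an illustrative
caller, not code from this series):

	/* one 16-bit TRIM range covers at most 65535 sectors */
	blk_queue_max_discard_segment_size(q, 65535 * 512);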

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c   |  6 ++
 block/blk-merge.c  |  9 +
 block/blk-settings.c   | 14 ++
 include/linux/blkdev.h |  8 
 4 files changed, 37 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index d772c221cc17..3eb3bd89b47a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1486,9 +1486,15 @@ bool bio_attempt_discard_merge(struct request_queue *q, 
struct request *req,
struct bio *bio)
 {
unsigned short segments = blk_rq_nr_discard_segments(req);
+   unsigned max_segment_sectors = queue_max_discard_segment_size(q) >> 9;
 
if (segments >= queue_max_discard_segments(q))
goto no_merge;
+   if (blk_rq_sectors(req) > max_segment_sectors)
+   goto no_merge;
+   if (bio_sectors(bio) > max_segment_sectors)
+   goto no_merge;
+
if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, blk_rq_pos(req)))
goto no_merge;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2afa262425d1..c62a6f0325e0 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -11,6 +11,15 @@
 
 #include "blk.h"
 
+/*
+ * Split a discard bio if it doesn't fit into the overall discard request size
+ * of the device.  Note that we don't split it here if it's over the maximum
+ * discard segment size to avoid creating way too many bios in that case.
+ * We will simply take care of never merging such a larger than segment size
+ * bio into a request that has other bios, and let the low-level driver take
+ * care of splitting the request into multiple ranges in the on the wire
+ * format.
+ */
 static struct bio *blk_bio_discard_split(struct request_queue *q,
 struct bio *bio,
 struct bio_set *bs,
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1e7174ffc9d4..9d515ae3a405 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -93,6 +93,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
lim->virt_boundary_mask = 0;
lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
+   lim->max_discard_segment_size = UINT_MAX;
lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
lim->max_dev_sectors = 0;
lim->chunk_sectors = 0;
@@ -132,6 +133,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_discard_segments = 1;
lim->max_hw_sectors = UINT_MAX;
lim->max_segment_size = UINT_MAX;
+   lim->max_discard_segment_size = UINT_MAX;
lim->max_sectors = UINT_MAX;
lim->max_dev_sectors = UINT_MAX;
lim->max_write_same_sectors = UINT_MAX;
@@ -376,6 +378,18 @@ void blk_queue_max_segment_size(struct request_queue *q, 
unsigned int max_size)
 EXPORT_SYMBOL(blk_queue_max_segment_size);
 
 /**
+ * blk_queue_max_discard_segment_size - set max segment size for discards
+ * @q:  the request queue for the device
+ * @max_size:  max size of a discard segment in bytes
+ **/
+void blk_queue_max_discard_segment_size(struct request_queue *q,
+   unsigned int max_size)
+{
+   q->limits.max_discard_segment_size = max_size;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_discard_segment_size);
+
+/**
  * blk_queue_logical_block_size - set logical block size for the queue
  * @q:  the request queue for the device
  * @size:  the logical block size, in bytes
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5a7da607ca04..3b3bd646f580 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -333,6 +333,7 @@ struct queue_limits {
unsigned short  max_segments;
unsigned short  max_integrity_segments;
unsigned short  max_discard_segments;
+   unsigned int    max_discard_segment_size;
 
unsigned char   misaligned;
unsigned char   discard_misaligned;
@@ -1150,6 +1151,8 @@ extern void blk_queue_max_segments(struct request_queue 
*, unsigned short);
 extern void blk_queue_max_discard_segments(struct request_queue *,
unsigned short);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
+extern void blk_queue_max_discard_segment_size(struct request_queue *,
+   unsigned int);
 extern void blk_queue_max_discard_sectors(struct request_queue *q,
unsigned int max_discard_sectors);
 extern void blk_queue_max_write_same_sectors(struct request_queue *q,
@@ -1415,6 +1418,11 @@ static inline unsigned int queue_max_segment_size(struct 
request_queue *q)
return q->limits.max_segment_size;
 }
 
+static inline unsigned int queue_max_discard_segment_size(struct request_queue 
*q)
+{
+   return q->limits.max_discard_segment_size;
+}
+
 static inline 

support ranges TRIM for libata

2017-03-20 Thread Christoph Hellwig
This series implements ranged discard for ATA SSDs.  Compared to the
initial NVMe support there are two things that complicate the ATA
support:

 - ATA only supports 16-bit long ranges
 - the whole mess of generating a SCSI command first and then
   translating it to an ATA one.

This series adds support for limited range size to the block layer,
and stops translating discard commands - instead we add a new
Vendor Specific SCSI command that contains the TRIM payload when
the device asks for it.
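
For reference, each 8-byte entry of that TRIM payload packs the range
length into bits 63:48 and the LBA into bits 47:0 (the format quoted in
the libata patch below).  Encoding one range could look like this (an
illustrative helper, not code from the series):

	/* pack one TRIM range: bits 63:48 = sector count, 47:0 = LBA */
	static inline __le64 ata_trim_range(u64 lba, u16 count)
	{
		return cpu_to_le64((lba & ((1ULL << 48) - 1)) |
				   ((u64)count << 48));
	}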


[PATCH 1/7] sd: split sd_setup_discard_cmnd

2017-03-20 Thread Christoph Hellwig
Split sd_setup_discard_cmnd into one function per provisioning type.  While
this creates some very slight duplication of boilerplate code, it keeps the
code modular for additions of new provisioning types, and for reusing the
write same functions for the upcoming SCSI implementation of the Write Zeroes
operation.
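
The resulting dispatch is then a simple switch per provisioning mode
(sketched from the hunks below; the SD_LBP_ATA_TRIM case only appears
later in the series):

	switch (sdkp->provisioning_mode) {
	case SD_LBP_UNMAP:
		return sd_setup_unmap_cmnd(cmd);
	case SD_LBP_WS16:
		return sd_setup_write_same16_cmnd(cmd);
	case SD_LBP_WS10:
		return sd_setup_write_same10_cmnd(cmd, true);
	case SD_LBP_ZERO:
		return sd_setup_write_same10_cmnd(cmd, false);
	default:
		return BLKPREP_INVALID;
	}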

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 153 ++
 1 file changed, 84 insertions(+), 69 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc79331..b853f91fb3da 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -701,93 +701,97 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
-/**
- * sd_setup_discard_cmnd - unmap blocks on thinly provisioned device
- * @sdp: scsi device to operate on
- * @rq: Request to prepare
- *
- * Will issue either UNMAP or WRITE SAME(16) depending on preference
- * indicated by target device.
- **/
-static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
+static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
 {
-   struct request *rq = cmd->request;
struct scsi_device *sdp = cmd->device;
-   struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
-   sector_t sector = blk_rq_pos(rq);
-   unsigned int nr_sectors = blk_rq_sectors(rq);
-   unsigned int len;
-   int ret;
+   struct request *rq = cmd->request;
+   u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+   u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+   unsigned int data_len = 24;
char *buf;
-   struct page *page;
 
-   sector >>= ilog2(sdp->sector_size) - 9;
-   nr_sectors >>= ilog2(sdp->sector_size) - 9;
-
-   page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
-   if (!page)
+   rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+   if (!rq->special_vec.bv_page)
return BLKPREP_DEFER;
+   rq->special_vec.bv_offset = 0;
+   rq->special_vec.bv_len = data_len;
+   rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
 
-   switch (sdkp->provisioning_mode) {
-   case SD_LBP_UNMAP:
-   buf = page_address(page);
-
-   cmd->cmd_len = 10;
-   cmd->cmnd[0] = UNMAP;
-   cmd->cmnd[8] = 24;
-
-   put_unaligned_be16(6 + 16, &buf[0]);
-   put_unaligned_be16(16, &buf[2]);
-   put_unaligned_be64(sector, &buf[8]);
-   put_unaligned_be32(nr_sectors, &buf[16]);
+   cmd->cmd_len = 10;
+   cmd->cmnd[0] = UNMAP;
+   cmd->cmnd[8] = 24;
 
-   len = 24;
-   break;
+   buf = page_address(rq->special_vec.bv_page);
+   put_unaligned_be16(6 + 16, &buf[0]);
+   put_unaligned_be16(16, &buf[2]);
+   put_unaligned_be64(sector, &buf[8]);
+   put_unaligned_be32(nr_sectors, &buf[16]);
 
-   case SD_LBP_WS16:
-   cmd->cmd_len = 16;
-   cmd->cmnd[0] = WRITE_SAME_16;
-   cmd->cmnd[1] = 0x8; /* UNMAP */
-   put_unaligned_be64(sector, &cmd->cmnd[2]);
-   put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
+   cmd->allowed = SD_MAX_RETRIES;
+   cmd->transfersize = data_len;
+   rq->timeout = SD_TIMEOUT;
+   scsi_req(rq)->resid_len = data_len;
 
-   len = sdkp->device->sector_size;
-   break;
+   return scsi_init_io(cmd);
+}
 
-   case SD_LBP_WS10:
-   case SD_LBP_ZERO:
-   cmd->cmd_len = 10;
-   cmd->cmnd[0] = WRITE_SAME;
-   if (sdkp->provisioning_mode == SD_LBP_WS10)
-   cmd->cmnd[1] = 0x8; /* UNMAP */
-   put_unaligned_be32(sector, &cmd->cmnd[2]);
-   put_unaligned_be16(nr_sectors, &cmd->cmnd[7]);
+static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
+{
+   struct scsi_device *sdp = cmd->device;
+   struct request *rq = cmd->request;
+   u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+   u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+   u32 data_len = sdp->sector_size;
 
-   len = sdkp->device->sector_size;
-   break;
+   rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+   if (!rq->special_vec.bv_page)
+   return BLKPREP_DEFER;
+   rq->special_vec.bv_offset = 0;
+   rq->special_vec.bv_len = data_len;
+   rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
 
-   default:
-   ret = BLKPREP_INVALID;
-   goto out;
-   }
+   cmd->cmd_len = 16;
+   cmd->cmnd[0] = WRITE_SAME_16;
+   cmd->cmnd[1] = 0x8; /* UNMAP */
+   put_unaligned_be64(sector, &cmd->cmnd[2]);
+   put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
 
+   cmd->allowed = SD_MAX_RETRIES;
+   cmd->transfersize = data_len;
rq->timeout = SD_TIMEOUT;
+   scsi_req(rq)->resid_len = data_len;
 
-   

[PATCH 2/7] sd: provide a new ata trim provisioning mode

2017-03-20 Thread Christoph Hellwig
This uses a vendor specific command to pass the ATA TRIM payload to
libata without having to rewrite it in place.  Support for it is
indicated by a new flag in struct scsi_device that libata will set
in its slave_configure routine.  A second flag indicates if TRIM
will reliably zero data.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c  | 60 +++---
 drivers/scsi/sd.h  |  1 +
 include/scsi/scsi_device.h |  2 ++
 include/scsi/scsi_proto.h  |  3 +++
 4 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b853f91fb3da..cc8684818305 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -371,6 +371,7 @@ static const char *lbp_mode[] = {
[SD_LBP_WS16]   = "writesame_16",
[SD_LBP_WS10]   = "writesame_10",
[SD_LBP_ZERO]   = "writesame_zero",
+   [SD_LBP_ATA_TRIM]   = "ata_trim",
[SD_LBP_DISABLE]= "disabled",
 };
 
@@ -411,7 +412,7 @@ provisioning_mode_store(struct device *dev, struct 
device_attribute *attr,
sd_config_discard(sdkp, SD_LBP_ZERO);
else if (!strncmp(buf, lbp_mode[SD_LBP_DISABLE], 20))
sd_config_discard(sdkp, SD_LBP_DISABLE);
-   else
+   else /* we don't allow manual setting of SD_LBP_ATA_TRIM */
return -EINVAL;
 
return count;
@@ -653,7 +654,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
 * lead to data corruption. If LBPRZ is not set, we honor the
 * device preference.
 */
-   if (sdkp->lbprz) {
+   if (sdkp->lbprz || sdkp->device->ata_trim) {
q->limits.discard_alignment = 0;
q->limits.discard_granularity = logical_block_size;
} else {
@@ -695,6 +696,12 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
  (u32)SD_MAX_WS10_BLOCKS);
q->limits.discard_zeroes_data = 1;
break;
+
+   case SD_LBP_ATA_TRIM:
+   max_blocks = 65535 * (512 / sizeof(__le64));
+   if (sdkp->device->ata_trim_zeroes_data)
+   q->limits.discard_zeroes_data = 1;
+   break;
}
 
blk_queue_max_discard_sectors(q, max_blocks * (logical_block_size >> 
9));
@@ -794,6 +801,49 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd 
*cmd, bool unmap)
return scsi_init_io(cmd);
 }
 
+static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
+{
+   struct scsi_device *sdp = cmd->device;
+   struct request *rq = cmd->request;
+   u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+   u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+   u32 data_len = sdp->sector_size, i;
+   __le64 *buf;
+
+   rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+   if (!rq->special_vec.bv_page)
+   return BLKPREP_DEFER;
+   rq->special_vec.bv_offset = 0;
+   rq->special_vec.bv_len = data_len;
+   rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+   /*
+* Use the Linux Vendor Specific TRIM command to pass the TRIM payload
+* to libata.
+*/
+   cmd->cmd_len = 10;
+   cmd->cmnd[0] = LINUX_VS_TRIM;
+   cmd->cmnd[8] = data_len;
+
+   buf = page_address(rq->special_vec.bv_page);
+   for (i = 0; i < (data_len >> 3); i++) {
+   u64 n = min(nr_sectors, 0xffffu);
+
+   buf[i] = cpu_to_le64(sector | (n << 48));
+   if (nr_sectors <= 0xffff)
+   break;
+   sector += 0xffff;
+   nr_sectors -= 0xffff;
+   }
+
+   cmd->allowed = SD_MAX_RETRIES;
+   cmd->transfersize = data_len;
+   rq->timeout = SD_TIMEOUT;
+   scsi_req(rq)->resid_len = data_len;
+
+   return scsi_init_io(cmd);
+}
+
 static void sd_config_write_same(struct scsi_disk *sdkp)
 {
struct request_queue *q = sdkp->disk->queue;
@@ -1168,6 +1218,8 @@ static int sd_init_command(struct scsi_cmnd *cmd)
return sd_setup_write_same10_cmnd(cmd, true);
case SD_LBP_ZERO:
return sd_setup_write_same10_cmnd(cmd, false);
+   case SD_LBP_ATA_TRIM:
+   return sd_setup_ata_trim_cmnd(cmd);
default:
return BLKPREP_INVALID;
}
@@ -2739,7 +2791,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
sdkp->max_xfer_blocks = get_unaligned_be32([8]);
sdkp->opt_xfer_blocks = get_unaligned_be32([12]);
 
-   if (buffer[3] == 0x3c) {
+   if (sdkp->device->ata_trim) {
+   sd_config_discard(sdkp, SD_LBP_ATA_TRIM);
+   } else if (buffer[3] == 0x3c) {
unsigned int lba_count, desc_count;
 
sdkp->max_ws_blocks = 

[PATCH 6/7] sd: support multi-range TRIM for ATA disks

2017-03-20 Thread Christoph Hellwig
Almost the same scheme as the older multi-range support for NVMe.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cc8684818305..c18fe9ff1f8f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -643,7 +643,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
 {
struct request_queue *q = sdkp->disk->queue;
unsigned int logical_block_size = sdkp->device->sector_size;
-   unsigned int max_blocks = 0;
+   unsigned int max_blocks = 0, max_ranges = 0, max_range_size = 0;
 
q->limits.discard_zeroes_data = 0;
 
@@ -698,13 +698,19 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
break;
 
case SD_LBP_ATA_TRIM:
-   max_blocks = 65535 * (512 / sizeof(__le64));
+   max_ranges = 512 / sizeof(__le64);
+   max_range_size = USHRT_MAX;
+   max_blocks = max_ranges * max_range_size;
if (sdkp->device->ata_trim_zeroes_data)
q->limits.discard_zeroes_data = 1;
break;
}
 
blk_queue_max_discard_sectors(q, max_blocks * (logical_block_size >> 
9));
+   if (max_ranges)
+   blk_queue_max_discard_segments(q, max_ranges);
+   if (max_range_size)
+   blk_queue_max_discard_segment_size(q, max_range_size);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
@@ -805,9 +811,9 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
 {
struct scsi_device *sdp = cmd->device;
struct request *rq = cmd->request;
-   u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
-   u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
-   u32 data_len = sdp->sector_size, i;
+   u32 sector_shift = ilog2(sdp->sector_size);
+   u32 data_len = sdp->sector_size, i = 0;
+   struct bio *bio;
__le64 *buf;
 
rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
@@ -826,14 +832,21 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
cmd->cmnd[8] = data_len;
 
buf = page_address(rq->special_vec.bv_page);
-   for (i = 0; i < (data_len >> 3); i++) {
-   u64 n = min(nr_sectors, 0xffffu);
+   __rq_for_each_bio(bio, rq) {
+   u64 sector = bio->bi_iter.bi_sector >> (sector_shift - 9);
+   u32 nr_sectors = bio->bi_iter.bi_size >> sector_shift;

-   buf[i] = cpu_to_le64(sector | (n << 48));
-   if (nr_sectors <= 0xffff)
-   break;
-   sector += 0xffff;
-   nr_sectors -= 0xffff;
+   do {
+   u64 n = min(nr_sectors, 0xffffu);
+
+   buf[i] = cpu_to_le64(sector | (n << 48));
+   if (nr_sectors <= 0xffff)
+   break;
+   sector += 0xffff;
+   nr_sectors -= 0xffff;
+   i++;
+
+   } while (!WARN_ON_ONCE(i >= data_len >> 3));
}
 
cmd->allowed = SD_MAX_RETRIES;
-- 
2.11.0



[PATCH 3/7] libata: remove SCT WRITE SAME support

2017-03-20 Thread Christoph Hellwig
This was already disabled a while ago because it caused I/O errors,
and it's severely getting in the way of the discard / write zeroes
rework.

Signed-off-by: Christoph Hellwig 
---
 drivers/ata/libata-scsi.c | 128 +++---
 1 file changed, 29 insertions(+), 99 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 1ac70744ae7b..b93d7e33789a 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3393,46 +3393,6 @@ static size_t ata_format_dsm_trim_descr(struct scsi_cmnd 
*cmd, u32 trmax,
 }
 
 /**
- * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
- * @cmd: SCSI command being translated
- * @lba: Starting sector
- * @num: Number of sectors to be zero'd.
- *
- * Rewrite the WRITE SAME payload to be an SCT Write Same formatted
- * descriptor.
- * NOTE: Writes a pattern (0's) in the foreground.
- *
- * Return: Number of bytes copied into sglist.
- */
-static size_t ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 
num)
-{
-   struct scsi_device *sdp = cmd->device;
-   size_t len = sdp->sector_size;
-   size_t r;
-   u16 *buf;
-   unsigned long flags;
-
-   spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-   buf = ((void *)ata_scsi_rbuf);
-
-   put_unaligned_le16(0x0002,  &buf[0]); /* SCT_ACT_WRITE_SAME */
-   put_unaligned_le16(0x0101,  &buf[1]); /* WRITE PTRN FG */
-   put_unaligned_le64(lba, &buf[2]);
-   put_unaligned_le64(num, &buf[6]);
-   put_unaligned_le32(0u,  &buf[10]); /* pattern */
-
-   WARN_ON(len > ATA_SCSI_RBUF_SIZE);
-
-   if (len > ATA_SCSI_RBUF_SIZE)
-   len = ATA_SCSI_RBUF_SIZE;
-
-   r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
-   spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
-
-   return r;
-}
-
-/**
  * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
  * @qc: Command to be translated
  *
@@ -3468,26 +3428,17 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
}
scsi_16_lba_len(cdb, &block, &n_block);
 
-   if (unmap) {
-   /* If trim is not enabled the cmd is invalid. */
-   if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
-   !ata_id_has_trim(dev->id)) {
-   fp = 1;
-   bp = 3;
-   goto invalid_fld;
-   }
-   /* If the request is too large the cmd is invalid */
-   if (n_block > 0xffff * trmax) {
-   fp = 2;
-   goto invalid_fld;
-   }
-   } else {
-   /* If write same is not available the cmd is invalid */
-   if (!ata_id_sct_write_same(dev->id)) {
-   fp = 1;
-   bp = 3;
-   goto invalid_fld;
-   }
+   if (!unmap ||
+   (dev->horkage & ATA_HORKAGE_NOTRIM) ||
+   !ata_id_has_trim(dev->id)) {
+   fp = 1;
+   bp = 3;
+   goto invalid_fld;
+   }
+   /* If the request is too large the cmd is invalid */
+   if (n_block > 0xffff * trmax) {
+   fp = 2;
+   goto invalid_fld;
}
 
/*
@@ -3502,49 +3453,28 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
 * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count)
 * is defined as number of 512 byte blocks to be transferred.
 */
-   if (unmap) {
-   size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
-   if (size != len)
-   goto invalid_param_len;
 
-   if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
-   /* Newer devices support queued TRIM commands */
-   tf->protocol = ATA_PROT_NCQ;
-   tf->command = ATA_CMD_FPDMA_SEND;
-   tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
-   tf->nsect = qc->tag << 3;
-   tf->hob_feature = (size / 512) >> 8;
-   tf->feature = size / 512;
+   size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
+   if (size != len)
+   goto invalid_param_len;
 
-   tf->auxiliary = 1;
-   } else {
-   tf->protocol = ATA_PROT_DMA;
-   tf->hob_feature = 0;
-   tf->feature = ATA_DSM_TRIM;
-   tf->hob_nsect = (size / 512) >> 8;
-   tf->nsect = size / 512;
-   tf->command = ATA_CMD_DSM;
-   }
-   } else {
-   size = ata_format_sct_write_same(scmd, block, n_block);
-   if (size != len)
-   goto invalid_param_len;
+   if (ata_ncq_enabled(dev) && 

RE: [PATCH] hpsa: fix volume offline state

2017-03-20 Thread Don Brace
> -Original Message-
> From: Tomas Henzl [mailto:the...@redhat.com]
> Sent: Monday, March 20, 2017 10:43 AM
> To: linux-scsi@vger.kernel.org
> Cc: Don Brace ; joseph.szczy...@hpe.com
> Subject: [PATCH] hpsa: fix volume offline state
> 
> EXTERNAL EMAIL
> 
> 
> In a previous patch an hpsa_scsi_dev_t.volume_offline update line
> was removed, so let us put it back.
> 
> Fixes: 85b29008d8 (hpsa: update check for logical volume status)
> 
> Signed-off-by: Tomas Henzl 
> ---
>  drivers/scsi/hpsa.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 0d0be7754a..9d659aaace 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -3885,6 +3885,7 @@ static int hpsa_update_device_info(struct
> ctlr_info *h,
> if (h->fw_support & MISC_FW_RAID_OFFLOAD_BASIC)
> hpsa_get_ioaccel_status(h, scsi3addr, this_device);
> volume_offline = hpsa_volume_offline(h, scsi3addr);
> +   this_device->volume_offline = volume_offline;
> if (volume_offline == HPSA_LV_FAILED) {
> rc = HPSA_LV_FAILED;
> dev_err(&h->pdev->dev,
> --
> 2.7.4

Acked-by: Don Brace 

Thanks Tomas.


[PATCH] hpsa: fix volume offline state

2017-03-20 Thread Tomas Henzl
In a previous patch an hpsa_scsi_dev_t.volume_offline update line
was removed, so let us put it back.

Fixes: 85b29008d8 (hpsa: update check for logical volume status)

Signed-off-by: Tomas Henzl 
---
 drivers/scsi/hpsa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 0d0be7754a..9d659aaace 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3885,6 +3885,7 @@ static int hpsa_update_device_info(struct ctlr_info *h,
if (h->fw_support & MISC_FW_RAID_OFFLOAD_BASIC)
hpsa_get_ioaccel_status(h, scsi3addr, this_device);
volume_offline = hpsa_volume_offline(h, scsi3addr);
+   this_device->volume_offline = volume_offline;
if (volume_offline == HPSA_LV_FAILED) {
rc = HPSA_LV_FAILED;
dev_err(&h->pdev->dev,
-- 
2.7.4



Re: [PATCH] scsi: sr: fix oob access in get_capabilities

2017-03-20 Thread Martin K. Petersen
Kefeng Wang  writes:

Kefeng,

> The issue still exists: the patch returns zero in scsi_mode_sense(), but zero
> means SAM_STAT_GOOD in scsi_status_is_good(), so n will still be bigger than 512;

OK, I checked the other users of scsi_mode_sense(). So let's keep this
fix local to sr.c for now.

How about the following?


scsi: sr: Sanity check returned mode data

Kefeng Wang discovered that old versions of the QEMU CD driver would
return mangled mode data causing us to walk off the end of the buffer in
an attempt to parse it. Sanity check the returned mode sense data.

Cc: 
Reported-by: Kefeng Wang 
Signed-off-by: Martin K. Petersen 

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 0b29b9329b1c..a8f630213a1a 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -836,6 +836,7 @@ static void get_capabilities(struct scsi_cd *cd)
unsigned char *buffer;
struct scsi_mode_data data;
struct scsi_sense_hdr sshdr;
+   unsigned int ms_len = 128;
int rc, n;
 
static const char *loadmech[] =
@@ -862,10 +863,11 @@ static void get_capabilities(struct scsi_cd *cd)
scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
 
/* ask for mode page 0x2a */
-   rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, 128,
+   rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, ms_len,
 SR_TIMEOUT, 3, &data, NULL);
 
-   if (!scsi_status_is_good(rc)) {
+   if (!scsi_status_is_good(rc) || data.length > ms_len ||
+   data.header_length + data.block_descriptor_length > data.length) {
/* failed, drive doesn't have capabilities mode page */
cd->cdi.speed = 1;
cd->cdi.mask |= (CDC_CD_R | CDC_CD_RW | CDC_DVD_R |




Re: [PATCH] scsi: libsas: fix ata xfer length

2017-03-20 Thread Martin K. Petersen
John Garry  writes:

> I should have added this originally to the changelog:
> Fixes: ff2aeb1eb64c8a4770a6 ("libata: convert to chained sg")

Added.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] qedf: fix wrong le16 conversion

2017-03-20 Thread Chad Dupuis

On Mon, 20 Mar 2017, 8:49am -, Arnd Bergmann wrote:

> gcc points out that we are converting a 16-bit integer into a 32-bit
> little-endian type, and assigning that to a 16-bit little-endian field
> will end up with a zero:
> 
> drivers/scsi/qedf/drv_fcoe_fw_funcs.c: In function 
> 'init_initiator_rw_fcoe_task':
> include/uapi/linux/byteorder/big_endian.h:32:26: error: large integer 
> implicitly truncated to unsigned type [-Werror=overflow]
>   t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
> 
> The correct solution appears to be to just use a 16-bit byte swap instead.
> 
> Fixes: be086e7c53f1 ("qed*: Utilize Firmware 8.15.3.0")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/scsi/qedf/drv_fcoe_fw_funcs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c 
> b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
> index bb812db48da6..8c65e3b034dc 100644
> --- a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
> +++ b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
> @@ -8,7 +8,7 @@
>  #include "drv_fcoe_fw_funcs.h"
>  #include "drv_scsi_fw_funcs.h"
>  
> -#define FCOE_RX_ID ((u32)0xffff)
> +#define FCOE_RX_ID (0xffffu)
>  
>  static inline void init_common_sqe(struct fcoe_task_params *task_params,
>  enum fcoe_sqe_request_type request_type)
> @@ -59,7 +59,7 @@ int init_initiator_rw_fcoe_task(struct fcoe_task_params 
> *task_params,
>   t_st_ctx->read_only.task_type = task_params->task_type;
>   SET_FIELD(t_st_ctx->read_write.flags,
> FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
> - t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
> + t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
>  
>   /* Ustorm ctx */
>   u_ag_ctx = &ctx->ustorm_ag_context;
> @@ -151,7 +151,7 @@ int init_initiator_midpath_unsolicited_fcoe_task(
>   t_st_ctx->read_only.task_type = task_params->task_type;
>   SET_FIELD(t_st_ctx->read_write.flags,
> FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
> - t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
> + t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
>  
>   /* Init Ustorm */
>   u_ag_ctx = &ctx->ustorm_ag_context;
> 

Arnd, thanks for fixing this up.

Acked-by: Chad Dupuis 


[PATCHv3 5/5] target/user: Clean up tcmu_queue_cmd_ring

2017-03-20 Thread lixiubo
From: Xiubo Li 

Add two helpers to simplify the code, and move some code out of
the cmdr spin lock to reduce the size of the critical region.

Signed-off-by: Xiubo Li 
---
 drivers/target/target_core_user.c | 68 +--
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index e3daf15..52fa988 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -153,7 +153,7 @@ struct tcmu_cmd {
 
/* Can't use se_cmd when cleaning up expired cmds, because if
   cmd has been completed then accessing se_cmd is off limits */
-   uint32_t dbi_len;
+   uint32_t dbi_cnt;
uint32_t dbi_cur;
uint32_t *dbi;
 
@@ -255,7 +255,7 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
int i;
 
tcmu_cmd_reset_dbi_cur(tcmu_cmd);
-   for (i = 0; i < tcmu_cmd->dbi_len; i++) {
+   for (i = 0; i < tcmu_cmd->dbi_cnt; i++) {
if (!tcmu_get_empty_block(udev, tcmu_cmd))
goto err;
}
@@ -263,7 +263,7 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
 
 err:
pr_debug("no blocks: only %u blocks available, but ask for %u\n",
-   tcmu_cmd->dbi_len, tcmu_cmd->dbi_cur);
+   tcmu_cmd->dbi_cnt, tcmu_cmd->dbi_cur);
tcmu_cmd_free_data(tcmu_cmd, tcmu_cmd->dbi_cur);
udev->waiting_global = true;
return false;
@@ -286,25 +286,29 @@ static inline void tcmu_free_cmd(struct tcmu_cmd 
*tcmu_cmd)
kmem_cache_free(tcmu_cmd_cache, tcmu_cmd);
 }
 
-static inline uint32_t tcmu_cmd_get_dbi_len(struct se_cmd *se_cmd)
+static inline uint32_t tcmu_cmd_get_data_length(struct se_cmd *se_cmd)
 {
size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
-   uint32_t dbi_len;
 
if (se_cmd->se_cmd_flags & SCF_BIDI)
data_length += round_up(se_cmd->t_bidi_data_sg->length,
DATA_BLOCK_SIZE);
 
-   dbi_len = data_length / DATA_BLOCK_SIZE;
+   return data_length;
+}
+
+static inline uint32_t tcmu_cmd_get_dbi_cnt(struct se_cmd *se_cmd)
+{
+   size_t data_length = tcmu_cmd_get_data_length(se_cmd);
 
-   return dbi_len;
+   return data_length / DATA_BLOCK_SIZE;
 }
 
 static struct tcmu_cmd *tcmu_alloc_cmd(struct se_cmd *se_cmd)
 {
struct se_device *se_dev = se_cmd->se_dev;
struct tcmu_dev *udev = TCMU_DEV(se_dev);
-   uint32_t dbi_len = tcmu_cmd_get_dbi_len(se_cmd);
+   uint32_t dbi_cnt = tcmu_cmd_get_dbi_cnt(se_cmd);
struct tcmu_cmd *tcmu_cmd;
int cmd_id;
 
@@ -319,8 +323,8 @@ static struct tcmu_cmd *tcmu_alloc_cmd(struct se_cmd 
*se_cmd)
msecs_to_jiffies(udev->cmd_time_out);
 
tcmu_cmd_reset_dbi_cur(tcmu_cmd);
-   tcmu_cmd->dbi_len = dbi_len;
-   tcmu_cmd->dbi = kcalloc(dbi_len, sizeof(uint32_t), GFP_KERNEL);
+   tcmu_cmd->dbi_cnt = dbi_cnt;
+   tcmu_cmd->dbi = kcalloc(dbi_cnt, sizeof(uint32_t), GFP_KERNEL);
if (!tcmu_cmd->dbi) {
kmem_cache_free(tcmu_cmd_cache, tcmu_cmd);
return NULL;
@@ -572,13 +576,27 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
return true;
 }
 
+static inline void tcmu_cmd_get_cmd_size(struct tcmu_cmd *tcmu_cmd,
+   size_t *base_size, size_t *cmd_size)
+{
+   struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
+
+   *base_size = max(offsetof(struct tcmu_cmd_entry,
+req.iov[tcmu_cmd->dbi_cnt]),
+sizeof(struct tcmu_cmd_entry));
+
+   *cmd_size = round_up(scsi_command_size(se_cmd->t_task_cdb),
+   TCMU_OP_ALIGN_SIZE) + *base_size;
+   WARN_ON(*cmd_size & (TCMU_OP_ALIGN_SIZE-1));
+}
+
 static sense_reason_t
 tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
 {
struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
size_t base_command_size, command_size;
-   struct tcmu_mailbox *mb;
+   struct tcmu_mailbox *mb = udev->mb_addr;
struct tcmu_cmd_entry *entry;
struct iovec *iov;
int iov_cnt, ret;
@@ -590,6 +608,8 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
if (test_bit(TCMU_DEV_BIT_BROKEN, >flags))
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 
+   if (se_cmd->se_cmd_flags & SCF_BIDI)
+   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
/*
 * Must be a certain minimum size for response sense info, but
 * also may be larger if the iov array is large.
@@ -597,33 +617,19 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
 * We prepare way 

[PATCHv3 3/5] target/user: Fix possible overwrite of t_data_sg's last iov[]

2017-03-20 Thread lixiubo
From: Xiubo Li 

If there is BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.

To fix this, we can just increase the iov pointer, but this may
introduce a new memory leakage bug: if se_cmd->data_length and
se_cmd->t_bidi_data_sg->length are not both aligned up to
DATA_BLOCK_SIZE, the actual length needed may be larger than just
the sum of them.

So, this can be avoided by rounding all the data lengths up
to DATA_BLOCK_SIZE.
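
A concrete illustration (numbers made up for the example, with
DATA_BLOCK_SIZE = 4096): if se_cmd->data_length == 2048 and
se_cmd->t_bidi_data_sg->length == 2048, the old computation sums them to
4096 bytes and reserves a single block, yet once the iov pointer is bumped
the BIDI data occupies a block of its own.  Rounding each length up first
gives 4096 + 4096 = 8192 bytes, i.e. the two blocks actually required.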

Signed-off-by: Xiubo Li 
Signed-off-by: Bryant G. Ly 
---
 drivers/target/target_core_user.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 8cbe196..780c30f 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -288,13 +288,14 @@ static inline void tcmu_free_cmd(struct tcmu_cmd 
*tcmu_cmd)
 
 static inline uint32_t tcmu_cmd_get_dbi_len(struct se_cmd *se_cmd)
 {
-   size_t data_length = se_cmd->data_length;
+   size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
uint32_t dbi_len;
 
if (se_cmd->se_cmd_flags & SCF_BIDI)
-   data_length += se_cmd->t_bidi_data_sg->length;
+   data_length += round_up(se_cmd->t_bidi_data_sg->length,
+   DATA_BLOCK_SIZE);
 
-   dbi_len = (data_length + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE;
+   dbi_len = data_length / DATA_BLOCK_SIZE;
 
return dbi_len;
 }
@@ -609,10 +610,11 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
 
mb = udev->mb_addr;
cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
-   data_length = se_cmd->data_length;
+   data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
if (se_cmd->se_cmd_flags & SCF_BIDI) {
BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
-   data_length += se_cmd->t_bidi_data_sg->length;
+   data_length += round_up(se_cmd->t_bidi_data_sg->length,
+   DATA_BLOCK_SIZE);
}
if ((command_size > (udev->cmdr_size / 2)) ||
data_length > udev->data_size) {
@@ -690,15 +692,19 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
entry->req.iov_dif_cnt = 0;
 
/* Handle BIDI commands */
-   iov_cnt = 0;
-   ret = alloc_and_scatter_data_area(udev, tcmu_cmd,
-   se_cmd->t_bidi_data_sg, se_cmd->t_bidi_data_nents,
-   &iov, &iov_cnt, false);
-   if (ret) {
-   pr_err("tcmu: alloc and scatter bidi data failed\n");
-   return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+   if (se_cmd->se_cmd_flags & SCF_BIDI) {
+   iov_cnt = 0;
+   iov++;
+   ret = alloc_and_scatter_data_area(udev, tcmu_cmd,
+   se_cmd->t_bidi_data_sg,
+   se_cmd->t_bidi_data_nents,
+   &iov, &iov_cnt, false);
+   if (ret) {
+   pr_err("tcmu: alloc and scatter bidi data failed\n");
+   return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+   }
+   entry->req.iov_bidi_cnt = iov_cnt;
}
-   entry->req.iov_bidi_cnt = iov_cnt;
 
/* All offsets relative to mb_addr, not start of entry! */
cdb_off = CMDR_OFF + cmd_head + base_command_size;
-- 
1.8.3.1





[PATCHv3 1/5] tcmu: Add dynamic growing data area feature support

2017-03-20 Thread lixiubo
From: Xiubo Li 

Currently for TCMU, the ring buffer size is fixed to a 64K cmd
area + 1M data area, and this will be a bottleneck for high IOPS.

The struct tcmu_cmd_entry {} size is fixed at about 112 bytes with
iovec[N] for N <= 4, and the size of struct iovec is about 16 bytes.

If N == 0, the ratio will be sizeof(cmd entry) : sizeof(datas) ==
112Bytes : (N * 4096)Bytes = 28 : 0; no data area is needed.

If 0 < N <=4, the ratio will be sizeof(cmd entry) : sizeof(datas)
== 112Bytes : (N * 4096)Bytes = 28 : (N * 1024), so the max will
be 28 : 1024.

If N > 4, the sizeof(cmd entry) will be [(N - 4) * 16 + 112] bytes,
and its corresponding data size will be [N * 4096], so the ratio
of sizeof(cmd entry) : sizeof(datas) == [(N - 4) * 16 + 112]Bytes
: (N * 4096)Bytes == 4/1024 + 12/(N * 1024), which approaches
4 : 1024 as N grows.

When N is bigger, the ratio will be smaller.
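
Spelling out the N > 4 arithmetic (a restatement of the numbers above):

	((N - 4) * 16 + 112) / (N * 4096)
	    = (16N + 48) / (4096N)
	    = 4/1024 + 12/(1024 * N)
	    -> 4/1024 as N grows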

As the initial patch, we will set the cmd area size to 2M and
the data area size to 32M. TCMU will dynamically grow the data
area from 0 to the max 32M size as needed.

The cmd area memory will be allocated through vmalloc(), and the
data area's blocks will be allocated individually later when needed.

The allocated data area block memory will be managed via a radix tree.
For now the bitmap is still the most efficient way to search for and
manage the block indexes; this could be updated later.

Signed-off-by: Xiubo Li 
Signed-off-by: Jianfei Hu 
---
 drivers/target/target_core_user.c | 286 --
 1 file changed, 215 insertions(+), 71 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index c6874c3..fc9f237 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -2,6 +2,7 @@
  * Copyright (C) 2013 Shaohua Li 
  * Copyright (C) 2014 Red Hat, Inc.
  * Copyright (C) 2015 Arrikto, Inc.
+ * Copyright (C) 2017 Chinamobile, Inc.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -25,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -63,15 +65,17 @@
  * this may have a 'UAM' comment.
  */
 
-
 #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
 
-#define DATA_BLOCK_BITS 256
-#define DATA_BLOCK_SIZE 4096
+/* For cmd area, the size is fixed 2M */
+#define CMDR_SIZE (2 * 1024 * 1024)
 
-#define CMDR_SIZE (16 * 4096)
+/* For data area, the size is fixed 32M */
+#define DATA_BLOCK_BITS (8 * 1024)
+#define DATA_BLOCK_SIZE 4096
 #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
 
+/* The ring buffer size is 34M */
 #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
 
 static struct device *tcmu_root_device;
@@ -103,12 +107,14 @@ struct tcmu_dev {
size_t data_off;
size_t data_size;
 
-   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
-
wait_queue_head_t wait_cmdr;
/* TODO should this be a mutex? */
spinlock_t cmdr_lock;
 
+   uint32_t dbi_cur;
+   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+   struct radix_tree_root data_blocks;
+
struct idr commands;
spinlock_t commands_lock;
 
@@ -130,7 +136,9 @@ struct tcmu_cmd {
 
/* Can't use se_cmd when cleaning up expired cmds, because if
   cmd has been completed then accessing se_cmd is off limits */
-   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+   uint32_t dbi_len;
+   uint32_t dbi_cur;
+   uint32_t *dbi;
 
unsigned long deadline;
 
@@ -161,10 +169,80 @@ enum tcmu_multicast_groups {
.netnsok = true,
 };
 
+static int tcmu_get_empty_block(struct tcmu_dev *udev, void **addr)
+{
+   void *p;
+   uint32_t dbi;
+   int ret;
+
+   dbi = find_first_zero_bit(udev->data_bitmap, DATA_BLOCK_BITS);
+   if (dbi > udev->dbi_cur)
+   udev->dbi_cur = dbi;
+
+   set_bit(dbi, udev->data_bitmap);
+
+   p = radix_tree_lookup(&udev->data_blocks, dbi);
+   if (!p) {
+   p = kzalloc(DATA_BLOCK_SIZE, GFP_ATOMIC);
+   if (!p) {
+   clear_bit(dbi, udev->data_bitmap);
+   return -ENOMEM;
+   }
+
+   ret = radix_tree_insert(&udev->data_blocks, dbi, p);
+   if (ret) {
+   kfree(p);
+   clear_bit(dbi, udev->data_bitmap);
+   return ret;
+   }
+   }
+
+   *addr = p;
+   return dbi;
+}
+
+static void *tcmu_get_block_addr(struct tcmu_dev *udev, uint32_t dbi)
+{
+   return radix_tree_lookup(&udev->data_blocks, dbi);
+}
+
+#define tcmu_cmd_reset_dbi_cur(cmd) ((cmd)->dbi_cur = 0)
+#define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
+#define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
+
+static void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd)
+{
+ 

[PATCHv3 0/5] Dynamic growing data area support

2017-03-20 Thread lixiubo
From: Xiubo Li 

Changed for V3:
- [PATCHv2 2/5] fix double usage of blocks and possible page fault
  call trace.
- [PATCHv2 5/5] fix a mistake.

Changed for V2:
- [PATCHv2 1/5] just fixes some small spelling and other mistakes.
  And as the initial patch, here sets cmd area to 8M and data area to
  1G (1M fixed and 1023M growing)
- [PATCHv2 2/5] is a new one, adding global data block pool support.
  The max total size of the pool is 2G and all the targets will get
  growing blocks from here.
  Test this using multi-targets at the same time.
- [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
- [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.


Xiubo Li (5):
  tcmu: Add dynamic growing data area feature support
  tcmu: Add global data block pool support
  target/user: Fix possible overwrite of t_data_sg's last iov[]
  target/user: Fix wrongly calculating of the base_command_size
  target/user: Clean up tcmu_queue_cmd_ring

 drivers/target/target_core_user.c | 621 +++---
 1 file changed, 514 insertions(+), 107 deletions(-)

-- 
1.8.3.1





[PATCHv3 4/5] target/user: Fix wrongly calculating of the base_command_size

2017-03-20 Thread lixiubo
From: Xiubo Li 

The t_data_nents and t_bidi_data_nents are both numbers of
segments, and we cannot be sure that the size of a data area block
will equal the size of a segment.

Use the actual number of blocks needed instead of the sum of segments.

Signed-off-by: Xiubo Li 
---
 drivers/target/target_core_user.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 780c30f..e3daf15 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -598,8 +598,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
 * expensive to tell how many regions are freed in the bitmap
*/
base_command_size = max(offsetof(struct tcmu_cmd_entry,
-   req.iov[se_cmd->t_bidi_data_nents +
-   se_cmd->t_data_nents]),
+   req.iov[tcmu_cmd->dbi_len]),
sizeof(struct tcmu_cmd_entry));
command_size = base_command_size
+ round_up(scsi_command_size(se_cmd->t_task_cdb), 
TCMU_OP_ALIGN_SIZE);
-- 
1.8.3.1





[PATCHv3 2/5] tcmu: Add global data block pool support

2017-03-20 Thread lixiubo
From: Xiubo Li 

For each target there will be one ring; when the number of targets
grows larger and larger, it could eventually run out of
system memory.

In this patch for each target ring, for the cmd area the size
will be limited to 8MB and for the data area the size will be
limited to 256K * PAGE_SIZE.

For all the targets' data areas, they will get empty blocks
from the "global data block pool", which has limited to 512K *
PAGE_SIZE for now.

When the "global data block pool" has been used up, any
target can trigger the unmapping thread routine to shrink the
targets' rings. For idle targets the unmapping routine
will reserve at least 256 blocks.

When user space touches data blocks outside of iov[N],
tcmu_page_fault() will return a zeroed block.
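
A rough sketch of the allocation-side policy described above,
reconstructed from the names visible in the hunks below (not the exact
patch code):

	/* allocate a new backing page only while the global pool has room */
	page = radix_tree_lookup(&udev->data_blocks, dbi);
	if (!page) {
		spin_lock(&db_count_lock);
		if (global_db_count >= TCMU_GLOBAL_MAX_BLOCKS) {
			spin_unlock(&db_count_lock);
			/* pool used up: let the unmap thread shrink idle rings */
			wake_up(&unmap_wait);
			return false;
		}
		global_db_count++;
		spin_unlock(&db_count_lock);
		page = alloc_page(GFP_ATOMIC);
		/* ... insert into udev->data_blocks ... */
	}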

Signed-off-by: Xiubo Li 
Signed-off-by: Jianfei Hu 
---
 drivers/target/target_core_user.c | 426 ++
 1 file changed, 339 insertions(+), 87 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index fc9f237..8cbe196 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -31,6 +31,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -67,17 +69,24 @@
 
 #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
 
-/* For cmd area, the size is fixed 2M */
-#define CMDR_SIZE (2 * 1024 * 1024)
+/* For cmd area, the size is fixed 8MB */
+#define CMDR_SIZE (8 * 1024 * 1024)
 
-/* For data area, the size is fixed 32M */
-#define DATA_BLOCK_BITS (8 * 1024)
-#define DATA_BLOCK_SIZE 4096
+/*
+ * For data area, the block size is PAGE_SIZE and
+ * the total size is 256K * PAGE_SIZE.
+ */
+#define DATA_BLOCK_SIZE PAGE_SIZE
+#define DATA_BLOCK_BITS (256 * 1024)
 #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
+#define DATA_BLOCK_RES_BITS 256
 
-/* The ring buffer size is 34M */
+/* The total size of the ring is 8M + 256K * PAGE_SIZE */
 #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
 
+/* Default maximum of the global data blocks(512K * PAGE_SIZE) */
+#define TCMU_GLOBAL_MAX_BLOCKS (512 * 1024)
+
 static struct device *tcmu_root_device;
 
 struct tcmu_hba {
@@ -87,6 +96,8 @@ struct tcmu_hba {
 #define TCMU_CONFIG_LEN 256
 
 struct tcmu_dev {
+   struct list_head node;
+
struct se_device se_dev;
 
char *name;
@@ -98,6 +109,16 @@ struct tcmu_dev {
 
struct uio_info uio_info;
 
+   struct inode *inode;
+
+   struct mutex unmap_mutex;
+   bool unmapping;
+   bool waiting_global;
+   uint32_t dbi_cur;
+   uint32_t dbi_thresh;
+   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+   struct radix_tree_root data_blocks;
+
struct tcmu_mailbox *mb_addr;
size_t dev_size;
u32 cmdr_size;
@@ -111,10 +132,6 @@ struct tcmu_dev {
/* TODO should this be a mutex? */
spinlock_t cmdr_lock;
 
-   uint32_t dbi_cur;
-   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
-   struct radix_tree_root data_blocks;
-
struct idr commands;
spinlock_t commands_lock;
 
@@ -146,6 +163,14 @@ struct tcmu_cmd {
unsigned long flags;
 };
 
+static struct task_struct *unmap_thread;
+static wait_queue_head_t unmap_wait;
+static DEFINE_MUTEX(udev_mutex);
+static LIST_HEAD(root_udev);
+
+static spinlock_t db_count_lock;
+static unsigned long global_db_count;
+
 static struct kmem_cache *tcmu_cmd_cache;
 
 /* multicast group */
@@ -169,54 +194,90 @@ enum tcmu_multicast_groups {
.netnsok = true,
 };
 
-static int tcmu_get_empty_block(struct tcmu_dev *udev, void **addr)
+#define tcmu_cmd_reset_dbi_cur(cmd) ((cmd)->dbi_cur = 0)
+#define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
+#define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
+
+static inline void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd, uint32_t len)
 {
-   void *p;
-   uint32_t dbi;
-   int ret;
+   struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
+   uint32_t i;
 
-   dbi = find_first_zero_bit(udev->data_bitmap, DATA_BLOCK_BITS);
-   if (dbi > udev->dbi_cur)
-   udev->dbi_cur = dbi;
+   for (i = 0; i < len; i++)
+   clear_bit(tcmu_cmd->dbi[i], udev->data_bitmap);
+}
 
-   set_bit(dbi, udev->data_bitmap);
+static inline bool tcmu_get_empty_block(struct tcmu_dev *udev,
+   struct tcmu_cmd *tcmu_cmd)
+{
+   struct page *page;
+   int ret, dbi;
 
-   p = radix_tree_lookup(&udev->data_blocks, dbi);
-   if (!p) {
-   p = kzalloc(DATA_BLOCK_SIZE, GFP_ATOMIC);
-   if (!p) {
-   clear_bit(dbi, udev->data_bitmap);
-   return -ENOMEM;
+   dbi = find_first_zero_bit(udev->data_bitmap, udev->dbi_thresh);
+   if (dbi == udev->dbi_thresh)
+   return 

Re: [PATCH] scsi: libsas: fix ata xfer length

2017-03-20 Thread John Garry

On 19/03/2017 17:21, Martin K. Petersen wrote:

John Garry  writes:

John,


The total ata xfer length may not be calculated properly,
in that we do not use the proper method to get an sg element
dma length.

According to the code comment, sg_dma_len() should be used
after dma_map_sg() is called.

This issue was found by turning on the SMMUv3 in front of
the hisi_sas controller in hip07. Multiple sg elements
were being combined into a single element, but the original
first element length was being use as the total xfer length.




I should have added this originally to the changelog:
Fixes: ff2aeb1eb64c8a4770a6 ("libata: convert to chained sg")

BTW, I am surprised this issue has not been seen in almost 10 years, but
we cannot attach a SATA disk with the SMMU enabled without it.
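
For context, the pattern the fix adopts looks roughly like this (a sketch
following the commit description, with the field names from libsas'
struct sas_task; not the exact patch hunk):

	struct scatterlist *sg;
	unsigned int si, xfer = 0;

	/* sum the DMA-mapped lengths, not the original sg lengths,
	 * since the IOMMU may have merged adjacent elements */
	for_each_sg(task->scatter, sg, task->num_scatter, si)
		xfer += sg_dma_len(sg);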


Cheers,
John


Applied to 4.11/scsi-fixes.






[PATCH] qedf: fix wrong le16 conversion

2017-03-20 Thread Arnd Bergmann
gcc points out that we are converting a 16-bit integer into a 32-bit
little-endian type, and assigning that to a 16-bit little-endian field
will end up with a zero:

drivers/scsi/qedf/drv_fcoe_fw_funcs.c: In function 
'init_initiator_rw_fcoe_task':
include/uapi/linux/byteorder/big_endian.h:32:26: error: large integer 
implicitly truncated to unsigned type [-Werror=overflow]
  t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);

The correct solution appears to be to just use a 16-bit byte swap instead.
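
To see why (an illustration; on a little-endian host both conversions are
no-ops, which is why the bug only bites big-endian): on a big-endian CPU,
cpu_to_le32(0xffff) produces the 32-bit value 0xffff0000, and truncating
that into a 16-bit field keeps only the low half:

	u32 wrong = cpu_to_le32(0xffff);  /* 0xffff0000 on big-endian */
	u16 rx_id = (u16)wrong;           /* truncates to 0x0000      */
	u16 right = cpu_to_le16(0xffff);  /* 0xffff, as intended      */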

Fixes: be086e7c53f1 ("qed*: Utilize Firmware 8.15.3.0")
Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/qedf/drv_fcoe_fw_funcs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c 
b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
index bb812db48da6..8c65e3b034dc 100644
--- a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
+++ b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
@@ -8,7 +8,7 @@
 #include "drv_fcoe_fw_funcs.h"
 #include "drv_scsi_fw_funcs.h"
 
-#define FCOE_RX_ID ((u32)0xffff)
+#define FCOE_RX_ID (0xffffu)
 
 static inline void init_common_sqe(struct fcoe_task_params *task_params,
   enum fcoe_sqe_request_type request_type)
@@ -59,7 +59,7 @@ int init_initiator_rw_fcoe_task(struct fcoe_task_params 
*task_params,
t_st_ctx->read_only.task_type = task_params->task_type;
SET_FIELD(t_st_ctx->read_write.flags,
  FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
-   t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
+   t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
 
/* Ustorm ctx */
u_ag_ctx = &ctx->ustorm_ag_context;
@@ -151,7 +151,7 @@ int init_initiator_midpath_unsolicited_fcoe_task(
t_st_ctx->read_only.task_type = task_params->task_type;
SET_FIELD(t_st_ctx->read_write.flags,
  FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
-   t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
+   t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
 
/* Init Ustorm */
u_ag_ctx = &ctx->ustorm_ag_context;
-- 
2.9.0



Re: [PATCH v6 05/15] mlx4: Replace PCI pool old API

2017-03-20 Thread Leon Romanovsky
On Sun, Mar 19, 2017 at 06:03:54PM +0100, Romain Perier wrote:
> The PCI pool API is deprecated. This commit replaces the PCI pool old
> API by the appropriate function with the DMA pool API.
>
> Signed-off-by: Romain Perier 
> Acked-by: Peter Senna Tschudin 
> Tested-by: Peter Senna Tschudin 
> ---
>  drivers/net/ethernet/mellanox/mlx4/cmd.c  | 10 +-
>  drivers/net/ethernet/mellanox/mlx4/mlx4.h |  2 +-
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
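
For readers unfamiliar with the conversion, the two APIs map one-to-one
(an illustrative mapping, not lines from this patch):

	/* old PCI pool wrappers (deprecated) */
	pool = pci_pool_create("name", pdev, size, align, 0);
	vaddr = pci_pool_alloc(pool, GFP_KERNEL, &dma_handle);
	pci_pool_free(pool, vaddr, dma_handle);
	pci_pool_destroy(pool);

	/* DMA pool equivalents */
	pool = dma_pool_create("name", &pdev->dev, size, align, 0);
	vaddr = dma_pool_alloc(pool, GFP_KERNEL, &dma_handle);
	dma_pool_free(pool, vaddr, dma_handle);
	dma_pool_destroy(pool);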

Thanks,
Reviewed-by: Leon Romanovsky 




Re: [PATCH v6 06/15] mlx5: Replace PCI pool old API

2017-03-20 Thread Leon Romanovsky
On Sun, Mar 19, 2017 at 06:03:55PM +0100, Romain Perier wrote:
> The PCI pool API is deprecated. This commit replaces the PCI pool old
> API by the appropriate function with the DMA pool API.
>
> Signed-off-by: Romain Perier 
> Reviewed-by: Peter Senna Tschudin 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++-
>  include/linux/mlx5/driver.h   |  2 +-
>  2 files changed, 7 insertions(+), 6 deletions(-)
>

Thanks,
Acked-by: Leon Romanovsky 




Re: [PATCH] scsi: sr: fix oob access in get_capabilities

2017-03-20 Thread Kefeng Wang


On 2017/3/18 7:29, Martin K. Petersen wrote:
> Kefeng Wang  writes:
> 
> Kefeng,
> 
>> root@localhost ~]# sg_modes -p 0x2a /dev/sr0
>> QEMU  QEMU DVD-ROM  0.15   peripheral_type: cd/dvd [0x5]
>> Mode parameter header from MODE SENSE(10):
>> Invalid block descriptor length=512, ignore
>>   Mode data length=36, medium type=0x70, specific param=0x00, longlba=0
>>   Block descriptor length=0
>>  MM capabilities and mechanical status (obsolete), page_control: current
>>  00 2a 12 00 00 71 60 29 00  02 c2 00 02 02 00 02 c2
>>  10 00 00 00 00
>> Unexpectedly received extra mode page responses, ignore
> 
> That looks pretty broken.
> 
> Could you try the following patch?

The issue still exists: the patch returns zero in scsi_mode_sense(), but zero
means SAM_STAT_GOOD in scsi_status_is_good(), so n will still be bigger than 512;

Thanks,