[PATCH] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
When sg_scsi_ioctl() fails to prepare request to submit in blk_rq_map_kern() we jump to a label where we just end up copying (luckily zeroed-out) kernel buffer to userspace instead of reporting error. Fix the problem by jumping to the right label. CC: Jens Axboe ax...@kernel.dk CC: linux-scsi@vger.kernel.org CC: sta...@vger.kernel.org Coverity-id: 1226871 Signed-off-by: Jan Kara j...@suse.cz --- block/scsi_ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index abb2e65b24cc..cc7927a1a62e 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -508,7 +508,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode, if (bytes blk_rq_map_kern(q, rq, buffer, bytes, __GFP_WAIT)) { err = DRIVER_ERROR 24; - goto out; + goto error; } memset(sense, 0, sizeof(sense)); -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] osd: Boaz Harrosh - change of email
On 10/22/2014 12:04 AM, James Bottomley wrote: On Tue, 2014-10-21 at 17:05 +0300, Boaz Harrosh wrote: Hi Sir Linus A small administrative stuff, was on vacation and the old email bounced on me. I was hoping to still make the 3 weeks merge window, but it was 2 weeks at the end. Your call if to make this wait for next window. Needless to say that it is ZERO risk, just change of email. Based on commit [bfe01a5b] Linux 3.17 3 patches available in the git repository at: git://git.open-osd.org/linux-open-osd.git for-linus for you to fetch changes up to 1fa3a002b2546c42c343c77c144871285896ced5: Boaz Harrosh - fix email in Documentation (2014-10-19 20:36:36 +0300) Boaz Harrosh (3): MAINTAINERS: Change Boaz Harrosh's email Boaz Harrosh - Fix broken email address Boaz Harrosh - fix email in Documentation Documentation/scsi/osd.txt | 3 +-- MAINTAINERS | 2 +- drivers/scsi/osd/Kbuild | 2 +- drivers/scsi/osd/Kconfig| 2 +- drivers/scsi/osd/osd_debug.h| 2 +- drivers/scsi/osd/osd_initiator.c| 4 ++-- drivers/scsi/osd/osd_uld.c | 4 ++-- fs/exofs/Kbuild | 2 +- fs/exofs/common.h | 2 +- fs/exofs/dir.c | 2 +- fs/exofs/exofs.h| 2 +- fs/exofs/file.c | 2 +- fs/exofs/inode.c| 2 +- fs/exofs/namei.c| 2 +- fs/exofs/ore.c | 4 ++-- fs/exofs/ore_raid.c | 2 +- fs/exofs/ore_raid.h | 2 +- fs/exofs/super.c| 2 +- fs/exofs/symlink.c | 2 +- fs/exofs/sys.c | 2 +- fs/nfs/objlayout/objio_osd.c| 2 +- fs/nfs/objlayout/objlayout.c| 2 +- fs/nfs/objlayout/objlayout.h| 2 +- fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 2 +- include/linux/pnfs_osd_xdr.h| 2 +- include/scsi/osd_initiator.h| 2 +- include/scsi/osd_ore.h | 2 +- include/scsi/osd_protocol.h | 4 ++-- include/scsi/osd_sec.h | 2 +- include/scsi/osd_sense.h| 2 +- include/scsi/osd_types.h| 2 +- 31 files changed, 35 insertions(+), 36 deletions(-) This is a bit of an unnecessary massive churn. No one expects the author named in the file to stay up to date, especially because the @domain.com usually credits the company who paid for the code, so it's left in as a kind of mark of respect for them. I'm not saying it applies in your case, just that it creates the common expectation of in-file authors needing to be traced through the MAINTAINERS file. Could you not just update the MAINTAINERS file only, like everyone else? OK, I did not know this. The Panasas credit is still there so no harm done. Anyway Linus pulled it, so lets leave it at that for now. Next time I'll know. James Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/18] SCSI XCOPY support for the kernel and device mapper
Hi I uploaded the new version of SCSI/device-mapper XCOPY patches here: http://people.redhat.com/~mpatocka/patches/kernel/xcopy/series.html I am also sending the patches in the following emails. Mikulas -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/18] dm: remove num_write_bios
The target can set the function num_write_bios - dm will issue this callback to ask the target how many bios does it want to receive. This was intended for the dm-cache target, but it is not useable due to a race condition (see the description of e2e74d617eadc15f601983270c4f4a6935c5a943). num_write_bios is unused, so we remove it. Note that we deliberately leave the for loop in __clone_and_map_data_bio - it will be used in the next patch. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm.c |6 -- include/linux/device-mapper.h | 15 --- 2 files changed, 21 deletions(-) Index: linux-3.18-rc1/drivers/md/dm.c === --- linux-3.18-rc1.orig/drivers/md/dm.c 2014-10-21 00:53:43.0 +0200 +++ linux-3.18-rc1/drivers/md/dm.c 2014-10-21 00:53:47.0 +0200 @@ -1433,12 +1433,6 @@ static void __clone_and_map_data_bio(str unsigned target_bio_nr; unsigned num_target_bios = 1; - /* -* Does the target want to receive duplicate copies of the bio? -*/ - if (bio_data_dir(bio) == WRITE ti-num_write_bios) - num_target_bios = ti-num_write_bios(ti, bio); - for (target_bio_nr = 0; target_bio_nr num_target_bios; target_bio_nr++) { tio = alloc_tio(ci, ti, target_bio_nr); tio-len_ptr = len; Index: linux-3.18-rc1/include/linux/device-mapper.h === --- linux-3.18-rc1.orig/include/linux/device-mapper.h 2014-10-21 00:53:43.0 +0200 +++ linux-3.18-rc1/include/linux/device-mapper.h2014-10-21 00:53:47.0 +0200 @@ -184,14 +184,6 @@ struct target_type { #define DM_TARGET_IMMUTABLE0x0004 #define dm_target_is_immutable(type) ((type)-features DM_TARGET_IMMUTABLE) -/* - * Some targets need to be sent the same WRITE bio severals times so - * that they can send copies of it to different devices. This function - * examines any supplied bio and returns the number of copies of it the - * target requires. - */ -typedef unsigned (*dm_num_write_bios_fn) (struct dm_target *ti, struct bio *bio); - struct dm_target { struct dm_table *table; struct target_type *type; @@ -231,13 +223,6 @@ struct dm_target { */ unsigned per_bio_data_size; - /* -* If defined, this function is called to find out how many -* duplicate bios should be sent to the target when writing -* data. -*/ - dm_num_write_bios_fn num_write_bios; - /* target specific data */ void *private; -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/18] blk-lib: fix error reporting
The function bio_batch_end_io ignores -EOPNOTSUPP. It doesn't matter for discard (the device isn't required to discard anything, so missing the error code and reporting success shouldn't cause any trouble). However, for WRITE SAME command, missing the error code is obviously wrong. It may fool the user into thinking that the data were written while in fact they weren't. Note that in device mapper, devices may be dynamically reconfigured, so a device that supports WRITE SAME may stop supporting it at any time and return -EOPNOTSUPP. Ignoring -EOPNOTSUPP is wrong. This patch changes bio_batch-flags to an error field and stores the last error there - so that the error is reported accurately and it isn't ignored. Signed-off-by: Mikulas Patocka mpato...@redhat.com Cc: sta...@vger.kernel.org --- block/blk-lib.c | 25 - 1 file changed, 12 insertions(+), 13 deletions(-) Index: linux-3.16-rc3/block/blk-lib.c === --- linux-3.16-rc3.orig/block/blk-lib.c 2014-07-03 18:17:21.0 +0200 +++ linux-3.16-rc3/block/blk-lib.c 2014-07-03 18:36:52.0 +0200 @@ -11,7 +11,7 @@ struct bio_batch { atomic_tdone; - unsigned long flags; + int error; struct completion *wait; }; @@ -19,8 +19,8 @@ static void bio_batch_end_io(struct bio { struct bio_batch *bb = bio-bi_private; - if (err (err != -EOPNOTSUPP)) - clear_bit(BIO_UPTODATE, bb-flags); + if (unlikely(err)) + ACCESS_ONCE(bb-error) = err; if (atomic_dec_and_test(bb-done)) complete(bb-wait); bio_put(bio); @@ -78,7 +78,7 @@ int blkdev_issue_discard(struct block_de } atomic_set(bb.done, 1); - bb.flags = 1 BIO_UPTODATE; + bb.error = 0; bb.wait = wait; blk_start_plug(plug); @@ -134,8 +134,8 @@ int blkdev_issue_discard(struct block_de if (!atomic_dec_and_test(bb.done)) wait_for_completion_io(wait); - if (!test_bit(BIO_UPTODATE, bb.flags)) - ret = -EIO; + if (likely(!ret)) + ret = bb.error; return ret; } @@ -172,7 +172,7 @@ int blkdev_issue_write_same(struct block return -EOPNOTSUPP; atomic_set(bb.done, 1); - bb.flags = 1 BIO_UPTODATE; + bb.error = 0; bb.wait = wait; while (nr_sects) { @@ -208,8 +208,8 @@ int blkdev_issue_write_same(struct block if (!atomic_dec_and_test(bb.done)) wait_for_completion_io(wait); - if (!test_bit(BIO_UPTODATE, bb.flags)) - ret = -ENOTSUPP; + if (likely(!ret)) + ret = bb.error; return ret; } @@ -236,7 +236,7 @@ static int __blkdev_issue_zeroout(struct DECLARE_COMPLETION_ONSTACK(wait); atomic_set(bb.done, 1); - bb.flags = 1 BIO_UPTODATE; + bb.error = 0; bb.wait = wait; ret = 0; @@ -270,9 +270,8 @@ static int __blkdev_issue_zeroout(struct if (!atomic_dec_and_test(bb.done)) wait_for_completion_io(wait); - if (!test_bit(BIO_UPTODATE, bb.flags)) - /* One of bios in the batch was completed with error.*/ - ret = -EIO; + if (likely(!ret)) + ret = bb.error; return ret; } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/18] dm: introduce dm_ask_for_duplicate_bios
This function can be used if the target needs to receive another duplicate of the current bio. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm.c | 24 +++- include/linux/device-mapper.h |2 ++ 2 files changed, 21 insertions(+), 5 deletions(-) Index: linux-3.18-rc1/drivers/md/dm.c === --- linux-3.18-rc1.orig/drivers/md/dm.c 2014-10-21 00:42:31.0 +0200 +++ linux-3.18-rc1/drivers/md/dm.c 2014-10-21 00:44:13.0 +0200 @@ -1284,9 +1284,9 @@ EXPORT_SYMBOL_GPL(dm_set_target_max_io_l * to make it empty) * The target requires that region 3 is to be sent in the next bio. * - * If the target wants to receive multiple copies of the bio (via num_*bios, etc), - * the partially processed part (the sum of regions 1+2) must be the same for all - * copies of the bio. + * If the target wants to receive multiple copies of the bio with num_*_bios or + * dm_ask_for_duplicate_bio, the partially processed part (the sum of regions + * 1+2) must be the same for all copies of the bio. */ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors) { @@ -1300,6 +1300,17 @@ void dm_accept_partial_bio(struct bio *b } EXPORT_SYMBOL_GPL(dm_accept_partial_bio); +/* + * The target driver can call this function only from the map routine. The + * target driver requests that the dm sends more duplicates of the current bio. + */ +void dm_ask_for_duplicate_bios(struct bio *bio, unsigned n_duplicates) +{ + struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone); + (*tio-num_bios) += n_duplicates; +} +EXPORT_SYMBOL_GPL(dm_ask_for_duplicate_bios); + static void __map_bio(struct dm_target_io *tio) { int r; @@ -1390,12 +1401,14 @@ static struct dm_target_io *alloc_tio(st static void __clone_and_map_simple_bio(struct clone_info *ci, struct dm_target *ti, - unsigned target_bio_nr, unsigned *len) + unsigned target_bio_nr, unsigned *len, + unsigned *num_bios) { struct dm_target_io *tio = alloc_tio(ci, ti, target_bio_nr); struct bio *clone = tio-clone; tio-len_ptr = len; + tio-num_bios = num_bios; __bio_clone_fast(clone, ci-bio); if (len) @@ -1410,7 +1423,7 @@ static void __send_duplicate_bios(struct unsigned target_bio_nr; for (target_bio_nr = 0; target_bio_nr num_bios; target_bio_nr++) - __clone_and_map_simple_bio(ci, ti, target_bio_nr, len); + __clone_and_map_simple_bio(ci, ti, target_bio_nr, len, num_bios); } static int __send_empty_flush(struct clone_info *ci) @@ -1436,6 +1449,7 @@ static void __clone_and_map_data_bio(str for (target_bio_nr = 0; target_bio_nr num_target_bios; target_bio_nr++) { tio = alloc_tio(ci, ti, target_bio_nr); tio-len_ptr = len; + tio-num_bios = num_target_bios; clone_bio(tio, bio, sector, *len); __map_bio(tio); } Index: linux-3.18-rc1/include/linux/device-mapper.h === --- linux-3.18-rc1.orig/include/linux/device-mapper.h 2014-10-21 00:42:31.0 +0200 +++ linux-3.18-rc1/include/linux/device-mapper.h2014-10-21 00:42:40.0 +0200 @@ -271,6 +271,7 @@ struct dm_target_io { struct dm_target *ti; unsigned target_bio_nr; unsigned *len_ptr; + unsigned *num_bios; struct bio clone; }; @@ -382,6 +383,7 @@ struct gendisk *dm_disk(struct mapped_de int dm_suspended(struct dm_target *ti); int dm_noflush_suspending(struct dm_target *ti); void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors); +void dm_ask_for_duplicate_bios(struct bio *bio, unsigned n_duplicates); union map_info *dm_get_rq_mapinfo(struct request *rq); struct queue_limits *dm_get_queue_limits(struct mapped_device *md); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/18] block copy: use two bios
This patch changes the architecture of xcopy so that two bios are used. There used to be just one bio that held pointers to both source and destination block device. However a bio with two block devices cannot really be passed though block midlayer drivers (dm and md). When we need to send the XCOPY command, we call the function blkdev_issue_copy. This function creates two bios, the first with bi_rw READ | REQ_COPY and the second WRITE | REQ_COPY. The bios have a pointer to a common bi_copy structure. These bios travel independently through the block device stack. When both the bios reach the physical disk driver (the function blk_queue_bio), they are paired, a request is made and the request is sent to the SCSI disk driver. It is possible that one of the bios reaches a device that doesn't support XCOPY, in that case both bios are aborted with an error. Note that there is no guarantee that the XCOPY command will succeed. If it doesn't succeed, the caller is supposed to perform the copy manually. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- block/bio.c | 26 ++- block/blk-core.c | 34 block/blk-lib.c | 76 -- drivers/scsi/sd.c |7 +--- include/linux/blk_types.h | 12 ++- 5 files changed, 131 insertions(+), 24 deletions(-) Index: linux-3.18-rc1/block/blk-lib.c === --- linux-3.18-rc1.orig/block/blk-lib.c 2014-10-21 00:47:02.0 +0200 +++ linux-3.18-rc1/block/blk-lib.c 2014-10-21 00:48:51.0 +0200 @@ -305,6 +305,36 @@ int blkdev_issue_zeroout(struct block_de } EXPORT_SYMBOL(blkdev_issue_zeroout); +static void bio_copy_end_io(struct bio *bio, int error) +{ + struct bio_copy *bc = bio-bi_copy; + if (unlikely(error)) { + unsigned long flags; + int dir; + struct bio *other; + + /* if the other bio is waiting for the pair, release it */ + spin_lock_irqsave(bc-spinlock, flags); + if (bc-error = 0) + bc-error = error; + dir = bio_data_dir(bio); + other = bc-pair[dir ^ 1]; + bc-pair[dir ^ 1] = NULL; + spin_unlock_irqrestore(bc-spinlock, flags); + if (other) + bio_endio(other, error); + } + bio_put(bio); + if (atomic_dec_and_test(bc-in_flight)) { + struct bio_batch *bb = bc-private; + if (unlikely(bc-error 0) !ACCESS_ONCE(bb-error)) + ACCESS_ONCE(bb-error) = bc-error; + kfree(bc); + if (atomic_dec_and_test(bb-done)) + complete(bb-wait); + } +} + /** * blkdev_issue_copy - queue a copy same operation * @src_bdev: source blockdev @@ -351,9 +381,9 @@ int blkdev_issue_copy(struct block_devic bb.wait = wait; while (nr_sects) { - struct bio *bio; + struct bio *read_bio, *write_bio; struct bio_copy *bc; - unsigned int chunk; + unsigned int chunk = min(nr_sects, max_copy_sectors); bc = kmalloc(sizeof(struct bio_copy), gfp_mask); if (!bc) { @@ -361,27 +391,43 @@ int blkdev_issue_copy(struct block_devic break; } - bio = bio_alloc(gfp_mask, 1); - if (!bio) { + read_bio = bio_alloc(gfp_mask, 1); + if (!read_bio) { kfree(bc); ret = -ENOMEM; break; } - chunk = min(nr_sects, max_copy_sectors); - - bio-bi_iter.bi_sector = dst_sector; - bio-bi_iter.bi_size = chunk 9; - bio-bi_end_io = bio_batch_end_io; - bio-bi_bdev = dst_bdev; - bio-bi_private = bb; - bio-bi_copy = bc; + write_bio = bio_alloc(gfp_mask, 1); + if (!write_bio) { + bio_put(read_bio); + kfree(bc); + ret = -ENOMEM; + break; + } - bc-bic_bdev = src_bdev; - bc-bic_sector = src_sector; + atomic_set(bc-in_flight, 2); + bc-error = 1; + bc-pair[0] = NULL; + bc-pair[1] = NULL; + bc-private = bb; + spin_lock_init(bc-spinlock); + + read_bio-bi_iter.bi_sector = src_sector; + read_bio-bi_iter.bi_size = chunk 9; + read_bio-bi_end_io = bio_copy_end_io; + read_bio-bi_bdev = src_bdev; + read_bio-bi_copy = bc; + + write_bio-bi_iter.bi_sector = dst_sector; +
[PATCH 4/18] block copy: initial XCOPY offload support
This is Martin Petersen's xcopy patch (https://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=xcopyid=0bdeed274e16b3038a851552188512071974eea8) with some bug fixes, ported to the current kernel. This patch makes it possible to use the SCSI XCOPY command. We create a bio that has REQ_COPY flag in bi_rw and a bi_copy structure that defines the source device. The target device is defined in the bi_bdev and bi_iter.bi_sector. There is a new BLKCOPY ioctl that makes it possible to use XCOPY from userspace. The ioctl argument is a pointer to an array of four uint64_t values. The first value is a source byte offset, the second value is a destination byte offset, the third value is byte length. The forth value is written by the kernel and it represents the number of bytes that the kernel actually copied. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Signed-off-by: Mikulas Patocka mpato...@redhat.com --- Documentation/ABI/testing/sysfs-block |9 + block/bio.c |2 block/blk-core.c |5 block/blk-lib.c | 95 block/blk-merge.c |7 block/blk-settings.c | 13 + block/blk-sysfs.c | 10 + block/compat_ioctl.c |1 block/ioctl.c | 49 ++ drivers/scsi/scsi.c | 57 +++ drivers/scsi/sd.c | 269 +- drivers/scsi/sd.h |4 include/linux/bio.h |9 - include/linux/blk_types.h | 15 + include/linux/blkdev.h| 15 + include/scsi/scsi_device.h|3 include/uapi/linux/fs.h |1 17 files changed, 551 insertions(+), 13 deletions(-) Index: linux-3.18-rc1/Documentation/ABI/testing/sysfs-block === --- linux-3.18-rc1.orig/Documentation/ABI/testing/sysfs-block 2014-10-21 01:04:13.0 +0200 +++ linux-3.18-rc1/Documentation/ABI/testing/sysfs-block2014-10-21 01:04:22.0 +0200 @@ -228,3 +228,12 @@ Description: write_same_max_bytes is 0, write same is not supported by the device. + +What: /sys/block/disk/queue/copy_max_bytes +Date: January 2014 +Contact: Martin K. Petersen martin.peter...@oracle.com +Description: + Devices that support copy offloading will set this value + to indicate the maximum buffer size in bytes that can be + copied in one operation. If the copy_max_bytes is 0 the + device does not support copy offload. Index: linux-3.18-rc1/block/blk-core.c === --- linux-3.18-rc1.orig/block/blk-core.c2014-10-21 01:04:14.0 +0200 +++ linux-3.18-rc1/block/blk-core.c 2014-10-21 01:04:22.0 +0200 @@ -1828,6 +1828,11 @@ generic_make_request_checks(struct bio * goto end_io; } + if (bio-bi_rw REQ_COPY !bdev_copy_offload(bio-bi_bdev)) { + err = -EOPNOTSUPP; + goto end_io; + } + /* * Various block parts want %current-io_context and lazy ioc * allocation ends up trading a lot of pain for a small amount of Index: linux-3.18-rc1/block/blk-lib.c === --- linux-3.18-rc1.orig/block/blk-lib.c 2014-10-21 01:04:14.0 +0200 +++ linux-3.18-rc1/block/blk-lib.c 2014-10-21 01:04:22.0 +0200 @@ -304,3 +304,98 @@ int blkdev_issue_zeroout(struct block_de return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); } EXPORT_SYMBOL(blkdev_issue_zeroout); + +/** + * blkdev_issue_copy - queue a copy same operation + * @src_bdev: source blockdev + * @src_sector:source sector + * @dst_bdev: destination blockdev + * @dst_sector: destination sector + * @nr_sects: number of sectors to copy + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + *Copy a block range from source device to target device. + */ +int blkdev_issue_copy(struct block_device *src_bdev, sector_t src_sector, + struct block_device *dst_bdev, sector_t dst_sector, + unsigned int nr_sects, gfp_t gfp_mask) +{ + DECLARE_COMPLETION_ONSTACK(wait); + struct request_queue *sq = bdev_get_queue(src_bdev); + struct request_queue *dq = bdev_get_queue(dst_bdev); + unsigned int max_copy_sectors; + struct bio_batch bb; + int ret = 0; + + if (!sq || !dq) + return -ENXIO; + + max_copy_sectors = min(sq-limits.max_copy_sectors, + dq-limits.max_copy_sectors); + + if (max_copy_sectors == 0) + return -EOPNOTSUPP; + + if (src_sector
[PATCH 7/18] block copy: use a timer to fix a theoretical deadlock
The block layer creates two bios for each copy operation. The bios travel independently through the storage stack and they are paired at the block device. There is a theoretical problem with this - the block device stack only guarantees forward progress for a single bio. When two bios are sent, it is possible (though very unlikely) that the first bio exhausts some mempool and the second bio waits until there is free space in the mempool (and thus it waits until the first bio finishes). To avoid this deadlock, we introduce a timer. If the two bios are not paired at the physical block device within 10 seconds, the copy operation is aborted and the bio that waits to be paired is released with an error. Note that there is no guarantee that any XCOPY operation succeed, so aborting an operation with an error shouldn't cause any problems - the caller is supposed to perform the copy manually if XCOPY fails. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- block/blk-lib.c | 27 +++ include/linux/blk_types.h |2 ++ 2 files changed, 29 insertions(+) Index: linux-3.16-rc5/block/blk-lib.c === --- linux-3.16-rc5.orig/block/blk-lib.c 2014-07-15 15:27:49.0 +0200 +++ linux-3.16-rc5/block/blk-lib.c 2014-07-15 15:27:51.0 +0200 @@ -305,6 +305,30 @@ int blkdev_issue_zeroout(struct block_de } EXPORT_SYMBOL(blkdev_issue_zeroout); +#define BLK_COPY_TIMEOUT (10 * HZ) + +static void blk_copy_timeout(unsigned long bc_) +{ + struct bio_copy *bc = (struct bio_copy *)bc_; + struct bio *bio0 = NULL, *bio1 = NULL; + + WARN_ON(!irqs_disabled()); + + spin_lock(bc-spinlock); /* the timer is IRQSAFE */ + if (bc-error == 1) { + bc-error = -ETIMEDOUT; + bio0 = bc-pair[0]; + bio1 = bc-pair[1]; + bc-pair[0] = bc-pair[1] = NULL; + } + spin_unlock(bc-spinlock); + + if (bio0) + bio_endio(bio0, -ETIMEDOUT); + if (bio1) + bio_endio(bio1, -ETIMEDOUT); +} + static void bio_copy_end_io(struct bio *bio, int error) { struct bio_copy *bc = bio-bi_copy; @@ -338,6 +362,7 @@ static void bio_copy_end_io(struct bio * } while (unlikely(atomic64_cmpxchg(bc-first_error, first_error, bc-offset) != first_error)); } + del_timer_sync(bc-timer); kfree(bc); if (atomic_dec_and_test(bb-done)) complete(bb-wait); @@ -428,6 +453,8 @@ int blkdev_issue_copy(struct block_devic bc-first_error = first_error; bc-offset = offset; spin_lock_init(bc-spinlock); + __setup_timer(bc-timer, blk_copy_timeout, (unsigned long)bc, TIMER_IRQSAFE); + mod_timer(bc-timer, jiffies + BLK_COPY_TIMEOUT); read_bio-bi_iter.bi_sector = src_sector; read_bio-bi_iter.bi_size = chunk 9; Index: linux-3.16-rc5/include/linux/blk_types.h === --- linux-3.16-rc5.orig/include/linux/blk_types.h 2014-07-15 15:27:49.0 +0200 +++ linux-3.16-rc5/include/linux/blk_types.h2014-07-15 15:27:51.0 +0200 @@ -6,6 +6,7 @@ #define __LINUX_BLK_TYPES_H #include linux/types.h +#include linux/timer.h struct bio_set; struct bio; @@ -52,6 +53,7 @@ struct bio_copy { atomic64_t *first_error; sector_t offset; spinlock_t spinlock; + struct timer_list timer; }; /* -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 9/18] block copy: use asynchronous notification
In dm-snapshot target there may be large number of copy requests in progress. If every pending copy request consumed a process context, it would put too much load on the system. To avoid this load, we need asynchronous notification when copy finishes - we can pass a callback to the function blkdev_issue_copy, if the callback is non-NULL, blkdev_issue_copy exits when it submits all the copy bios and the callback is called when the copy operation finishes. With the callback mechanism, there can be large number of in-progress copy requests and we do not need process context for each of them. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- block/blk-lib.c | 152 -- block/ioctl.c |2 include/linux/blk_types.h |5 - include/linux/blkdev.h|2 4 files changed, 114 insertions(+), 47 deletions(-) Index: linux-3.18-rc1/block/blk-lib.c === --- linux-3.18-rc1.orig/block/blk-lib.c 2014-10-21 00:49:04.0 +0200 +++ linux-3.18-rc1/block/blk-lib.c 2014-10-21 00:49:05.0 +0200 @@ -305,6 +305,17 @@ int blkdev_issue_zeroout(struct block_de } EXPORT_SYMBOL(blkdev_issue_zeroout); +struct bio_copy_batch { + atomic_long_t done; + int async_error; + int sync_error; + sector_t sync_copied; + atomic64_t first_error; + void (*callback)(void *data, int error); + void *data; + sector_t *copied; +}; + #define BLK_COPY_TIMEOUT (10 * HZ) static void blk_copy_timeout(unsigned long bc_) @@ -329,6 +340,18 @@ static void blk_copy_timeout(unsigned lo bio_endio(bio1, -ETIMEDOUT); } +static void blk_copy_batch_finish(struct bio_copy_batch *batch) +{ + void (*fn)(void *, int) = batch-callback; + void *data = batch-data; + int error = unlikely(batch-sync_error) ? batch-sync_error : batch-async_error; + if (batch-copied) + *batch-copied = min(batch-sync_copied, (sector_t)atomic64_read(batch-first_error)); + kfree(batch); + if (fn) + fn(data, error); +} + static void bio_copy_end_io(struct bio *bio, int error) { struct bio_copy *bc = bio-bi_copy; @@ -350,22 +373,22 @@ static void bio_copy_end_io(struct bio * } bio_put(bio); if (atomic_dec_and_test(bc-in_flight)) { - struct bio_batch *bb = bc-private; + struct bio_copy_batch *batch = bc-batch; if (unlikely(bc-error 0)) { u64 first_error; - if (!ACCESS_ONCE(bb-error)) - ACCESS_ONCE(bb-error) = bc-error; + if (!ACCESS_ONCE(batch-async_error)) + ACCESS_ONCE(batch-async_error) = bc-error; do { - first_error = atomic64_read(bc-first_error); + first_error = atomic64_read(batch-first_error); if (bc-offset = first_error) break; - } while (unlikely(atomic64_cmpxchg(bc-first_error, + } while (unlikely(atomic64_cmpxchg(batch-first_error, first_error, bc-offset) != first_error)); } del_timer_sync(bc-timer); kfree(bc); - if (atomic_dec_and_test(bb-done)) - complete(bb-wait); + if (atomic_long_dec_and_test(batch-done)) + blk_copy_batch_finish(batch); } } @@ -394,6 +417,18 @@ static unsigned blkdev_copy_merge(struct } } +struct bio_copy_completion { + struct completion wait; + int error; +}; + +static void bio_copy_sync_callback(void *ptr, int error) +{ + struct bio_copy_completion *comp = ptr; + comp-error = error; + complete(comp-wait); +} + /** * blkdev_issue_copy - queue a copy same operation * @src_bdev: source blockdev @@ -408,69 +443,95 @@ static unsigned blkdev_copy_merge(struct */ int blkdev_issue_copy(struct block_device *src_bdev, sector_t src_sector, struct block_device *dst_bdev, sector_t dst_sector, - sector_t nr_sects, gfp_t gfp_mask, sector_t *copied) + sector_t nr_sects, gfp_t gfp_mask, + void (*callback)(void *, int), void *data, + sector_t *copied) { DECLARE_COMPLETION_ONSTACK(wait); struct request_queue *sq = bdev_get_queue(src_bdev); struct request_queue *dq = bdev_get_queue(dst_bdev); unsigned int max_copy_sectors; - struct bio_batch bb; - int ret = 0; - atomic64_t first_error = ATOMIC64_INIT(nr_sects); - sector_t offset = 0; + int ret; + struct bio_copy_batch *batch; + struct
[PATCH 8/18] block copy: use merge_bvec_fn for copies
We use merge_bvec_fn to make sure that copies do not split internal boundaries of device mapper devices. There is no possibility to split a copy bio (splitting would complicate the design significantly), so we must use merge_bvec_fn to make sure that the bios have appropriate size for the device mapper stack. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- block/blk-lib.c | 37 + 1 file changed, 37 insertions(+) Index: linux-3.16-rc5/block/blk-lib.c === --- linux-3.16-rc5.orig/block/blk-lib.c 2014-07-15 15:27:51.0 +0200 +++ linux-3.16-rc5/block/blk-lib.c 2014-07-15 15:27:59.0 +0200 @@ -369,6 +369,31 @@ static void bio_copy_end_io(struct bio * } } +static unsigned blkdev_copy_merge(struct block_device *bdev, + struct request_queue *q, unsigned long bi_rw, + sector_t sector, unsigned n) +{ + if (!q-merge_bvec_fn) { + return n; + } else { + unsigned m; + struct bvec_merge_data bvm = { + .bi_bdev = bdev, + .bi_sector = sector, + .bi_size = 0, + .bi_rw = bi_rw, + }; + struct bio_vec vec = { + .bv_page = NULL, + .bv_len = likely(n = UINT_MAX 9) ? n 9 : UINT_MAX ~511U, + .bv_offset = 0, + }; + m = q-merge_bvec_fn(q, bvm, vec); + m = 9; + return min(m, n); + } +} + /** * blkdev_issue_copy - queue a copy same operation * @src_bdev: source blockdev @@ -424,6 +449,18 @@ int blkdev_issue_copy(struct block_devic struct bio_copy *bc; unsigned chunk = (unsigned)min(nr_sects, (sector_t)max_copy_sectors); + chunk = blkdev_copy_merge(src_bdev, sq, READ | REQ_COPY, src_sector, chunk); + if (!chunk) { + ret = -EOPNOTSUPP; + break; + } + + chunk = blkdev_copy_merge(dst_bdev, dq, WRITE | REQ_COPY, dst_sector, chunk); + if (!chunk) { + ret = -EOPNOTSUPP; + break; + } + bc = kmalloc(sizeof(struct bio_copy), gfp_mask); if (!bc) { ret = -ENOMEM; -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/18] scsi xcopy: suppress error messages
This patch suppresses error messages when copying between two arrays that support XCOPY each, but that cannot copy data between each other. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/scsi/sd.c | 12 1 file changed, 12 insertions(+) Index: linux-3.18-rc1/drivers/scsi/sd.c === --- linux-3.18-rc1.orig/drivers/scsi/sd.c 2014-10-21 00:48:51.0 +0200 +++ linux-3.18-rc1/drivers/scsi/sd.c2014-10-21 00:49:21.0 +0200 @@ -1935,6 +1935,18 @@ static int sd_done(struct scsi_cmnd *SCp req-cmd_flags |= REQ_QUIET; } } + } else if (sshdr.asc == 0x26) { + switch (op) { + /* +* Copying between two arrays that support XCOPY, but +* cannot access each other. +*/ + case EXTENDED_COPY: + good_bytes = 0; + req-__data_len = blk_rq_bytes(req); + req-cmd_flags |= REQ_QUIET; + break; + } } break; default: -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/18] scsi xcopy: keep cache of failures
If xcopy between two devices fails, it is pointless to send more xcopy command between there two devices because they take time and they will likely also fail. This patch keeps a cache of (source_device,destination_device) pairs where copying failed and makes sure that no xcopy command is sooner than 30 seconds after the last failure. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/scsi/sd.c | 35 +++ 1 file changed, 35 insertions(+) Index: linux-3.18-rc1/drivers/scsi/sd.c === --- linux-3.18-rc1.orig/drivers/scsi/sd.c 2014-10-21 00:49:21.0 +0200 +++ linux-3.18-rc1/drivers/scsi/sd.c2014-10-21 00:49:24.0 +0200 @@ -943,6 +943,26 @@ static void sd_config_copy(struct scsi_d (logical_block_size 9)); } +#define SD_COPY_DISABLED_CACHE_TIME(HZ * 30) +#define SD_COPY_DISABLED_CACHE_HASH_BITS 6 +#define SD_COPY_DISABLED_CACHE_HASH(1 SD_COPY_DISABLED_CACHE_HASH_BITS) + +struct sd_copy_disabled_cache_entry { + struct scsi_device *src; + struct scsi_device *dst; + unsigned long jiffies; +}; + +static struct sd_copy_disabled_cache_entry sd_copy_disabled_cache[SD_COPY_DISABLED_CACHE_HASH]; + +static struct sd_copy_disabled_cache_entry *sd_copy_disabled_cache_hash( + struct scsi_device *src, struct scsi_device *dst) +{ + return sd_copy_disabled_cache[ + hash_long((unsigned long)src + ((unsigned long)dst 1), SD_COPY_DISABLED_CACHE_HASH_BITS) + ]; +} + static int sd_setup_copy_cmnd(struct scsi_cmnd *cmd) { struct request *rq = cmd-request; @@ -955,6 +975,7 @@ static int sd_setup_copy_cmnd(struct scs struct bio *bio = rq-bio; struct page *page; unsigned char *buf; + struct sd_copy_disabled_cache_entry *e; dst_sdp = scsi_disk(rq-rq_disk)-device; dst_queue = rq-rq_disk-queue; @@ -974,6 +995,12 @@ static int sd_setup_copy_cmnd(struct scs if (src_sdp-sector_size != dst_sdp-sector_size) return BLKPREP_KILL; + /* The copy failed in the past, so do not retry it for some time */ + e = sd_copy_disabled_cache_hash(src_sdp, dst_sdp); + if (unlikely(jiffies - ACCESS_ONCE(e-jiffies) SD_COPY_DISABLED_CACHE_TIME) + likely(ACCESS_ONCE(e-src) == src_sdp) likely(ACCESS_ONCE(e-dst) == dst_sdp)) + return BLKPREP_KILL; + dst_lba = blk_rq_pos(rq) (ilog2(dst_sdp-sector_size) - 9); src_lba = bio-bi_copy-pair[0]-bi_iter.bi_sector (ilog2(src_sdp-sector_size) - 9); nr_blocks = blk_rq_sectors(rq) (ilog2(dst_sdp-sector_size) - 9); @@ -1937,11 +1964,19 @@ static int sd_done(struct scsi_cmnd *SCp } } else if (sshdr.asc == 0x26) { switch (op) { + struct sd_copy_disabled_cache_entry *e; + struct scsi_device *src_sdp, *dst_sdp; /* * Copying between two arrays that support XCOPY, but * cannot access each other. */ case EXTENDED_COPY: + dst_sdp = sdkp-device; + src_sdp = scsi_disk(req-bio-bi_copy-pair[0]-bi_bdev-bd_disk)-device; + e = sd_copy_disabled_cache_hash(src_sdp, dst_sdp); + ACCESS_ONCE(e-src) = src_sdp; + ACCESS_ONCE(e-dst) = dst_sdp; + ACCESS_ONCE(e-jiffies) = jiffies; good_bytes = 0; req-__data_len = blk_rq_bytes(req); req-cmd_flags |= REQ_QUIET; -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/18] dm: implement copy
This patch implements basic copy support for device mapper core. Individual targets can enable copy support by setting ti-copy_supported. Device mapper device advertises copy support if at least one target supports copy and for this target, at least one underlying device supports copy. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm-table.c |9 + drivers/md/dm.c | 42 +++--- include/linux/device-mapper.h |5 + 3 files changed, 53 insertions(+), 3 deletions(-) Index: linux-3.18-rc1/drivers/md/dm.c === --- linux-3.18-rc1.orig/drivers/md/dm.c 2014-10-21 00:46:30.0 +0200 +++ linux-3.18-rc1/drivers/md/dm.c 2014-10-21 00:49:09.0 +0200 @@ -1559,6 +1559,31 @@ static int __send_write_same(struct clon return __send_changing_extent_only(ci, get_num_write_same_bios, NULL); } +static int __send_copy(struct clone_info *ci) +{ + struct dm_target *ti; + sector_t bound; + + ti = dm_table_find_target(ci-map, ci-sector); + if (!dm_target_is_valid(ti)) + return -EIO; + + if (!ti-copy_supported) + return -EOPNOTSUPP; + + bound = max_io_len(ci-sector, ti); + + if (unlikely(ci-sector_count bound)) + return -EOPNOTSUPP; + + __clone_and_map_simple_bio(ci, ti, 0, NULL, NULL); + + ci-sector += ci-sector_count; + ci-sector_count = 0; + + return 0; +} + /* * Select the correct strategy for processing a non-flush bio. */ @@ -1572,6 +1597,8 @@ static int __split_and_process_non_flush return __send_discard(ci); else if (unlikely(bio-bi_rw REQ_WRITE_SAME)) return __send_write_same(ci); + else if (unlikely(bio-bi_rw REQ_COPY)) + return __send_copy(ci); ti = dm_table_find_target(ci-map, ci-sector); if (!dm_target_is_valid(ti)) @@ -1656,6 +1683,11 @@ static int dm_merge_bvec(struct request_ if (!dm_target_is_valid(ti)) goto out; + if (unlikely((bvm-bi_rw REQ_COPY) != 0)) { + if (!ti-copy_supported) + goto out_ret_max_size; + } + /* * Find maximum amount of I/O that won't need splitting */ @@ -1679,17 +1711,21 @@ static int dm_merge_bvec(struct request_ * entries. So always set max_size to 0, and the code below allows * just one page. */ - else if (queue_max_hw_sectors(q) = PAGE_SIZE 9) + else if (likely(!(bvm-bi_rw REQ_COPY)) +queue_max_hw_sectors(q) = PAGE_SIZE 9) max_size = 0; out: - dm_put_live_table_fast(md); /* * Always allow an entire first page */ - if (max_size = biovec-bv_len !(bvm-bi_size SECTOR_SHIFT)) + if (likely(!(bvm-bi_rw REQ_COPY)) + max_size = biovec-bv_len !(bvm-bi_size SECTOR_SHIFT)) max_size = biovec-bv_len; +out_ret_max_size: + dm_put_live_table_fast(md); + return max_size; } Index: linux-3.18-rc1/include/linux/device-mapper.h === --- linux-3.18-rc1.orig/include/linux/device-mapper.h 2014-10-21 00:42:40.0 +0200 +++ linux-3.18-rc1/include/linux/device-mapper.h2014-10-21 00:49:09.0 +0200 @@ -251,6 +251,11 @@ struct dm_target { * Set if this target does not return zeroes on discarded blocks. */ bool discard_zeroes_data_unsupported:1; + + /* +* Set if the target supports XCOPY. +*/ + bool copy_supported:1; }; /* Each target can link one of these into the table */ Index: linux-3.18-rc1/drivers/md/dm-table.c === --- linux-3.18-rc1.orig/drivers/md/dm-table.c 2014-10-21 00:34:45.0 +0200 +++ linux-3.18-rc1/drivers/md/dm-table.c2014-10-21 00:49:09.0 +0200 @@ -443,6 +443,11 @@ static int dm_set_device_limits(struct d q-limits.alignment_offset, (unsigned long long) start SECTOR_SHIFT); + if (ti-copy_supported) + limits-max_copy_sectors = + min_not_zero(limits-max_copy_sectors, + bdev_get_queue(bdev)-limits.max_copy_sectors); + /* * Check if merge fn is supported. * If not we'll force DM to use PAGE_SIZE or @@ -1264,6 +1269,10 @@ combine_limits: dm_device_name(table-md), (unsigned long long) ti-begin, (unsigned long long) ti-len); + + limits-max_copy_sectors = + min_not_zero(limits-max_copy_sectors, +
[PATCH 13/18] dm linear: support copy
Support copy operation in the linear target. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm-linear.c |1 + 1 file changed, 1 insertion(+) Index: linux-3.16-rc4/drivers/md/dm-linear.c === --- linux-3.16-rc4.orig/drivers/md/dm-linear.c 2014-07-11 22:20:27.0 +0200 +++ linux-3.16-rc4/drivers/md/dm-linear.c 2014-07-11 22:22:20.0 +0200 @@ -56,6 +56,7 @@ static int linear_ctr(struct dm_target * ti-num_flush_bios = 1; ti-num_discard_bios = 1; ti-num_write_same_bios = 1; + ti-copy_supported = 1; ti-private = lc; return 0; -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/18] dm stripe: support copy
Support the copy operation for the stripe target. In stripe_merge, we verify that the underlying device supports copy. If it doesn't, we can fail fast without any bio being contructed. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm-stripe.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) Index: linux-3.16-rc4/drivers/md/dm-stripe.c === --- linux-3.16-rc4.orig/drivers/md/dm-stripe.c 2014-07-11 22:20:25.0 +0200 +++ linux-3.16-rc4/drivers/md/dm-stripe.c 2014-07-11 22:23:54.0 +0200 @@ -165,6 +165,7 @@ static int stripe_ctr(struct dm_target * ti-num_flush_bios = stripes; ti-num_discard_bios = stripes; ti-num_write_same_bios = stripes; + ti-copy_supported = 1; sc-chunk_size = chunk_size; if (chunk_size (chunk_size - 1)) @@ -416,11 +417,19 @@ static int stripe_merge(struct dm_target struct stripe_c *sc = ti-private; sector_t bvm_sector = bvm-bi_sector; uint32_t stripe; + struct block_device *bdev; struct request_queue *q; stripe_map_sector(sc, bvm_sector, stripe, bvm_sector); - q = bdev_get_queue(sc-stripe[stripe].dev-bdev); + bdev = sc-stripe[stripe].dev-bdev; + + if (unlikely((bvm-bi_rw REQ_COPY) != 0)) { + if (!bdev_copy_offload(bdev)) + return 0; + } + + q = bdev_get_queue(bdev); if (!q-merge_bvec_fn) return max_size; -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 16/18] dm kcopyd: support copy offload
This patch adds copy offload support to dm-kcopyd. If copy offload fails, copying is performed using dm-io, just like before. There is a module parameter copy_offload that can be set to enable or disable this feature. It can be used to test performance of copy offload. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm-kcopyd.c | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) Index: linux-3.16-rc5/drivers/md/dm-kcopyd.c === --- linux-3.16-rc5.orig/drivers/md/dm-kcopyd.c 2014-07-15 16:18:15.0 +0200 +++ linux-3.16-rc5/drivers/md/dm-kcopyd.c 2014-07-15 19:20:34.0 +0200 @@ -96,6 +96,9 @@ static DEFINE_SPINLOCK(throttle_spinlock */ #define MAX_SLEEPS 10 +static bool copy_offload = true; +module_param(copy_offload, bool, S_IRUGO | S_IWUSR); + static void io_job_start(struct dm_kcopyd_throttle *t) { unsigned throttle, now, difference; @@ -358,6 +361,8 @@ struct kcopyd_job { sector_t progress; struct kcopyd_job *master_job; + + struct work_struct copy_work; }; static struct kmem_cache *_job_cache; @@ -709,6 +714,31 @@ static void submit_job(struct kcopyd_job } } +static void copy_offload_work(struct work_struct *work) +{ + struct kcopyd_job *job = container_of(work, struct kcopyd_job, copy_work); + sector_t copied; + + blkdev_issue_copy(job-source.bdev, job-source.sector, + job-dests[0].bdev, job-dests[0].sector, + job-source.count, + GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, + NULL, NULL, copied); + + job-source.sector += copied; + job-source.count -= copied; + job-dests[0].sector += copied; + job-dests[0].count -= copied; + + submit_job(job); +} + +static void try_copy_offload(struct kcopyd_job *job) +{ + INIT_WORK(job-copy_work, copy_offload_work); + queue_work(job-kc-kcopyd_wq, job-copy_work); +} + int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from, unsigned int num_dests, struct dm_io_region *dests, unsigned int flags, dm_kcopyd_notify_fn fn, void *context) @@ -757,7 +787,13 @@ int dm_kcopyd_copy(struct dm_kcopyd_clie job-context = context; job-master_job = job; - submit_job(job); + if (copy_offload num_dests == 1 + bdev_copy_offload(job-source.bdev) + bdev_copy_offload(job-dests[0].bdev)) { + try_copy_offload(job); + } else { + submit_job(job); + } return 0; } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/18] dm kcopyd: introduce the function submit_job
We move some code to a function submit_job. It is needed for the next patch that calls submit_job from another place. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm-kcopyd.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) Index: linux-3.16-rc5/drivers/md/dm-kcopyd.c === --- linux-3.16-rc5.orig/drivers/md/dm-kcopyd.c 2014-07-14 16:45:23.0 +0200 +++ linux-3.16-rc5/drivers/md/dm-kcopyd.c 2014-07-14 17:28:36.0 +0200 @@ -698,6 +698,17 @@ static void split_job(struct kcopyd_job } } +static void submit_job(struct kcopyd_job *job) +{ + if (job-source.count = SUB_JOB_SIZE) + dispatch_job(job); + else { + mutex_init(job-lock); + job-progress = 0; + split_job(job); + } +} + int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from, unsigned int num_dests, struct dm_io_region *dests, unsigned int flags, dm_kcopyd_notify_fn fn, void *context) @@ -746,13 +757,7 @@ int dm_kcopyd_copy(struct dm_kcopyd_clie job-context = context; job-master_job = job; - if (job-source.count = SUB_JOB_SIZE) - dispatch_job(job); - else { - mutex_init(job-lock); - job-progress = 0; - split_job(job); - } + submit_job(job); return 0; } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 17/18] dm kcopyd: change mutex to spinlock
job-lock is only taken for a finite amount of time and the process doesn't block while holding it, so change it from mutex to spinlock. This change is needed for the next patch that makes it possible to call segment_complete from an interrupt. Taking mutexes inside an interrupt is not allowed. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm-kcopyd.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) Index: linux-3.16-rc5/drivers/md/dm-kcopyd.c === --- linux-3.16-rc5.orig/drivers/md/dm-kcopyd.c 2014-07-15 19:20:34.0 +0200 +++ linux-3.16-rc5/drivers/md/dm-kcopyd.c 2014-07-15 19:24:20.0 +0200 @@ -21,7 +21,7 @@ #include linux/slab.h #include linux/vmalloc.h #include linux/workqueue.h -#include linux/mutex.h +#include linux/spinlock.h #include linux/delay.h #include linux/device-mapper.h #include linux/dm-kcopyd.h @@ -356,7 +356,7 @@ struct kcopyd_job { * These fields are only used if the job has been split * into more manageable parts. */ - struct mutex lock; + spinlock_t lock; atomic_t sub_jobs; sector_t progress; @@ -629,7 +629,7 @@ static void segment_complete(int read_er struct kcopyd_job *job = sub_job-master_job; struct dm_kcopyd_client *kc = job-kc; - mutex_lock(job-lock); + spin_lock(job-lock); /* update the error */ if (read_err) @@ -653,7 +653,7 @@ static void segment_complete(int read_er job-progress += count; } } - mutex_unlock(job-lock); + spin_unlock(job-lock); if (count) { int i; @@ -708,7 +708,7 @@ static void submit_job(struct kcopyd_job if (job-source.count = SUB_JOB_SIZE) dispatch_job(job); else { - mutex_init(job-lock); + spin_lock_init(job-lock); job-progress = 0; split_job(job); } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 18/18] dm kcopyd: call copy offload with asynchronous callback
Change dm kcopyd so that it calls blkdev_issue_copy with an asynchronous callback. There can be large number of pending kcopyd requests and holding a process context for each of them may put too much load on the workqueue subsystem. This patch changes it so that blkdev_issue_copy returns after it submitted the requests and copy_offload_callback is called when the copy operation finishes. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- drivers/md/dm-kcopyd.c | 33 ++--- 1 file changed, 14 insertions(+), 19 deletions(-) Index: linux-3.16-rc5/drivers/md/dm-kcopyd.c === --- linux-3.16-rc5.orig/drivers/md/dm-kcopyd.c 2014-07-15 19:24:20.0 +0200 +++ linux-3.16-rc5/drivers/md/dm-kcopyd.c 2014-07-15 19:24:54.0 +0200 @@ -361,8 +361,6 @@ struct kcopyd_job { sector_t progress; struct kcopyd_job *master_job; - - struct work_struct copy_work; }; static struct kmem_cache *_job_cache; @@ -628,8 +626,9 @@ static void segment_complete(int read_er struct kcopyd_job *sub_job = (struct kcopyd_job *) context; struct kcopyd_job *job = sub_job-master_job; struct dm_kcopyd_client *kc = job-kc; + unsigned long flags; - spin_lock(job-lock); + spin_lock_irqsave(job-lock, flags); /* update the error */ if (read_err) @@ -653,7 +652,7 @@ static void segment_complete(int read_er job-progress += count; } } - spin_unlock(job-lock); + spin_unlock_irqrestore(job-lock, flags); if (count) { int i; @@ -714,29 +713,25 @@ static void submit_job(struct kcopyd_job } } -static void copy_offload_work(struct work_struct *work) +static void copy_offload_callback(void *ptr, int error) { - struct kcopyd_job *job = container_of(work, struct kcopyd_job, copy_work); - sector_t copied; + struct kcopyd_job *job = ptr; - blkdev_issue_copy(job-source.bdev, job-source.sector, - job-dests[0].bdev, job-dests[0].sector, - job-source.count, - GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, - NULL, NULL, copied); - - job-source.sector += copied; - job-source.count -= copied; - job-dests[0].sector += copied; - job-dests[0].count -= copied; + job-source.sector += job-progress; + job-source.count -= job-progress; + job-dests[0].sector += job-progress; + job-dests[0].count -= job-progress; submit_job(job); } static void try_copy_offload(struct kcopyd_job *job) { - INIT_WORK(job-copy_work, copy_offload_work); - queue_work(job-kc-kcopyd_wq, job-copy_work); + blkdev_issue_copy(job-source.bdev, job-source.sector, + job-dests[0].bdev, job-dests[0].sector, + job-source.count, + GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, + copy_offload_callback, job, job-progress); } int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from, -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
On 10/22/2014 01:30 AM, Jan Kara wrote: When sg_scsi_ioctl() fails to prepare request to submit in blk_rq_map_kern() we jump to a label where we just end up copying (luckily zeroed-out) kernel buffer to userspace instead of reporting error. Fix the problem by jumping to the right label. Thanks Jan, applied. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/18] block copy: initial XCOPY offload support
See below ... On 14-10-22 03:26 PM, Mikulas Patocka wrote: This is Martin Petersen's xcopy patch (https://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=xcopyid=0bdeed274e16b3038a851552188512071974eea8) with some bug fixes, ported to the current kernel. This patch makes it possible to use the SCSI XCOPY command. We create a bio that has REQ_COPY flag in bi_rw and a bi_copy structure that defines the source device. The target device is defined in the bi_bdev and bi_iter.bi_sector. There is a new BLKCOPY ioctl that makes it possible to use XCOPY from userspace. The ioctl argument is a pointer to an array of four uint64_t values. The first value is a source byte offset, the second value is a destination byte offset, the third value is byte length. The forth value is written by the kernel and it represents the number of bytes that the kernel actually copied. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Signed-off-by: Mikulas Patocka mpato...@redhat.com --- Documentation/ABI/testing/sysfs-block |9 + block/bio.c |2 block/blk-core.c |5 block/blk-lib.c | 95 block/blk-merge.c |7 block/blk-settings.c | 13 + block/blk-sysfs.c | 10 + block/compat_ioctl.c |1 block/ioctl.c | 49 ++ drivers/scsi/scsi.c | 57 +++ drivers/scsi/sd.c | 269 +- drivers/scsi/sd.h |4 include/linux/bio.h |9 - include/linux/blk_types.h | 15 + include/linux/blkdev.h| 15 + include/scsi/scsi_device.h|3 include/uapi/linux/fs.h |1 17 files changed, 551 insertions(+), 13 deletions(-) Index: linux-3.18-rc1/Documentation/ABI/testing/sysfs-block === --- linux-3.18-rc1.orig/Documentation/ABI/testing/sysfs-block 2014-10-21 01:04:13.0 +0200 +++ linux-3.18-rc1/Documentation/ABI/testing/sysfs-block2014-10-21 01:04:22.0 +0200 @@ -228,3 +228,12 @@ Description: write_same_max_bytes is 0, write same is not supported by the device. + +What: /sys/block/disk/queue/copy_max_bytes +Date: January 2014 +Contact: Martin K. Petersen martin.peter...@oracle.com +Description: + Devices that support copy offloading will set this value + to indicate the maximum buffer size in bytes that can be + copied in one operation. If the copy_max_bytes is 0 the + device does not support copy offload. Index: linux-3.18-rc1/block/blk-core.c === --- linux-3.18-rc1.orig/block/blk-core.c2014-10-21 01:04:14.0 +0200 +++ linux-3.18-rc1/block/blk-core.c 2014-10-21 01:04:22.0 +0200 @@ -1828,6 +1828,11 @@ generic_make_request_checks(struct bio * goto end_io; } + if (bio-bi_rw REQ_COPY !bdev_copy_offload(bio-bi_bdev)) { + err = -EOPNOTSUPP; + goto end_io; + } + /* * Various block parts want %current-io_context and lazy ioc * allocation ends up trading a lot of pain for a small amount of Index: linux-3.18-rc1/block/blk-lib.c === --- linux-3.18-rc1.orig/block/blk-lib.c 2014-10-21 01:04:14.0 +0200 +++ linux-3.18-rc1/block/blk-lib.c 2014-10-21 01:04:22.0 +0200 @@ -304,3 +304,98 @@ int blkdev_issue_zeroout(struct block_de return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); } EXPORT_SYMBOL(blkdev_issue_zeroout); + +/** + * blkdev_issue_copy - queue a copy same operation + * @src_bdev: source blockdev + * @src_sector:source sector + * @dst_bdev: destination blockdev + * @dst_sector: destination sector + * @nr_sects: number of sectors to copy + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + *Copy a block range from source device to target device. + */ +int blkdev_issue_copy(struct block_device *src_bdev, sector_t src_sector, + struct block_device *dst_bdev, sector_t dst_sector, + unsigned int nr_sects, gfp_t gfp_mask) +{ + DECLARE_COMPLETION_ONSTACK(wait); + struct request_queue *sq = bdev_get_queue(src_bdev); + struct request_queue *dq = bdev_get_queue(dst_bdev); + unsigned int max_copy_sectors; + struct bio_batch bb; + int ret = 0; + + if (!sq || !dq) + return -ENXIO; + + max_copy_sectors = min(sq-limits.max_copy_sectors, + dq-limits.max_copy_sectors); + + if
Re: [PATCH 1/1] SCSI-QLA2...: Deletion of unnecessary checks before the function call vfree
If you are convinced that dropping the null tests is a good idea, then you can submit the patch that makes the change to the relevant maintainers and mailing lists. I resent the request once more because another Triple-X software development adventure might follow ...? Regards, Markus From ff44962f88ac2dae9324e30819629da4fb33f0ff Mon Sep 17 00:00:00 2001 From: Markus Elfring elfr...@users.sourceforge.net Date: Wed, 22 Oct 2014 20:40:31 +0200 Subject: [PATCH] SCSI-QLA2XXX: Deletion of unnecessary checks before the function call vfree A semantic patch approach was proposed with the subject [PATCH with Coccinelle?] Deletion of unnecessary checks before specific function calls on 2014-03-05. https://lkml.org/lkml/2014/3/5/344 http://article.gmane.org/gmane.comp.version-control.coccinelle/3513/ This patch pattern application was repeated with the help of the software Coccinelle 1.0.0-rc22 on the source files for Linux 3.17.1. An extract of the automatically generated update suggestions is shown here. It was determined that the affected source code places call functions which perform input parameter validation already. It is therefore not needed that a similar safety check is repeated at the call site. Signed-off-by: Markus Elfring elfr...@users.sourceforge.net --- drivers/scsi/qla2xxx/qla_attr.c | 6 ++ drivers/scsi/qla2xxx/qla_init.c | 18 ++ drivers/scsi/qla2xxx/qla_os.c | 6 ++ 3 files changed, 10 insertions(+), 20 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c index 82b92c4..95c4c09 100644 --- a/drivers/scsi/qla2xxx/qla_attr.c +++ b/drivers/scsi/qla2xxx/qla_attr.c @@ -175,10 +175,8 @@ qla2x00_sysfs_write_fw_dump_template(struct file *filp, struct kobject *kobj, uint32_t size; if (off == 0) { - if (ha-fw_dump) - vfree(ha-fw_dump); - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump); + vfree(ha-fw_dump_template); ha-fw_dump = NULL; ha-fw_dump_len = 0; diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index a4dde7e..8da3d4f 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -5256,8 +5256,7 @@ qla24xx_load_risc_flash(scsi_qla_host_t *vha, uint32_t *srisc_addr, if (!IS_QLA27XX(ha)) return rval; - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump_template); ha-fw_dump_template = NULL; ha-fw_dump_template_len = 0; @@ -5307,8 +5306,7 @@ qla24xx_load_risc_flash(scsi_qla_host_t *vha, uint32_t *srisc_addr, default_template: ql_log(ql_log_warn, vha, 0x0168, Using default fwdump template\n); - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump_template); ha-fw_dump_template = NULL; ha-fw_dump_template_len = 0; @@ -5342,8 +5340,7 @@ default_template: failed_template: ql_log(ql_log_warn, vha, 0x016d, Failed default fwdump template\n); - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump_template); ha-fw_dump_template = NULL; ha-fw_dump_template_len = 0; return rval; @@ -5559,8 +5556,7 @@ qla24xx_load_risc_blob(scsi_qla_host_t *vha, uint32_t *srisc_addr) if (!IS_QLA27XX(ha)) return rval; - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump_template); ha-fw_dump_template = NULL; ha-fw_dump_template_len = 0; @@ -5609,8 +5605,7 @@ qla24xx_load_risc_blob(scsi_qla_host_t *vha, uint32_t *srisc_addr) default_template: ql_log(ql_log_warn, vha, 0x0178, Using default fwdump template\n); - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump_template); ha-fw_dump_template = NULL; ha-fw_dump_template_len = 0; @@ -5644,8 +5639,7 @@ default_template: failed_template: ql_log(ql_log_warn, vha, 0x017d, Failed default fwdump template\n); - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump_template); ha-fw_dump_template = NULL; ha-fw_dump_template_len = 0; return rval; diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index db3dbd9..0f9c378 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3676,10 +3676,8 @@ qla2x00_free_fw_dump(struct qla_hw_data *ha) dma_free_coherent(ha-pdev-dev, EFT_SIZE, ha-eft, ha-eft_dma); - if (ha-fw_dump) - vfree(ha-fw_dump); - if (ha-fw_dump_template) - vfree(ha-fw_dump_template); + vfree(ha-fw_dump); + vfree(ha-fw_dump_template); ha-fce = NULL;
Re: [PATCH 0/5] Feature enhancements for ses module
On 08/25/2014 11:34 AM, Song Liu wrote: From: Song Liu [mailto:songliubrav...@fb.com] Sent: Monday, August 25, 2014 10:26 AM To: Song Liu Subject: [PATCH 0/5] Feature enhancements for ses module These patches include a few enhancements to ses module: 1: close potential race condition by at enclosure race condition 2,3,4: add enclosure id and device slot, so we can create symlink in /dev/disk/by-slot: # ls -d /dev/disk/by-slot/* /dev/disk/by-slot/enclosure-0x5000ae41fc1310ff-slot0 5: add ability to power on/off device with power_status file in sysfs. Dan Williams (4): SES: close potential registration race SES: generate KOBJ_CHANGE on enclosure attach SES: add enclosure logical id SES: add reliable slot attribute Song Liu (1): SES: Add power_status to SES enclosure component Guys, where are we with these? Seemed like they got reviewed (and updates/fixes posted), then nothing else happened. Would be nice to get these upstream, so we don't have to carry them at FB. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 10/12] IB/srp: Use block layer tags
-Original Message- From: Bart Van Assche [mailto:bvanass...@acm.org] Sent: Tuesday, 07 October, 2014 8:07 AM ... @@ -1927,7 +1931,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) cmd-opcode = SRP_CMD; cmd-lun= cpu_to_be64((u64) scmnd-device-lun 48); - cmd-tag= req-index; + cmd-tag= tag; memcpy(cmd-cdb, scmnd-cmnd, scmnd-cmd_len); req-scmnd= scmnd; ... +static int srp_slave_alloc(struct scsi_device *sdev) +{ + sdev-tagged_supported = 1; + + scsi_activate_tcq(sdev, sdev-queue_depth); + + return 0; +} + Have you tested this with scsi_mod.use_blk_mq=n? Trying similar changes in hpsa, we still receive some INQUIRY commands submitted through queuecommand with tag -1. They are for devices for which slave_alloc has not yet been run, implying this work needs to be done even earlier. Maybe the midlayer is missing a slave_alloc call somewhere? --- Rob ElliottHP Server Storage N�r��yb�X��ǧv�^�){.n�+{{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj��!�i
Re: [PATCH 0/5] Feature enhancements for ses module
On 14-10-22 09:12 PM, Jens Axboe wrote: On 08/25/2014 11:34 AM, Song Liu wrote: From: Song Liu [mailto:songliubrav...@fb.com] Sent: Monday, August 25, 2014 10:26 AM To: Song Liu Subject: [PATCH 0/5] Feature enhancements for ses module These patches include a few enhancements to ses module: 1: close potential race condition by at enclosure race condition 2,3,4: add enclosure id and device slot, so we can create symlink in /dev/disk/by-slot: # ls -d /dev/disk/by-slot/* /dev/disk/by-slot/enclosure-0x5000ae41fc1310ff-slot0 5: add ability to power on/off device with power_status file in sysfs. Had a rude awakening with sg_ses recently when setting a field in the enclosure control dpage. That is what is being done in point 5: above. The time honoured technique is to read the corresponding enclosure status dpage, find the correct element, twiddle the field of interest, set the SELECT bit and write it back. The idea is maintain any other field settings in that element. And this is what the ses module does. There is at least one SES device out there that rejects the write if there are bits set in RESERVED locations. According to SPC-4 a device may do that. Look at the status (read) and control (write) variants of each element type: many times the control variant has less fields. To fix that the read-modify-write cycle needs to be changed to a read-mask-modify-write cycle where the mask is the allowable write mask (there would be one for each element type). This is ugly and may create problems in the future if and when T10 adds a new field in an element. About that time T10 should think about refining the meaning of RESERVED in SES to outlaw SES devices creating this particular time waster. Doug Gilbert Dan Williams (4): SES: close potential registration race SES: generate KOBJ_CHANGE on enclosure attach SES: add enclosure logical id SES: add reliable slot attribute Song Liu (1): SES: Add power_status to SES enclosure component Guys, where are we with these? Seemed like they got reviewed (and updates/fixes posted), then nothing else happened. Would be nice to get these upstream, so we don't have to carry them at FB. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 6/6] scsi: use dev_printk variants where possible
-Original Message- From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi- ow...@vger.kernel.org] On Behalf Of Hannes Reinecke Sent: Tuesday, 03 June, 2014 6:23 AM To: James Bottomley Cc: Christoph Hellwig; linux-scsi@vger.kernel.org; Hannes Reinecke Subject: [PATCH 6/6] scsi: use dev_printk variants where possible Using dev_printk variants prefixes the logging message with the originating device, which makes debugging easier. Signed-off-by: Hannes Reinecke h...@suse.de ... This patch (91921e016a2199e7afe5933c94bd9f723d946598 in the kernel) has left the devname[64] variable in this function unused. There is an sprintf setting it, but it is never read. char devname[64]; ... sprintf(devname, host %d channel %d id %d, shost-host_no, sdev-channel, sdev-id); Here are the former uses: diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index e02b3aa..46563b1 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c ... @@ -1430,17 +1433,19 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, * a retry. */ for (retries = 0; retries 3; retries++) { - SCSI_LOG_SCAN_BUS(3, printk (KERN_INFO scsi scan: Sending - REPORT LUNS to %s (try %d)\n, devname, + SCSI_LOG_SCAN_BUS(3, sdev_printk (KERN_INFO, sdev, + scsi scan: Sending REPORT LUNS to (try %d)\n, retries)); The word to remains from: to %s, devname but the switch to sdev_printk probably meant to delete to too. ... @@ -1466,10 +1471,11 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, num_luns = (length / sizeof(struct scsi_lun)); if (num_luns max_scsi_report_luns) { - printk(KERN_WARNING scsi: On %s only %d (max_scsi_report_luns) - of %d luns reported, try increasing - max_scsi_report_luns.\n, devname, -max_scsi_report_luns, num_luns); + sdev_printk(KERN_WARNING, sdev, + Only %d (max_scsi_report_luns) + of %d luns reported, try increasing + max_scsi_report_luns.\n, + max_scsi_report_luns, num_luns); num_luns = max_scsi_report_luns; } @@ -1495,15 +1501,15 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, * this differs from what linux would print for the * integer LUN value. */ - printk(KERN_WARNING scsi: %s lun 0x, devname); - data = (char *)lunp-scsi_lun; - for (i = 0; i sizeof(struct scsi_lun); i++) - printk(%02x, data[i]); - printk( has a LUN larger than currently supported.\n); + sdev_printk(KERN_WARNING, sdev, + lun 0x%8phN has a LUN larger + than currently supported.\n, + lunp-scsi_lun); } else if (lun sdev-host-max_lun) { - printk(KERN_WARNING scsi: %s lun%d has a LUN larger - than allowed by the host adapter\n, -devname, lun); + sdev_printk(KERN_WARNING, sdev, + lun 0x%8phN has a LUN larger + than allowed by the host adapter\n, + lunp-scsi_lun); } else { int res; ... -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/5] Feature enhancements for ses module
Hi Doug, I am not sure whether I fully understand the scenario. Actually patch 5 tries to clear all reserved bits: + dest_desc[0] = 0; + /* only clear byte 1 for ENCLOSURE_COMPONENT_DEVICE */ + if (ecomp-type == ENCLOSURE_COMPONENT_DEVICE) + dest_desc[1] = 0; + dest_desc[2] = 0xde; + dest_desc[3] = 0x3c; Would this work for device that rejects request with 1 in RESERVED areas? Thanks, Song -Original Message- From: Douglas Gilbert [mailto:dgilb...@interlog.com] Sent: Wednesday, October 22, 2014 3:29 PM To: Jens Axboe; Song Liu; linux-scsi@vger.kernel.org Cc: Hannes Reinecke; Dan Williams; Christoph Hellwig Subject: Re: [PATCH 0/5] Feature enhancements for ses module On 14-10-22 09:12 PM, Jens Axboe wrote: On 08/25/2014 11:34 AM, Song Liu wrote: From: Song Liu [mailto:songliubrav...@fb.com] Sent: Monday, August 25, 2014 10:26 AM To: Song Liu Subject: [PATCH 0/5] Feature enhancements for ses module These patches include a few enhancements to ses module: 1: close potential race condition by at enclosure race condition 2,3,4: add enclosure id and device slot, so we can create symlink in /dev/disk/by-slot: # ls -d /dev/disk/by-slot/* /dev/disk/by-slot/enclosure-0x5000ae41fc1310ff-slot0 5: add ability to power on/off device with power_status file in sysfs. Had a rude awakening with sg_ses recently when setting a field in the enclosure control dpage. That is what is being done in point 5: above. The time honoured technique is to read the corresponding enclosure status dpage, find the correct element, twiddle the field of interest, set the SELECT bit and write it back. The idea is maintain any other field settings in that element. And this is what the ses module does. There is at least one SES device out there that rejects the write if there are bits set in RESERVED locations. According to SPC-4 a device may do that. Look at the status (read) and control (write) variants of each element type: many times the control variant has less fields. To fix that the read-modify-write cycle needs to be changed to a read-mask- modify-write cycle where the mask is the allowable write mask (there would be one for each element type). This is ugly and may create problems in the future if and when T10 adds a new field in an element. About that time T10 should think about refining the meaning of RESERVED in SES to outlaw SES devices creating this particular time waster. Doug Gilbert Dan Williams (4): SES: close potential registration race SES: generate KOBJ_CHANGE on enclosure attach SES: add enclosure logical id SES: add reliable slot attribute Song Liu (1): SES: Add power_status to SES enclosure component Guys, where are we with these? Seemed like they got reviewed (and updates/fixes posted), then nothing else happened. Would be nice to get these upstream, so we don't have to carry them at FB. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Feature enhancements for ses module
On 14-10-23 01:01 AM, Song Liu wrote: Hi Doug, I am not sure whether I fully understand the scenario. Actually patch 5 tries to clear all reserved bits: + dest_desc[0] = 0; + /* only clear byte 1 for ENCLOSURE_COMPONENT_DEVICE */ + if (ecomp-type == ENCLOSURE_COMPONENT_DEVICE) + dest_desc[1] = 0; + dest_desc[2] = 0xde; + dest_desc[3] = 0x3c; Would this work for device that rejects request with 1 in RESERVED areas? That is a pretty asymmetric element type, assuming we are talking about the enclosure control and enclosure status elements (i.e. etc=0xe). My guess would be: dest_desc[0] = 0x80 | (src_desc[0] 40); dest_desc[1] = 0x80 src_desc[1]; dest_desc[2] = (pc_req 6) | pc_delay; dest_desc[3] = 0xff src_desc[3]; or if you have a new power_off_duration: dest_desc[3] = (power_off_duration 2) | (src_desc[3] 0x3); In byte 0 the top bit (SELECT) must be set or the enclosure will ignore any other settings in that element. If the PRDFAIL bit is already set, then that setting will be maintained. SES-3 has a note about clearing DISABLE and SWAP. In byte 1 is if the identifier (LED ?) is active (saying blinking) prior to this power cycle request, then it will stay blinking until the power drops. If the enclosure was really clever it might keep blinking after the power cycle :-) Also notice that the requested power cycle can be cancelled up to the time until power cycle drops to zero. -Original Message- From: Douglas Gilbert [mailto:dgilb...@interlog.com] Sent: Wednesday, October 22, 2014 3:29 PM To: Jens Axboe; Song Liu; linux-scsi@vger.kernel.org Cc: Hannes Reinecke; Dan Williams; Christoph Hellwig Subject: Re: [PATCH 0/5] Feature enhancements for ses module On 14-10-22 09:12 PM, Jens Axboe wrote: On 08/25/2014 11:34 AM, Song Liu wrote: From: Song Liu [mailto:songliubrav...@fb.com] Sent: Monday, August 25, 2014 10:26 AM To: Song Liu Subject: [PATCH 0/5] Feature enhancements for ses module These patches include a few enhancements to ses module: 1: close potential race condition by at enclosure race condition 2,3,4: add enclosure id and device slot, so we can create symlink in /dev/disk/by-slot: # ls -d /dev/disk/by-slot/* /dev/disk/by-slot/enclosure-0x5000ae41fc1310ff-slot0 5: add ability to power on/off device with power_status file in sysfs. Had a rude awakening with sg_ses recently when setting a field in the enclosure control dpage. That is what is being done in point 5: above. The time honoured technique is to read the corresponding enclosure status dpage, find the correct element, twiddle the field of interest, set the SELECT bit and write it back. The idea is maintain any other field settings in that element. And this is what the ses module does. There is at least one SES device out there that rejects the write if there are bits set in RESERVED locations. According to SPC-4 a device may do that. Look at the status (read) and control (write) variants of each element type: many times the control variant has less fields. To fix that the read-modify-write cycle needs to be changed to a read-mask- modify-write cycle where the mask is the allowable write mask (there would be one for each element type). This is ugly and may create problems in the future if and when T10 adds a new field in an element. About that time T10 should think about refining the meaning of RESERVED in SES to outlaw SES devices creating this particular time waster. Doug Gilbert Dan Williams (4): SES: close potential registration race SES: generate KOBJ_CHANGE on enclosure attach SES: add enclosure logical id SES: add reliable slot attribute Song Liu (1): SES: Add power_status to SES enclosure component Guys, where are we with these? Seemed like they got reviewed (and updates/fixes posted), then nothing else happened. Would be nice to get these upstream, so we don't have to carry them at FB. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] scsi: add try_rc16 blacklist flag
Sitsofe == Sitsofe Wheeler sits...@gmail.com writes: Last time around we identified this as a problem with Microsoft's interpretation of the T10 SBC spec. And they promised that they are going to fix that. Sitsofe OK but if we were happy to wait for Microsoft to fix the Sitsofe problem on the host why were the (broken and incomplete) Sitsofe BLIST_SKIP_VPD_PAGES patches committed to 3.17 rather than Sitsofe withdrawn? What's going to be done about those patches now? There are two orthogonal problems. One being that the driver advertised conformance to an old SCSI spec. That's being addressed with the separate SPC-3 patch. The other issue is that thin provisioning is being incorrectly advertised. Because that's being addressed by Microsoft and is an isolated use case I'm hesitant to add quirk for it. Whereas I know several other devices that will benefit from the TRY_VPD_PAGES blacklist option. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] scsi: Add Hyper-V logical block provisioning quirks
Sitsofe == Sitsofe Wheeler sits...@gmail.com writes: Sitsofe 2. On top of the above, when a disk is small (has less than Sitsofe2^32 sectors which is typically 2 TBytes in size) READ SitsofeCAPACITY(16) won't be triggered. static int sd_try_rc16_first(struct scsi_device *sdp) { if (sdp-host-max_cmd_len 16) return 0; if (sdp-try_rc_10_first) return 0; if (sdp-scsi_level SCSI_SPC_2) return 1; if (scsi_device_protection(sdp)) return 1; return 0; } -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/5] Feature enhancements for ses module
Hi Doug, The power on/off field together with fault, locate, and active are for HDD (i.e. DeviceSlot and ArrayDeviceSlot). They are not for enclosure or other elements. So we are not dealing with power off duration, etc. here. Thanks, Song -Original Message- From: Douglas Gilbert [mailto:dgilb...@interlog.com] Sent: Wednesday, October 22, 2014 6:17 PM To: Song Liu; Jens Axboe; linux-scsi@vger.kernel.org Cc: Hannes Reinecke; Dan Williams; Christoph Hellwig Subject: Re: [PATCH 0/5] Feature enhancements for ses module On 14-10-23 01:01 AM, Song Liu wrote: Hi Doug, I am not sure whether I fully understand the scenario. Actually patch 5 tries to clear all reserved bits: + dest_desc[0] = 0; + /* only clear byte 1 for ENCLOSURE_COMPONENT_DEVICE */ if + (ecomp-type == ENCLOSURE_COMPONENT_DEVICE) + dest_desc[1] = 0; + dest_desc[2] = 0xde; + dest_desc[3] = 0x3c; Would this work for device that rejects request with 1 in RESERVED areas? That is a pretty asymmetric element type, assuming we are talking about the enclosure control and enclosure status elements (i.e. etc=0xe). My guess would be: dest_desc[0] = 0x80 | (src_desc[0] 40); dest_desc[1] = 0x80 src_desc[1]; dest_desc[2] = (pc_req 6) | pc_delay; dest_desc[3] = 0xff src_desc[3]; or if you have a new power_off_duration: dest_desc[3] = (power_off_duration 2) | (src_desc[3] 0x3); In byte 0 the top bit (SELECT) must be set or the enclosure will ignore any other settings in that element. If the PRDFAIL bit is already set, then that setting will be maintained. SES-3 has a note about clearing DISABLE and SWAP. In byte 1 is if the identifier (LED ?) is active (saying blinking) prior to this power cycle request, then it will stay blinking until the power drops. If the enclosure was really clever it might keep blinking after the power cycle :-) Also notice that the requested power cycle can be cancelled up to the time until power cycle drops to zero. -Original Message- From: Douglas Gilbert [mailto:dgilb...@interlog.com] Sent: Wednesday, October 22, 2014 3:29 PM To: Jens Axboe; Song Liu; linux-scsi@vger.kernel.org Cc: Hannes Reinecke; Dan Williams; Christoph Hellwig Subject: Re: [PATCH 0/5] Feature enhancements for ses module On 14-10-22 09:12 PM, Jens Axboe wrote: On 08/25/2014 11:34 AM, Song Liu wrote: From: Song Liu [mailto:songliubrav...@fb.com] Sent: Monday, August 25, 2014 10:26 AM To: Song Liu Subject: [PATCH 0/5] Feature enhancements for ses module These patches include a few enhancements to ses module: 1: close potential race condition by at enclosure race condition 2,3,4: add enclosure id and device slot, so we can create symlink in /dev/disk/by-slot: # ls -d /dev/disk/by-slot/* /dev/disk/by-slot/enclosure-0x5000ae41fc1310ff-slot0 5: add ability to power on/off device with power_status file in sysfs. Had a rude awakening with sg_ses recently when setting a field in the enclosure control dpage. That is what is being done in point 5: above. The time honoured technique is to read the corresponding enclosure status dpage, find the correct element, twiddle the field of interest, set the SELECT bit and write it back. The idea is maintain any other field settings in that element. And this is what the ses module does. There is at least one SES device out there that rejects the write if there are bits set in RESERVED locations. According to SPC-4 a device may do that. Look at the status (read) and control (write) variants of each element type: many times the control variant has less fields. To fix that the read-modify-write cycle needs to be changed to a read-mask- modify-write cycle where the mask is the allowable write mask (there would be one for each element type). This is ugly and may create problems in the future if and when T10 adds a new field in an element. About that time T10 should think about refining the meaning of RESERVED in SES to outlaw SES devices creating this particular time waster. Doug Gilbert Dan Williams (4): SES: close potential registration race SES: generate KOBJ_CHANGE on enclosure attach SES: add enclosure logical id SES: add reliable slot attribute Song Liu (1): SES: Add power_status to SES enclosure component Guys, where are we with these? Seemed like they got reviewed (and updates/fixes posted), then nothing else happened. Would be nice to get these upstream, so we don't have to carry them at FB. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at https://urldefense.proofpoint.com/v1/url?u=http://vger.kernel.org/majo rdomo- info.htmlk=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0Ar=J3jGe56dIPfS5TN6DM