Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On Fri, 18 Sep 2009 03:01:42 am Christoph Hellwig wrote:
> Err, I'll take this one back for now pending some more discussion.
> What we need more urgently is the writeback cache flag, which is now
> implemented in qemu, patch following ASAP.

OK, still catching up on mail. I'll push them out of the queue for now.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
Err, I'll take this one back for now pending some more discussion. What we need more urgently is the writeback cache flag, which is now implemented in qemu, patch following ASAP.
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On 08/28/2009 04:15 AM, Rusty Russell wrote:
> On Thu, 27 Aug 2009 08:34:19 pm Avi Kivity wrote:
>> There are two possible semantics to cache=writeback:
>>
>> - simulate a drive with a huge write cache; use fsync() to implement
>>   barriers
>> - tell the host that we aren't interested in data integrity, lie to
>>   the guest to get best performance
>
> Why lie to the guest? Just say we're not ordered, and don't support
> barriers. Gets even *better* performance since it won't drain the
> queues.

In that case, honesty is preferable. It means testing with
cache=writeback exercises different guest code paths, but that's
acceptable.

> Maybe you're thinking of full virtualization where guest ignorance is
> bliss. But lying always gets us in trouble later on when other cases
> come up.
>
>> The second semantic is not useful for production, but is very useful
>> for testing out things where you aren't worried about host crashes
>> and you're usually rebooting the guest very often (you can't rely on
>> guest caches, so you want the host to cache).
>
> This is not the ideal world; people will do things for performance in
> production.

We found that cache=none is faster than cache=writeback when you're
really interested in performance (no qcow2).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On Wed, 26 Aug 2009 09:58:13 pm Avi Kivity wrote:
> On 08/26/2009 03:06 PM, Rusty Russell wrote:
>> On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
>>> On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
>>>> On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
>>>>> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by
>>>>> default, which means it does not allow filesystems to use
>>>>> barriers. But the typical use case for virtio-blk is to use a
>>>>> backend that uses synchronous I/O
>>>>
>>>> Really? Does qemu open with O_SYNC?
>>>>
>>>> I'm definitely no block expert, but this seems strange...
>>>> Rusty.
>>>
>>> Qemu can open it various ways, but the only one that is fully safe
>>> is O_SYNC (cache=writethrough).
>>
>> (Rusty goes away and reads the qemu man page).
>>
>>     By default, if no explicit caching is specified for a qcow2
>>     disk image, cache=writeback will be used.
>
> It's now switched to writethrough. In any case, cache=writeback means
> lie to the guest, we don't care about integrity.

Well, that was the intent of the virtio barrier feature; *don't* lie to
the guest, make it aware of the limitations.

Of course, having read Christoph's excellent summary of the situation,
it's clear I failed.

>> Are you claiming qcow2 is unusual? I can believe snapshot is less
>> common, though I use it all the time.
>>
>> You'd normally have to add a feature for something like this. I don't
>> think this is different.
>
> Why do we need to add a feature for this?

Because cache=writeback should *not* lie to the guest?

Rusty.
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On 08/27/2009 01:43 PM, Rusty Russell wrote:
>>> Are you claiming qcow2 is unusual? I can believe snapshot is less
>>> common, though I use it all the time.
>>>
>>> You'd normally have to add a feature for something like this. I
>>> don't think this is different.
>>
>> Why do we need to add a feature for this?
>
> Because cache=writeback should *not* lie to the guest?

No, it should. There are two possible semantics to cache=writeback:

- simulate a drive with a huge write cache; use fsync() to implement
  barriers
- tell the host that we aren't interested in data integrity, lie to
  the guest to get best performance

The first semantic is not very useful; guests don't expect huge write
caches so you can't be sure of your integrity guarantees, and it's
slower than cache=none due to double caching and extra copies.

The second semantic is not useful for production, but is very useful
for testing out things where you aren't worried about host crashes and
you're usually rebooting the guest very often (you can't rely on guest
caches, so you want the host to cache).

--
error compiling committee.c: too many arguments to function
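The first semantic above (a drive with a huge write cache, with fsync() standing in for a barrier) can be sketched on the host side roughly as follows. This is only an illustration under stated assumptions, not qemu's code: `toy_write_all` and `toy_write_then_barrier` are hypothetical names, and the file descriptor is assumed to refer to the image file opened without O_SYNC, so writes land in the host page cache first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int toy_write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical handler: the guest's data is written into the host page
 * cache (the "huge write cache"), then the barrier is honoured by
 * forcing that cache to stable storage with fsync().
 * Returns 0 on success or a negative errno value.
 */
int toy_write_then_barrier(int fd, const char *buf, size_t len)
{
    int ret;

    assert(buf != NULL);
    ret = toy_write_all(fd, buf, len);
    if (ret < 0)
        return ret;
    if (fsync(fd) < 0)
        return -errno;
    return 0;
}
```

The fsync() call is also where the cost Avi mentions shows up: every barrier turns into a full flush of the host cache for that file.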
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
I just wanted this small fix for cache modes that are sane out ASAP.
Maybe the picture is clearer once we also add the support for
properly flagging volatile write caches. This is what I currently have,
including experimental support in qemu that I'm going to send out soon:

Index: linux-2.6/drivers/block/virtio_blk.c
===
--- linux-2.6.orig/drivers/block/virtio_blk.c
+++ linux-2.6/drivers/block/virtio_blk.c
@@ -91,15 +91,25 @@ static bool do_req(struct request_queue
 		return false;
 
 	vbr->req = req;
-	if (blk_fs_request(vbr->req)) {
+	switch (req->cmd_type) {
+	case REQ_TYPE_FS:
 		vbr->out_hdr.type = 0;
 		vbr->out_hdr.sector = blk_rq_pos(vbr->req);
 		vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
-	} else if (blk_pc_request(vbr->req)) {
+		break;
+	case REQ_TYPE_BLOCK_PC:
 		vbr->out_hdr.type = VIRTIO_BLK_T_SCSI_CMD;
 		vbr->out_hdr.sector = 0;
 		vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
-	} else {
+	case REQ_TYPE_LINUX_BLOCK:
+		if (req->cmd[0] == REQ_LB_OP_FLUSH) {
+			vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
+			vbr->out_hdr.sector = 0;
+			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			break;
+		}
+		/*FALLTHRU*/
+	default:
 		/* We don't put anything else in the queue. */
 		BUG();
 	}
@@ -171,6 +181,12 @@ static void do_virtblk_request(struct re
 		vblk->vq->vq_ops->kick(vblk->vq);
 }
 
+static void virtblk_prepare_flush(struct request_queue *q, struct request *req)
+{
+	req->cmd_type = REQ_TYPE_LINUX_BLOCK;
+	req->cmd[0] = REQ_LB_OP_FLUSH;
+}
+
 /* return ATA identify data */
 static int virtblk_identify(struct gendisk *disk, void *argp)
@@ -336,9 +352,27 @@ static int __devinit virtblk_probe(struc
 	vblk->disk->driverfs_dev = &vdev->dev;
 	index++;
 
-	/* If barriers are supported, tell block layer that queue is ordered */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
+	/*
+	 * Set up queue ordering flags.  If a host has any sort of volatile
+	 * write cache it absolutely needs to set the WCACHE feature flag
+	 * so that we know about it and can flush it when needed.
+	 *
+	 * If it is not set assume that there is no caching going on and we
+	 * can just drain the queue before and after the barrier.
+	 *
+	 * Alternatively a host can set the barrier feature flag to get
+	 * barrier requests tag.  This is not safe if write caching is
+	 * implemented and generally not recommended to be implemented in a
+	 * new host driver.
+	 */
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_WCACHE)) {
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+				  virtblk_prepare_flush);
+	} else if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER)) {
 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
+	} else {
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
+	}
 
 	/* If disk is read-only in the host, the guest should obey */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
@@ -424,7 +458,7 @@ static struct virtio_device_id id_table[
 static unsigned int features[] = {
 	VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX,
 	VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
-	VIRTIO_BLK_F_SCSI, VIRTIO_BLK_F_IDENTIFY
+	VIRTIO_BLK_F_SCSI, VIRTIO_BLK_F_IDENTIFY, VIRTIO_BLK_F_WCACHE
 };
 
 /*
Index: linux-2.6/include/linux/virtio_blk.h
===
--- linux-2.6.orig/include/linux/virtio_blk.h
+++ linux-2.6/include/linux/virtio_blk.h
@@ -17,6 +17,7 @@
 #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available*/
 #define VIRTIO_BLK_F_SCSI	7	/* Supports scsi command passthru */
 #define VIRTIO_BLK_F_IDENTIFY	8	/* ATA IDENTIFY supported */
+#define VIRTIO_BLK_F_WCACHE	9	/* write cache enabled */
 
 #define VIRTIO_BLK_ID_BYTES	(sizeof(__u16[256]))	/* IDENTIFY DATA */
 
@@ -45,6 +46,9 @@ struct virtio_blk_config {
 /* This bit says it's a scsi command, not an actual read or write. */
 #define VIRTIO_BLK_T_SCSI_CMD	2
 
+/* Flush the volatile write cache */
+#define VIRTIO_BLK_T_FLUSH	4
+
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x8000
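The decision logic in the patch above can be modelled outside the kernel as two small pure functions. This is a sketch only: every `toy_`/`TOY_` name is hypothetical, the feature bit for BARRIER (0) and the flush opcode value are placeholders chosen for the example, and only WCACHE = 9, SCSI_CMD = 2 and FLUSH = 4 are taken from the patch text. Note that each case here returns on its own; in the hunk as it appears in this archive there is no `break` between the REQ_TYPE_BLOCK_PC and REQ_TYPE_LINUX_BLOCK cases, so a passthrough request would run on into the flush check.

```c
#include <stdint.h>

/* Feature bit numbers; WCACHE = 9 is from the patch, BARRIER = 0 is assumed. */
#define TOY_F_BARRIER 0
#define TOY_F_WCACHE  9

/* Request types on the wire; values as shown in the patch. */
#define TOY_T_SCSI_CMD 2
#define TOY_T_FLUSH    4

/* Placeholder opcode standing in for REQ_LB_OP_FLUSH (value is an assumption). */
#define TOY_LB_OP_FLUSH 1

enum toy_ordered {
    TOY_ORDERED_DRAIN,        /* no volatile cache: draining is enough   */
    TOY_ORDERED_TAG,          /* host orders tagged barrier requests     */
    TOY_ORDERED_DRAIN_FLUSH   /* volatile cache: drain, then flush it    */
};

enum toy_cmd_type { TOY_REQ_FS, TOY_REQ_BLOCK_PC, TOY_REQ_LINUX_BLOCK };

/* Mirror of the probe-time choice: WCACHE wins over BARRIER, DRAIN is the default. */
enum toy_ordered toy_pick_ordered(uint32_t features)
{
    if (features & (1u << TOY_F_WCACHE))
        return TOY_ORDERED_DRAIN_FLUSH;
    if (features & (1u << TOY_F_BARRIER))
        return TOY_ORDERED_TAG;
    return TOY_ORDERED_DRAIN;
}

/*
 * Mirror of the do_req() switch: map a block-layer request to the
 * out_hdr.type sent to the host. Returns -1 where the driver would BUG().
 */
int toy_out_hdr_type(enum toy_cmd_type type, unsigned char cmd0)
{
    switch (type) {
    case TOY_REQ_FS:
        return 0;
    case TOY_REQ_BLOCK_PC:
        return TOY_T_SCSI_CMD;
    case TOY_REQ_LINUX_BLOCK:
        if (cmd0 == TOY_LB_OP_FLUSH)
            return TOY_T_FLUSH;
        break;
    }
    return -1;
}
```

Checking WCACHE before BARRIER matches the comment in the patch: tagged barriers are described there as unsafe once a write cache is involved, so a host that advertises both ends up on the drain-and-flush path.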
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On Thu, 27 Aug 2009 08:34:19 pm Avi Kivity wrote:
> There are two possible semantics to cache=writeback:
>
> - simulate a drive with a huge write cache; use fsync() to implement
>   barriers
> - tell the host that we aren't interested in data integrity, lie to
>   the guest to get best performance

Why lie to the guest? Just say we're not ordered, and don't support
barriers. Gets even *better* performance since it won't drain the
queues.

Maybe you're thinking of full virtualization where guest ignorance is
bliss. But lying always gets us in trouble later on when other cases
come up.

> The second semantic is not useful for production, but is very useful
> for testing out things where you aren't worried about host crashes and
> you're usually rebooting the guest very often (you can't rely on guest
> caches, so you want the host to cache).

This is not the ideal world; people will do things for performance in
production.

Cheers,
Rusty.
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
> On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
>> On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
>>> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default,
>>> which means it does not allow filesystems to use barriers. But the
>>> typical use case for virtio-blk is to use a backend that uses
>>> synchronous I/O
>>
>> Really? Does qemu open with O_SYNC?
>>
>> I'm definitely no block expert, but this seems strange...
>> Rusty.
>
> Qemu can open it various ways, but the only one that is fully safe
> is O_SYNC (cache=writethrough).

(Rusty goes away and reads the qemu man page).

    By default, if no explicit caching is specified for a qcow2 disk
    image, cache=writeback will be used.

Are you claiming qcow2 is unusual? I can believe snapshot is less
common, though I use it all the time.

You'd normally have to add a feature for something like this. I don't
think this is different.

Sorry,
Rusty.
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On 08/26/2009 03:06 PM, Rusty Russell wrote:
> On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
>> On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
>>> On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
>>>> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by
>>>> default, which means it does not allow filesystems to use
>>>> barriers. But the typical use case for virtio-blk is to use a
>>>> backend that uses synchronous I/O
>>>
>>> Really? Does qemu open with O_SYNC?
>>>
>>> I'm definitely no block expert, but this seems strange...
>>> Rusty.
>>
>> Qemu can open it various ways, but the only one that is fully safe
>> is O_SYNC (cache=writethrough).
>
> (Rusty goes away and reads the qemu man page).
>
>     By default, if no explicit caching is specified for a qcow2 disk
>     image, cache=writeback will be used.

It's now switched to writethrough. In any case, cache=writeback means
lie to the guest, we don't care about integrity.

> Are you claiming qcow2 is unusual? I can believe snapshot is less
> common, though I use it all the time.
>
> You'd normally have to add a feature for something like this. I don't
> think this is different.

Why do we need to add a feature for this?

--
error compiling committee.c: too many arguments to function
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default,
> which means it does not allow filesystems to use barriers. But the
> typical use case for virtio-blk is to use a backend that uses
> synchronous I/O

Really? Does qemu open with O_SYNC?

I'm definitely no block expert, but this seems strange...
Rusty.
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
> On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
>> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default,
>> which means it does not allow filesystems to use barriers. But the
>> typical use case for virtio-blk is to use a backend that uses
>> synchronous I/O
>
> Really? Does qemu open with O_SYNC?
>
> I'm definitely no block expert, but this seems strange...
> Rusty.

Qemu can open it various ways, but the only one that is fully safe is
O_SYNC (cache=writethrough). The O_DIRECT (cache=none) option is also
fully safe with the above patch under some limited circumstances
(disk write caches off and using a host device or fully allocated file).

Fixing the cache=writeback option and the majority case for cache=none
requires implementing a cache flush command, and for the latter one
also fixes to the host kernel I'm working on. You will get another
patch to implement the proper cache controls in virtio-blk from me in
a couple of days, too.
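The safety conditions listed in this message can be written down as a small predicate. It is a sketch of the reasoning only, with hypothetical names throughout (`toy_cache_mode`, `toy_host_cfg`, `toy_drain_is_sufficient`); it encodes just what the message states and says nothing about how qemu itself decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three qemu cache modes discussed in the thread. */
enum toy_cache_mode {
    TOY_CACHE_WRITETHROUGH,   /* opened with O_SYNC   */
    TOY_CACHE_NONE,           /* opened with O_DIRECT */
    TOY_CACHE_WRITEBACK       /* host page cache used */
};

struct toy_host_cfg {
    enum toy_cache_mode mode;
    bool disk_write_cache_on; /* volatile write cache on the physical disk */
    bool fully_allocated;     /* host device or fully allocated file       */
};

/*
 * True when draining the guest queue around a barrier is enough on its
 * own, i.e. no cache flush command is needed for data integrity.
 */
bool toy_drain_is_sufficient(const struct toy_host_cfg *cfg)
{
    assert(cfg != NULL);
    switch (cfg->mode) {
    case TOY_CACHE_WRITETHROUGH:
        return true;
    case TOY_CACHE_NONE:
        /* Only the limited case: disk caches off and nothing left to allocate. */
        return !cfg->disk_write_cache_on && cfg->fully_allocated;
    case TOY_CACHE_WRITEBACK:
        return false;
    }
    return false;
}
```

Every configuration for which this returns false is one that, per the message, needs the cache flush command before it can be made safe.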
Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
On Thursday 20 August 2009 22:56:16, Christoph Hellwig wrote:
> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default,
> which means it does not allow filesystems to use barriers. But the
> typical use case for virtio-blk is to use a backend that uses
> synchronous I/O, and in that case we can simply set
> QUEUE_ORDERED_DRAIN to make the block layer drain the request queue
> around barrier I/O and provide the semantics that the filesystems
> need. This is what the SCSI disk driver does for disks that have the
> write cache disabled.
>
> With this patch we incorrectly advertise barrier support if someone
> configures qemu with write back caching. While this displays wrong
> information in the guest there is nothing that guest could have done
> even if we rightfully told it that we do not support any barriers.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Makes sense to me.

Reviewed-by: Christian Borntraeger <borntrae...@de.ibm.com>

> [...]
> -	/* If barriers are supported, tell block layer that queue is ordered */
> +	/*
> +	 * If barriers are supported, tell block layer that queue is ordered.
> +	 *
> +	 * If no barriers are supported assume the host uses synchronous
> +	 * writes and just drain the the queue before and after the barrier.
> +	 */
>  	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
>  		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
> +	else
> +		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
> [...]
[PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default,
which means it does not allow filesystems to use barriers. But the
typical use case for virtio-blk is to use a backend that uses
synchronous I/O, and in that case we can simply set QUEUE_ORDERED_DRAIN
to make the block layer drain the request queue around barrier I/O and
provide the semantics that the filesystems need. This is what the SCSI
disk driver does for disks that have the write cache disabled.

With this patch we incorrectly advertise barrier support if someone
configures qemu with write back caching. While this displays wrong
information in the guest there is nothing that guest could have done
even if we rightfully told it that we do not support any barriers.

Signed-off-by: Christoph Hellwig <h...@lst.de>

Index: linux-2.6/drivers/block/virtio_blk.c
===
--- linux-2.6.orig/drivers/block/virtio_blk.c	2009-08-20 17:41:37.019718433 -0300
+++ linux-2.6/drivers/block/virtio_blk.c	2009-08-20 17:45:40.511747922 -0300
@@ -336,9 +336,16 @@ static int __devinit virtblk_probe(struc
 	vblk->disk->driverfs_dev = &vdev->dev;
 	index++;
 
-	/* If barriers are supported, tell block layer that queue is ordered */
+	/*
+	 * If barriers are supported, tell block layer that queue is ordered.
+	 *
+	 * If no barriers are supported assume the host uses synchronous
+	 * writes and just drain the queue before and after the barrier.
+	 */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
+	else
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
 
 	/* If disk is read-only in the host, the guest should obey */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
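What "drain the request queue around barrier I/O" means for ordering can be shown with a toy model. This is not the block layer's implementation; `toy_req` and `toy_drain_epochs` are hypothetical names, and the model only captures the ordering guarantee: requests in the same epoch may be in flight together, while a barrier always gets an epoch to itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request model: only the barrier flag matters here. */
struct toy_req {
    int id;
    bool barrier;
};

/*
 * Assign each request a dispatch epoch. A barrier closes the epoch of
 * all earlier requests (the queue is drained), runs alone in its own
 * epoch, and later requests start in a fresh one.
 * Returns the number of epochs used.
 */
size_t toy_drain_epochs(const struct toy_req *reqs, size_t n, size_t *epoch_out)
{
    size_t epoch = 0;
    bool open = false; /* true while the current epoch holds ordinary requests */

    assert(reqs != NULL && epoch_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].barrier) {
            if (open) {
                epoch++;            /* drain everything issued before the barrier */
                open = false;
            }
            epoch_out[i] = epoch++; /* the barrier itself runs alone */
        } else {
            epoch_out[i] = epoch;
            open = true;
        }
    }
    return open ? epoch + 1 : epoch;
}
```

In this model the guarantee holds only if a completed request is actually on stable storage, which is the synchronous-backend assumption stated in the patch description.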