Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-09-22 Thread Rusty Russell
On Fri, 18 Sep 2009 03:01:42 am Christoph Hellwig wrote:
 Err, I'll take this one back for now pending some more discussion.
 What we need more urgently is the writeback cache flag, which is now
 implemented in qemu, patch following ASAP.

OK, still catching up on mail.  I'll push them out of the queue for now.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-09-17 Thread Christoph Hellwig
Err, I'll take this one back for now pending some more discussion.
What we need more urgently is the writeback cache flag, which is now
implemented in qemu, patch following ASAP.



Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-28 Thread Avi Kivity

On 08/28/2009 04:15 AM, Rusty Russell wrote:
 On Thu, 27 Aug 2009 08:34:19 pm Avi Kivity wrote:
  There are two possible semantics to cache=writeback:

  - simulate a drive with a huge write cache; use fsync() to implement
  barriers
  - tell the host that we aren't interested in data integrity, lie to the
  guest to get best performance

 Why lie to the guest?  Just say we're not ordered, and don't support barriers.
 Gets even *better* performance since it won't drain the queues.

In that case, honesty is preferable.  It means testing with
cache=writeback exercises different guest code paths, but that's acceptable.

 Maybe you're thinking of full virtualization where guest ignorance is
 bliss.  But lying always gets us in trouble later on when other cases come
 up.

  The second semantic is not useful for production, but is very useful for
  testing out things where you aren't worried about host crashes and
  you're usually rebooting the guest very often (you can't rely on guest
  caches, so you want the host to cache).

 This is not the ideal world; people will do things for performance in
 production.

We found that cache=none is faster than cache=writeback when you're
really interested in performance (no qcow2).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-27 Thread Rusty Russell
On Wed, 26 Aug 2009 09:58:13 pm Avi Kivity wrote:
 On 08/26/2009 03:06 PM, Rusty Russell wrote:
  On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
   On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
    On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
     Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
     means it does not allow filesystems to use barriers.  But the typical use
     case for virtio-blk is to use a backend that uses synchronous I/O

    Really?  Does qemu open with O_SYNC?

    I'm definitely no block expert, but this seems strange...
    Rusty.

   Qemu can open it various ways, but the only one that is fully safe
   is O_SYNC (cache=writethrough).

  (Rusty goes away and reads the qemu man page).

  By default, if no explicit caching is specified for a qcow2 disk image,
  cache=writeback will be used.

 It's now switched to writethrough.  In any case, cache=writeback means
 lie to the guest, we don't care about integrity.

Well, that was the intent of the virtio barrier feature; *don't* lie to the
guest, make it aware of the limitations.

Of course, having read Christoph's excellent summary of the situation, it's
clear I failed.

  Are you claiming qcow2 is unusual?  I can believe snapshot is less common,
  though I use it all the time.

  You'd normally have to add a feature for something like this.  I don't
  think this is different.

 Why do we need to add a feature for this?

Because cache=writeback should *not* lie to the guest?

Rusty.


Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-27 Thread Avi Kivity

On 08/27/2009 01:43 PM, Rusty Russell wrote:
   Are you claiming qcow2 is unusual?  I can believe snapshot is less common,
   though I use it all the time.

   You'd normally have to add a feature for something like this.  I don't
   think this is different.

  Why do we need to add a feature for this?

 Because cache=writeback should *not* lie to the guest?

No, it should.

There are two possible semantics to cache=writeback:

- simulate a drive with a huge write cache; use fsync() to implement
barriers
- tell the host that we aren't interested in data integrity, lie to the
guest to get best performance

The first semantic is not very useful; guests don't expect huge write
caches so you can't be sure of your integrity guarantees, and it's
slower than cache=none due to double caching and extra copies.  The
second semantic is not useful for production, but is very useful for
testing out things where you aren't worried about host crashes and
you're usually rebooting the guest very often (you can't rely on guest
caches, so you want the host to cache).


--
error compiling committee.c: too many arguments to function



Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-27 Thread Christoph Hellwig

I just wanted to get this small fix for the sane cache modes out ASAP.
Maybe the picture is clearer once we also add support for properly
flagging volatile write caches.

This is what I currently have, including experimental support in qemu
that I'm going to send out soon:


Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c
+++ linux-2.6/drivers/block/virtio_blk.c
@@ -91,15 +91,26 @@ static bool do_req(struct request_queue
 		return false;
 
 	vbr->req = req;
-	if (blk_fs_request(vbr->req)) {
+	switch (req->cmd_type) {
+	case REQ_TYPE_FS:
 		vbr->out_hdr.type = 0;
 		vbr->out_hdr.sector = blk_rq_pos(vbr->req);
 		vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
-	} else if (blk_pc_request(vbr->req)) {
+		break;
+	case REQ_TYPE_BLOCK_PC:
 		vbr->out_hdr.type = VIRTIO_BLK_T_SCSI_CMD;
 		vbr->out_hdr.sector = 0;
 		vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
-	} else {
+		break;
+	case REQ_TYPE_LINUX_BLOCK:
+		if (req->cmd[0] == REQ_LB_OP_FLUSH) {
+			vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
+			vbr->out_hdr.sector = 0;
+			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			break;
+		}
+		/*FALLTHRU*/
+	default:
 		/* We don't put anything else in the queue. */
 		BUG();
 	}
@@ -171,6 +182,12 @@ static void do_virtblk_request(struct re
 	vblk->vq->vq_ops->kick(vblk->vq);
 }
 
+static void virtblk_prepare_flush(struct request_queue *q, struct request *req)
+{
+	req->cmd_type = REQ_TYPE_LINUX_BLOCK;
+	req->cmd[0] = REQ_LB_OP_FLUSH;
+}
+
 /* return ATA identify data
  */
 static int virtblk_identify(struct gendisk *disk, void *argp)
@@ -336,9 +353,27 @@ static int __devinit virtblk_probe(struc
 	vblk->disk->driverfs_dev = &vdev->dev;
 	index++;
 
-	/* If barriers are supported, tell block layer that queue is ordered */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
+	/*
+	 * Set up queue ordering flags.  If a host has any sort of volatile
+	 * write cache it absolutely needs to set the WCACHE feature flag
+	 * so that we know about it and can flush it when needed.
+	 *
+	 * If it is not set, assume that there is no caching going on and we
+	 * can just drain the queue before and after the barrier.
+	 *
+	 * Alternatively a host can set the barrier feature flag to get
+	 * barrier requests tagged.  This is not safe if write caching is
+	 * implemented and generally not recommended to be implemented in a
+	 * new host driver.
+	 */
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_WCACHE)) {
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+				  virtblk_prepare_flush);
+	} else if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER)) {
 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
+	} else {
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
+	}
 
 	/* If disk is read-only in the host, the guest should obey */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
@@ -424,7 +459,7 @@ static struct virtio_device_id id_table[
 static unsigned int features[] = {
 	VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX,
 	VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
-	VIRTIO_BLK_F_SCSI, VIRTIO_BLK_F_IDENTIFY
+	VIRTIO_BLK_F_SCSI, VIRTIO_BLK_F_IDENTIFY, VIRTIO_BLK_F_WCACHE
 };
 
 /*
Index: linux-2.6/include/linux/virtio_blk.h
===================================================================
--- linux-2.6.orig/include/linux/virtio_blk.h
+++ linux-2.6/include/linux/virtio_blk.h
@@ -17,6 +17,7 @@
 #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available */
 #define VIRTIO_BLK_F_SCSI	7	/* Supports scsi command passthru */
 #define VIRTIO_BLK_F_IDENTIFY	8	/* ATA IDENTIFY supported */
+#define VIRTIO_BLK_F_WCACHE	9	/* write cache enabled */
 
 #define VIRTIO_BLK_ID_BYTES	(sizeof(__u16[256]))	/* IDENTIFY DATA */
 
@@ -45,6 +46,9 @@ struct virtio_blk_config {
 /* This bit says it's a scsi command, not an actual read or write. */
 #define VIRTIO_BLK_T_SCSI_CMD	2
 
+/* Flush the volatile write cache */
+#define VIRTIO_BLK_T_FLUSH	4
+
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x8000
 


Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-27 Thread Rusty Russell
On Thu, 27 Aug 2009 08:34:19 pm Avi Kivity wrote:
 There are two possible semantics to cache=writeback:
 
 - simulate a drive with a huge write cache; use fsync() to implement 
 barriers
 - tell the host that we aren't interested in data integrity, lie to the 
 guest to get best performance

Why lie to the guest?  Just say we're not ordered, and don't support barriers.
Gets even *better* performance since it won't drain the queues.

Maybe you're thinking of full virtualization where guest ignorance is
bliss.  But lying always gets us in trouble later on when other cases come
up.

 The second semantic is not useful for production, but is very useful for 
 testing out things where you aren't worried about host crashes and 
 you're usually rebooting the guest very often (you can't rely on guest 
 caches, so you want the host to cache).

This is not the ideal world; people will do things for performance in
production.

Cheers,
Rusty.


Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-26 Thread Rusty Russell
On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
 On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
  On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
   Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
   means it does not allow filesystems to use barriers.  But the typical use
   case for virtio-blk is to use a backend that uses synchronous I/O
  
  Really?  Does qemu open with O_SYNC?
  
  I'm definitely no block expert, but this seems strange...
  Rusty.
 
 Qemu can open it various ways, but the only one that is fully safe
 is O_SYNC (cache=writethrough).

(Rusty goes away and reads the qemu man page).

By default, if no explicit caching is specified for a qcow2 disk image,
cache=writeback will be used.

Are you claiming qcow2 is unusual?  I can believe snapshot is less common,
though I use it all the time.

You'd normally have to add a feature for something like this.  I don't
think this is different.

Sorry,
Rusty.


Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-26 Thread Avi Kivity

On 08/26/2009 03:06 PM, Rusty Russell wrote:
 On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
  On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
   On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
    Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
    means it does not allow filesystems to use barriers.  But the typical use
    case for virtio-blk is to use a backend that uses synchronous I/O

   Really?  Does qemu open with O_SYNC?

   I'm definitely no block expert, but this seems strange...
   Rusty.

  Qemu can open it various ways, but the only one that is fully safe
  is O_SYNC (cache=writethrough).

 (Rusty goes away and reads the qemu man page).

 By default, if no explicit caching is specified for a qcow2 disk image,
 cache=writeback will be used.

It's now switched to writethrough.  In any case, cache=writeback means
lie to the guest, we don't care about integrity.

 Are you claiming qcow2 is unusual?  I can believe snapshot is less common,
 though I use it all the time.

 You'd normally have to add a feature for something like this.  I don't
 think this is different.

Why do we need to add a feature for this?

--
error compiling committee.c: too many arguments to function



Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-25 Thread Rusty Russell
On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
 Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
 means it does not allow filesystems to use barriers.  But the typical use
 case for virtio-blk is to use a backend that uses synchronous I/O

Really?  Does qemu open with O_SYNC?

I'm definitely no block expert, but this seems strange...
Rusty.


Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-25 Thread Christoph Hellwig
On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
 On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
  Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
  means it does not allow filesystems to use barriers.  But the typical use
  case for virtio-blk is to use a backed that uses synchronous I/O
 
 Really?  Does qemu open with O_SYNC?
 
 I'm definitely no block expert, but this seems strange...
 Rusty.

Qemu can open it various ways, but the only one that is fully safe
is O_SYNC (cache=writethrough).  The O_DIRECT (cache=none) option is also
fully safe with the above patch under some limited circumstances
(disk write caches off, and using a host device or a fully allocated file).

Fixing the cache=writeback option and the majority of cache=none cases
requires implementing a cache flush command, and for the latter also
fixes to the host kernel that I'm working on.  You will get another
patch implementing the proper cache controls in virtio-blk from me in
a couple of days, too.
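
The mapping between the cache= modes discussed here and the open(2) flags the host would use can be sketched as follows. This is an illustrative sketch, not qemu's actual code; `cache_mode_to_flags` is an invented name.

```c
/* Illustrative sketch (not qemu's actual code) of how the cache= modes
 * discussed above map to open(2) flags on the host side. */
#define _GNU_SOURCE             /* for O_DIRECT on glibc */
#include <fcntl.h>
#include <string.h>

static int cache_mode_to_flags(const char *mode)
{
        int flags = O_RDWR;

        if (strcmp(mode, "writethrough") == 0)
                flags |= O_SYNC;   /* every write stable before it completes */
        else if (strcmp(mode, "none") == 0)
                flags |= O_DIRECT; /* bypass the host page cache */
        /* "writeback": plain buffered I/O; the host page cache acts as a
         * volatile write cache that only an explicit flush makes durable. */
        return flags;
}
```

Note that O_DIRECT alone is only safe under the limited circumstances Christoph lists; it does not by itself flush the disk's own write cache.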


Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-21 Thread Christian Borntraeger
On Thursday 20 August 2009 22:56:16, Christoph Hellwig wrote:
 Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
 means it does not allow filesystems to use barriers.  But the typical use
 case for virtio-blk is to use a backend that uses synchronous I/O, and in
 that case we can simply set QUEUE_ORDERED_DRAIN to make the block layer
 drain the request queue around barrier I/O and provide the semantics that
 the filesystems need.  This is what the SCSI disk driver does for disks
 that have the write cache disabled.

 With this patch we incorrectly advertise barrier support if someone
 configures qemu with write back caching.  While this displays wrong
 information in the guest, there is nothing the guest could have done
 even if we rightfully told it that we do not support any barriers.

 Signed-off-by: Christoph Hellwig h...@lst.de

Makes sense to me.
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com

[...]
 -	/* If barriers are supported, tell block layer that queue is ordered */
 +	/*
 +	 * If barriers are supported, tell block layer that queue is ordered.
 +	 *
 +	 * If no barriers are supported, assume the host uses synchronous
 +	 * writes and just drain the queue before and after the barrier.
 +	 */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
 +	else
 +		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
[...]


[PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-08-20 Thread Christoph Hellwig
Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
means it does not allow filesystems to use barriers.  But the typical use
case for virtio-blk is to use a backend that uses synchronous I/O, and in
that case we can simply set QUEUE_ORDERED_DRAIN to make the block layer
drain the request queue around barrier I/O and provide the semantics that
the filesystems need.  This is what the SCSI disk driver does for disks
that have the write cache disabled.

With this patch we incorrectly advertise barrier support if someone
configures qemu with write back caching.  While this displays wrong
information in the guest, there is nothing the guest could have done
even if we rightfully told it that we do not support any barriers.


Signed-off-by: Christoph Hellwig h...@lst.de

Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c	2009-08-20 17:41:37.019718433 -0300
+++ linux-2.6/drivers/block/virtio_blk.c	2009-08-20 17:45:40.511747922 -0300
@@ -336,9 +336,16 @@ static int __devinit virtblk_probe(struc
 	vblk->disk->driverfs_dev = &vdev->dev;
 	index++;
 
-	/* If barriers are supported, tell block layer that queue is ordered */
+	/*
+	 * If barriers are supported, tell block layer that queue is ordered.
+	 *
+	 * If no barriers are supported, assume the host uses synchronous
+	 * writes and just drain the queue before and after the barrier.
+	 */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
+	else
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
 
 	/* If disk is read-only in the host, the guest should obey */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))