Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-20 Thread Asias He

On 07/21/2012 04:56 AM, Anthony Liguori wrote:

"Michael S. Tsirkin"  writes:


On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote:

Of course, the million dollar question is why would using AIO in the
kernel be faster than using AIO in userspace?


Actually for me a more important question is how does it compare
with virtio-blk dataplane?


I'm not even asking for a benchmark comparison.  It's the same API
being called from a kernel thread vs. a userspace thread.  Why would
there be a 60% performance difference between the two?  That doesn't
make any sense.


Please read the commit log again. I am not saying vhost-blk vs. the
userspace implementation gives a 60% improvement. I am saying this
vhost-blk vs. the original vhost-blk gives a 60% improvement.



"""
This patch is based on Liu Yuan's implementation with various
improvements and bug fixes. Notably, this patch makes guest notify and
host completion processing in parallel which gives about 60% performance
improvement compared to Liu Yuan's implementation.
"""



There's got to be a better justification for putting this in the kernel
than just that we can.

I completely understand why Christoph's suggestion of submitting BIOs
directly would be faster.  There's no way to do that in userspace.


Well. With Zach and Dave's new in-kernel aio API, using aio in the
kernel is much simpler than in userspace. This is one potential reason
the in-kernel implementation can be better than the userspace one. I am
working on it right now. And for block device based images, as suggested
by Christoph, we can submit bios directly. This is another potential reason.
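
To make the bio idea concrete, here is a minimal sketch of what direct
submission could look like for a raw block device backend. This is not
code I have working yet; bio_alloc()/bio_add_page()/submit_bio() are the
stock 3.x block APIs, everything else is illustrative and assumes the
guest buffer page has already been pinned with get_user_pages_fast():

static int vhost_blk_submit_bio(struct block_device *bdev, int rw,
                                sector_t sector, struct page *page,
                                unsigned int len, unsigned int offset,
                                bio_end_io_t *done, void *private)
{
        struct bio *bio;

        bio = bio_alloc(GFP_KERNEL, 1);
        if (!bio)
                return -ENOMEM;

        bio->bi_bdev    = bdev;
        bio->bi_sector  = sector;       /* 3.x name for the start sector */
        bio->bi_end_io  = done;         /* completion runs in irq context */
        bio->bi_private = private;      /* e.g. the vhost_blk_req */

        if (bio_add_page(bio, page, len, offset) != len) {
                bio_put(bio);
                return -EIO;
        }

        submit_bio(rw, bio);            /* rw: READ or WRITE */
        return 0;
}

Multi-segment requests and error propagation are left out; the point is
just that nothing in this path needs the aio machinery at all.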


Why can't we just go further and see if we can improve the IO stack from
the guest kernel side all the way down to the host kernel side? We
cannot do that if we stick to doing everything in userspace (qemu).
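
To be concrete about what the commit log's "in parallel" means: the
worker that handles guest kicks only submits aio and returns, while a
separate kthread reaps completions and pushes used buffers back to the
guest. A rough sketch of the completion side only -- illustrative, built
on vhost_blk_read_events() from the patch and the stock
vhost_add_used_and_signal(); the real code differs in details:

static int vhost_blk_completion_worker(void *data)
{
        struct vhost_blk *blk = data;
        struct vhost_blk_req *req;
        struct io_event *e;
        int nr, i;

        while (!blk->stop) {
                /* Reap completed aio events.  A real implementation would
                 * drive the count from the completion eventfd instead of
                 * asking for a full ring's worth every time. */
                nr = vhost_blk_read_events(blk, blk->ioevent_nr);
                for (i = 0; i < nr; i++) {
                        e = &blk->ioevent[i];
                        /* the iocb's ki_obj.user carries the request */
                        req = (struct vhost_blk_req *)(unsigned long)e->obj;
                        /* set VIRTIO_BLK_S_OK/_IOERR from e->res into
                         * req->status here, then hand the descriptor back
                         * and kick the guest */
                        vhost_add_used_and_signal(&blk->dev, &blk->vq,
                                                  req->head, 0);
                }
        }
        return 0;
}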


--
Asias




Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-20 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote:
>> Of course, the million dollar question is why would using AIO in the
>> kernel be faster than using AIO in userspace?
>
> Actually for me a more important question is how does it compare
> with virtio-blk dataplane?

I'm not even asking for a benchmark comparison.  It's the same API
being called from a kernel thread vs. a userspace thread.  Why would
there be a 60% performance difference between the two?  That doesn't
make any sense.

There's got to be a better justification for putting this in the kernel
than just that we can.

I completely understand why Christoph's suggestion of submitting BIOs
directly would be faster.  There's no way to do that in userspace.

Regards,

Anthony Liguori

>
> -- 
> MST


Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-20 Thread Stefan Hajnoczi
On Thu, Jul 19, 2012 at 2:09 PM, Michael S. Tsirkin  wrote:
> On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote:
>> Of course, the million dollar question is why would using AIO in the
>> kernel be faster than using AIO in userspace?
>
> Actually for me a more important question is how does it compare
> with virtio-blk dataplane?

Hi Khoa,
I think you have results of data-plane and vhost-blk?  Is the
vhost-blk version identical to Asias' recent patches?

Stefan


Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-19 Thread Michael S. Tsirkin
On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote:
> Of course, the million dollar question is why would using AIO in the
> kernel be faster than using AIO in userspace?

Actually for me a more important question is how does it compare
with virtio-blk dataplane?

-- 
MST


Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-19 Thread Michael S. Tsirkin
On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote:
> Asias He  writes:
> 
> > vhost-blk is an in-kernel virtio-blk device accelerator.
> >
> > This patch is based on Liu Yuan's implementation with various
> > improvements and bug fixes. Notably, this patch makes guest notify and
> > host completion processing in parallel which gives about 60% performance
> > improvement compared to Liu Yuan's implementation.
> >
> > Performance evaluation:
> > -
> > The comparison is between kvm tool with userspace implementation and kvm
> > tool with vhost-blk.
> >
> > 1) Fio with libaio ioengine on Fusion IO device
> > With bio-based IO path, sequential read/write, random read/write
> > IOPS boost : 8.4%, 15.3%, 10.4%, 14.6%
> > Latency improvement: 8.5%, 15.4%, 10.4%, 15.1%
> >
> > 2) Fio with vsync ioengine on Fusion IO device
> > With bio-based IO path, sequential read/write, random read/write
> > IOPS boost : 10.5%, 4.8%, 5.2%, 5.6%
> > Latency improvement: 11.4%, 5.0%, 5.2%, 5.8%
> >
> > Cc: Michael S. Tsirkin 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: k...@vger.kernel.org
> > Cc: virtualizat...@lists.linux-foundation.org
> > Signed-off-by: Asias He 
> > ---
> >  drivers/vhost/Kconfig  |   10 +
> >  drivers/vhost/Makefile |2 +
> >  drivers/vhost/blk.c|  600 
> > 
> >  drivers/vhost/vhost.h  |5 +
> >  include/linux/vhost.h  |3 +
> >  5 files changed, 620 insertions(+)
> >  create mode 100644 drivers/vhost/blk.c
> >
> > diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> > index c387067..fa071a8 100644
> > --- a/drivers/vhost/Kconfig
> > +++ b/drivers/vhost/Kconfig
> > @@ -16,4 +16,14 @@ config VHOST_NET
> >  
> >   To compile this driver as a module, choose M here: the module will
> >   be called vhost_net.
> > +config VHOST_BLK
> > +   tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
> > +   depends on VHOST && BLOCK && AIO && EVENTFD && EXPERIMENTAL
> > +   ---help---
> > + This kernel module can be loaded in host kernel to accelerate
> > + guest block with virtio_blk. Not to be confused with virtio_blk
> > + module itself which needs to be loaded in guest kernel.
> > +
> > + To compile this driver as a module, choose M here: the module will
> > + be called vhost_blk.
> >  
> > diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> > index cd36885..aa461d5 100644
> > --- a/drivers/vhost/Makefile
> > +++ b/drivers/vhost/Makefile
> > @@ -1,4 +1,6 @@
> >  obj-$(CONFIG_VHOST)+= vhost.o
> >  obj-$(CONFIG_VHOST_NET) += vhost_net.o
> > +obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
> >  
> >  vhost_net-y:= net.o
> > +vhost_blk-y:= blk.o
> > diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
> > new file mode 100644
> > index 000..6a94894
> > --- /dev/null
> > +++ b/drivers/vhost/blk.c
> > @@ -0,0 +1,600 @@
> > +/*
> > + * Copyright (C) 2011 Taobao, Inc.
> > + * Author: Liu Yuan 
> > + *
> > + * Copyright (C) 2012 Red Hat, Inc.
> > + * Author: Asias He 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + *
> > + * virtio-blk server in host kernel.
> > + */
> > +
> > +#include <linux/miscdevice.h>
> > +#include <linux/module.h>
> > +#include <linux/vhost.h>
> > +#include <linux/virtio_blk.h>
> > +#include <linux/eventfd.h>
> > +#include <linux/mutex.h>
> > +#include <linux/file.h>
> > +#include <linux/mmu_context.h>
> > +#include <linux/anon_inodes.h>
> > +#include <linux/kthread.h>
> > +#include <linux/blkdev.h>
> > +
> > +#include "vhost.h"
> > +
> > +#define BLK_HDR 0
> > +
> > +enum {
> > +   VHOST_BLK_VQ_REQ = 0,
> > +   VHOST_BLK_VQ_MAX = 1,
> > +};
> > +
> > +struct vhost_blk_req {
> > +   u16 head;
> > +   u8 *status;
> > +};
> > +
> > +struct vhost_blk {
> > +   struct task_struct *worker_host_kick;
> > +   struct task_struct *worker;
> > +   struct vhost_blk_req *reqs;
> > +   struct vhost_virtqueue vq;
> > +   struct eventfd_ctx *ectx;
> > +   struct io_event *ioevent;
> > +   struct kioctx *ioctx;
> > +   struct vhost_dev dev;
> > +   struct file *efile;
> > +   u64 ioevent_nr;
> > +   bool stop;
> > +};
> > +
> > +static inline int vhost_blk_read_events(struct vhost_blk *blk, long nr)
> > +{
> > +   mm_segment_t old_fs = get_fs();
> > +   int ret;
> > +
> > +   set_fs(KERNEL_DS);
> > +   ret = read_events(blk->ioctx, nr, nr, blk->ioevent, NULL);
> > +   set_fs(old_fs);
> > +
> > +   return ret;
> > +}
> > +
> > +static int vhost_blk_setup(struct vhost_blk *blk)
> > +{
> > +   struct kioctx *ctx;
> > +
> > +   if (blk->ioctx)
> > +   return 0;
> > +
> > +   blk->ioevent_nr = blk->vq.num;
> > +   ctx = ioctx_alloc(blk->ioevent_nr);
> > +   if (IS_ERR(ctx)) {
> > +   pr_err("Failed to ioctx_alloc");
> > +   return PTR_ERR(ctx);
> > +   }
> 
> Not that it's very likely that ioctx_alloc will fail in practice.
> There's a fixed number of events that can be allocated that's currently
> 0x10000.  If you have a ring queue size of 1024 (which is normal) then
> that limits you to 64 vhost-blk devices.
> 
> Realistically, I 

Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-19 Thread Anthony Liguori
Asias He  writes:

> vhost-blk is an in-kernel virtio-blk device accelerator.
>
> This patch is based on Liu Yuan's implementation with various
> improvements and bug fixes. Notably, this patch makes guest notify and
> host completion processing in parallel which gives about 60% performance
> improvement compared to Liu Yuan's implementation.
>
> Performance evaluation:
> -
> The comparison is between kvm tool with userspace implementation and kvm
> tool with vhost-blk.
>
> 1) Fio with libaio ioengine on Fusion IO device
> With bio-based IO path, sequential read/write, random read/write
> IOPS boost : 8.4%, 15.3%, 10.4%, 14.6%
> Latency improvement: 8.5%, 15.4%, 10.4%, 15.1%
>
> 2) Fio with vsync ioengine on Fusion IO device
> With bio-based IO path, sequential read/write, random read/write
> IOPS boost : 10.5%, 4.8%, 5.2%, 5.6%
> Latency improvement: 11.4%, 5.0%, 5.2%, 5.8%
>
> Cc: Michael S. Tsirkin 
> Cc: linux-kernel@vger.kernel.org
> Cc: k...@vger.kernel.org
> Cc: virtualizat...@lists.linux-foundation.org
> Signed-off-by: Asias He 
> ---
>  drivers/vhost/Kconfig  |   10 +
>  drivers/vhost/Makefile |2 +
>  drivers/vhost/blk.c|  600 
> 
>  drivers/vhost/vhost.h  |5 +
>  include/linux/vhost.h  |3 +
>  5 files changed, 620 insertions(+)
>  create mode 100644 drivers/vhost/blk.c
>
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index c387067..fa071a8 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -16,4 +16,14 @@ config VHOST_NET
>  
> To compile this driver as a module, choose M here: the module will
> be called vhost_net.
> +config VHOST_BLK
> + tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
> + depends on VHOST && BLOCK && AIO && EVENTFD && EXPERIMENTAL
> + ---help---
> +   This kernel module can be loaded in host kernel to accelerate
> +   guest block with virtio_blk. Not to be confused with virtio_blk
> +   module itself which needs to be loaded in guest kernel.
> +
> +   To compile this driver as a module, choose M here: the module will
> +   be called vhost_blk.
>  
> diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> index cd36885..aa461d5 100644
> --- a/drivers/vhost/Makefile
> +++ b/drivers/vhost/Makefile
> @@ -1,4 +1,6 @@
>  obj-$(CONFIG_VHOST)  += vhost.o
>  obj-$(CONFIG_VHOST_NET) += vhost_net.o
> +obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
>  
>  vhost_net-y  := net.o
> +vhost_blk-y  := blk.o
> diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
> new file mode 100644
> index 000..6a94894
> --- /dev/null
> +++ b/drivers/vhost/blk.c
> @@ -0,0 +1,600 @@
> +/*
> + * Copyright (C) 2011 Taobao, Inc.
> + * Author: Liu Yuan 
> + *
> + * Copyright (C) 2012 Red Hat, Inc.
> + * Author: Asias He 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + *
> + * virtio-blk server in host kernel.
> + */
> +
> +#include <linux/miscdevice.h>
> +#include <linux/module.h>
> +#include <linux/vhost.h>
> +#include <linux/virtio_blk.h>
> +#include <linux/eventfd.h>
> +#include <linux/mutex.h>
> +#include <linux/file.h>
> +#include <linux/mmu_context.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/kthread.h>
> +#include <linux/blkdev.h>
> +
> +#include "vhost.h"
> +
> +#define BLK_HDR  0
> +
> +enum {
> + VHOST_BLK_VQ_REQ = 0,
> + VHOST_BLK_VQ_MAX = 1,
> +};
> +
> +struct vhost_blk_req {
> + u16 head;
> + u8 *status;
> +};
> +
> +struct vhost_blk {
> + struct task_struct *worker_host_kick;
> + struct task_struct *worker;
> + struct vhost_blk_req *reqs;
> + struct vhost_virtqueue vq;
> + struct eventfd_ctx *ectx;
> + struct io_event *ioevent;
> + struct kioctx *ioctx;
> + struct vhost_dev dev;
> + struct file *efile;
> + u64 ioevent_nr;
> + bool stop;
> +};
> +
> +static inline int vhost_blk_read_events(struct vhost_blk *blk, long nr)
> +{
> + mm_segment_t old_fs = get_fs();
> + int ret;
> +
> + set_fs(KERNEL_DS);
> + ret = read_events(blk->ioctx, nr, nr, blk->ioevent, NULL);
> + set_fs(old_fs);
> +
> + return ret;
> +}
> +
> +static int vhost_blk_setup(struct vhost_blk *blk)
> +{
> + struct kioctx *ctx;
> +
> + if (blk->ioctx)
> + return 0;
> +
> + blk->ioevent_nr = blk->vq.num;
> + ctx = ioctx_alloc(blk->ioevent_nr);
> + if (IS_ERR(ctx)) {
> + pr_err("Failed to ioctx_alloc");
> + return PTR_ERR(ctx);
> + }

Not that it's very likely that ioctx_alloc will fail in practice.
There's a fixed number of events that can be allocated that's currently
0x10000.  If you have a ring queue size of 1024 (which is normal) then
that limits you to 64 vhost-blk devices.

Realistically, I don't think you can only do aio with vhost-blk because
of this (and many other) limitations.  It's necessary to be able to fall
back to a thread pool because AIO cannot be relied upon.
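
Something like the following is roughly all a fallback needs to be: run
the request synchronously from a worker thread when an aio context can't
be obtained.  This is only a sketch -- use_mm(), vfs_readv() and
vfs_writev() are existing kernel interfaces, but the function and its
argument layout are made up:

static ssize_t vhost_blk_fallback_rw(struct vhost_blk *blk,
                                     struct file *file, int write,
                                     struct iovec *iov,
                                     unsigned long nr_vecs, loff_t pos)
{
        mm_segment_t old_fs = get_fs();
        ssize_t ret;

        /* Attach the owner's mm so the guest addresses inside the iovecs
         * resolve; KERNEL_DS because the iovec array itself lives in
         * kernel memory (vq->iov). */
        use_mm(blk->dev.mm);
        set_fs(KERNEL_DS);
        if (write)
                ret = vfs_writev(file, (struct iovec __user *)iov,
                                 nr_vecs, &pos);
        else
                ret = vfs_readv(file, (struct iovec __user *)iov,
                                nr_vecs, &pos);
        set_fs(old_fs);
        unuse_mm(blk->dev.mm);

        return ret;
}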

> + put_ioctx(ctx);
> + blk->ioctx = ctx;
> +
> + blk->ioevent = kmalloc(sizeof(struct io_event) * blk->ioevent_nr,
> +  



Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-18 Thread Asias He

On 07/18/2012 10:31 PM, Jeff Moyer wrote:

Asias He  writes:


On 07/18/2012 03:10 AM, Jeff Moyer wrote:

Asias He  writes:


vhost-blk is an in-kernel virtio-blk device accelerator.

This patch is based on Liu Yuan's implementation with various
improvements and bug fixes. Notably, this patch makes guest notify and
host completion processing in parallel which gives about 60% performance
improvement compared to Liu Yuan's implementation.


So, first off, some basic questions.  Is it correct to assume that you
tested this with buffered I/O (files opened *without* O_DIRECT)?
  I'm pretty sure that if you used O_DIRECT, you'd run into problems (which
are solved by the patch set posted by Shaggy, based on Zach Brown's work
of many moons ago).  Note that, with buffered I/O, the submission path
is NOT asynchronous.  So, any speedups you've reported are extremely
suspect.  ;-)


I always used O_DIRECT to test this patchset. And I mostly used a raw
block device as the guest image. Is this the reason why I did not hit the
problem you mentioned? Btw, I have also run this patchset with a file-based
image. I still do not see problems like IO hangs.


Hmm, so do the iovecs passed in point to buffers in userspace?  I
thought they were kernel buffers, which would have blown up in
get_user_pages_fast.


Yes. The iovecs passed in point to userspace buffers. ;-)
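
The vhost worker runs with the owner's mm attached, and the addresses in
those iovecs come from the memory table the kvm tool registers with
VHOST_SET_MEM_TABLE: each descriptor's guest-physical address is
translated into the owner's user-virtual address space. A simplified
sketch of that lookup (illustrative, not the exact code in
drivers/vhost/vhost.c):

static void __user *guest_pa_to_user_va(struct vhost_memory *mem, u64 gpa)
{
        int i;

        for (i = 0; i < mem->nregions; i++) {
                struct vhost_memory_region *r = &mem->regions[i];

                if (gpa >= r->guest_phys_addr &&
                    gpa < r->guest_phys_addr + r->memory_size)
                        return (void __user *)(unsigned long)
                                (gpa - r->guest_phys_addr +
                                 r->userspace_addr);
        }
        return NULL;
}

That is why get_user_pages_fast() is happy with them: they are ordinary
user addresses in the mm of the process that owns the vhost device.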

--
Asias




Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-18 Thread Jeff Moyer
Asias He  writes:

> On 07/18/2012 03:10 AM, Jeff Moyer wrote:
>> Asias He  writes:
>>
>>> vhost-blk is an in-kernel virtio-blk device accelerator.
>>>
>>> This patch is based on Liu Yuan's implementation with various
>>> improvements and bug fixes. Notably, this patch makes guest notify and
>>> host completion processing in parallel which gives about 60% performance
>>> improvement compared to Liu Yuan's implementation.
>>
>> So, first off, some basic questions.  Is it correct to assume that you
>> tested this with buffered I/O (files opened *without* O_DIRECT)?
>>  I'm pretty sure that if you used O_DIRECT, you'd run into problems (which
>> are solved by the patch set posted by Shaggy, based on Zach Brown's work
>> of many moons ago).  Note that, with buffered I/O, the submission path
>> is NOT asynchronous.  So, any speedups you've reported are extremely
>> suspect.  ;-)
>
> I always used O_DIRECT to test this patchset. And I mostly used a raw
> block device as the guest image. Is this the reason why I did not hit the
> problem you mentioned? Btw, I have also run this patchset with a file-based
> image. I still do not see problems like IO hangs.

Hmm, so do the iovecs passed in point to buffers in userspace?  I
thought they were kernel buffers, which would have blown up in
get_user_pages_fast.

Cheers,
Jeff


Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-17 Thread Asias He

On 07/18/2012 03:10 AM, Jeff Moyer wrote:

Asias He  writes:


vhost-blk is an in-kernel virtio-blk device accelerator.

This patch is based on Liu Yuan's implementation with various
improvements and bug fixes. Notably, this patch makes guest notify and
host completion processing in parallel which gives about 60% performance
improvement compared to Liu Yuan's implementation.


So, first off, some basic questions.  Is it correct to assume that you
tested this with buffered I/O (files opened *without* O_DIRECT)?
 I'm pretty sure that if you used O_DIRECT, you'd run into problems (which
are solved by the patch set posted by Shaggy, based on Zach Brown's work
of many moons ago).  Note that, with buffered I/O, the submission path
is NOT asynchronous.  So, any speedups you've reported are extremely
suspect.  ;-)


I always used O_DIRECT to test this patchset. And I mostly used a raw
block device as the guest image. Is this the reason why I did not hit the
problem you mentioned? Btw, I have also run this patchset with a file-based
image. I still do not see problems like IO hangs.



Next, did you look at Shaggy's patch set?  I think it would be best to
focus your efforts on testing *that*, and implementing your work on top
of it.


I guess you mean this one:

http://marc.info/?l=linux-fsdevel&m=133312234313122

I did not notice that until James pointed that out.

I talked with Zach and Shaggy. Shaggy said he is still working on that 
patch set and will send that patch out soon.




Having said that, I did do some review of this patch, inlined below.


Thanks, Jeff!


+static int vhost_blk_setup(struct vhost_blk *blk)
+{
+   struct kioctx *ctx;
+
+   if (blk->ioctx)
+   return 0;
+
+   blk->ioevent_nr = blk->vq.num;
+   ctx = ioctx_alloc(blk->ioevent_nr);
+   if (IS_ERR(ctx)) {
+   pr_err("Failed to ioctx_alloc");
+   return PTR_ERR(ctx);
+   }
+   put_ioctx(ctx);
+   blk->ioctx = ctx;
+
+   blk->ioevent = kmalloc(sizeof(struct io_event) * blk->ioevent_nr,
+  GFP_KERNEL);
+   if (!blk->ioevent) {
+   pr_err("Failed to allocate memory for io_events");
+   return -ENOMEM;


You've just leaked blk->ioctx.


Yes. Will fix.


+   }
+
+   blk->reqs = kmalloc(sizeof(struct vhost_blk_req) * blk->ioevent_nr,
+   GFP_KERNEL);
+   if (!blk->reqs) {
+   pr_err("Failed to allocate memory for vhost_blk_req");
+   return -ENOMEM;


And here.


Yes. Will fix.




+   }
+
+   return 0;
+}
+
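
For reference, the fix will look roughly like the usual goto unwind.
This is just a sketch -- the exact handling of the ioctx reference
(put_ioctx() in the current code) still needs deciding:

static int vhost_blk_setup(struct vhost_blk *blk)
{
        struct kioctx *ctx;

        if (blk->ioctx)
                return 0;

        blk->ioevent_nr = blk->vq.num;
        ctx = ioctx_alloc(blk->ioevent_nr);
        if (IS_ERR(ctx))
                return PTR_ERR(ctx);
        blk->ioctx = ctx;

        blk->ioevent = kmalloc(sizeof(*blk->ioevent) * blk->ioevent_nr,
                               GFP_KERNEL);
        if (!blk->ioevent)
                goto err_ioevent;

        blk->reqs = kmalloc(sizeof(*blk->reqs) * blk->ioevent_nr,
                            GFP_KERNEL);
        if (!blk->reqs)
                goto err_reqs;

        return 0;

err_reqs:
        kfree(blk->ioevent);
        blk->ioevent = NULL;
err_ioevent:
        /* drop the reference ioctx_alloc() returned with */
        put_ioctx(blk->ioctx);
        blk->ioctx = NULL;
        return -ENOMEM;
}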

[snip]

+static int vhost_blk_io_submit(struct vhost_blk *blk, struct file *file,
+  struct vhost_blk_req *req,
+  struct iovec *iov, u64 nr_vecs, loff_t offset,
+  int opcode)
+{
+   struct kioctx *ioctx = blk->ioctx;
+   mm_segment_t oldfs = get_fs();
+   struct kiocb_batch batch;
+   struct blk_plug plug;
+   struct kiocb *iocb;
+   int ret;
+
+   if (!try_get_ioctx(ioctx)) {
+   pr_info("Failed to get ioctx");
+   return -EAGAIN;
+   }


Using try_get_ioctx directly gives me a slightly uneasy feeling.  I
understand that you don't need to do the lookup, but at least wrap it
and check for ->dead.


OK.




+
+   atomic_long_inc_not_zero(&file->f_count);
+   eventfd_ctx_get(blk->ectx);
+
+   /* TODO: batch to 1 is not good! */


Agreed.  You should setup the batching in vhost_blk_handle_guest_kick.
The way you've written the code, the batching is not at all helpful.


Yes. that's why there is a TODO.
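
The plan is roughly to hoist the batch and the plug up into the kick
handler, something like the sketch below (vhost_blk_req_from_vq() and
vhost_blk_io_submit_batched() are made-up stand-ins for the real
descriptor parsing and submission code):

static void vhost_blk_handle_guest_kick(struct vhost_work *work)
{
        struct vhost_virtqueue *vq = container_of(work,
                        struct vhost_virtqueue, poll.work);
        struct vhost_blk *blk = container_of(vq->dev, struct vhost_blk, dev);
        struct vhost_blk_req *req;
        struct kiocb_batch batch;
        struct blk_plug plug;

        /* one batch and one plug for the whole virtqueue drain, so
         * aio_get_req() can amortise allocations and the plug can merge
         * adjacent bios */
        kiocb_batch_init(&batch, blk->vq.num);
        blk_start_plug(&plug);

        while ((req = vhost_blk_req_from_vq(blk)) != NULL) {
                if (vhost_blk_io_submit_batched(blk, req, &batch) < 0)
                        break;
        }

        blk_finish_plug(&plug);
        kiocb_batch_free(blk->ioctx, &batch);
}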


+   kiocb_batch_init(&batch, 1);
+   blk_start_plug(&plug);
+
+   iocb = aio_get_req(ioctx, &batch);
+   if (unlikely(!iocb)) {
+   ret = -EAGAIN;
+   goto out;
+   }
+
+   iocb->ki_filp= file;
+   iocb->ki_pos = offset;
+   iocb->ki_buf = (void *)iov;
+   iocb->ki_left= nr_vecs;
+   iocb->ki_nbytes  = nr_vecs;
+   iocb->ki_opcode  = opcode;
+   iocb->ki_obj.user = req;
+   iocb->ki_eventfd  = blk->ectx;
+
+   set_fs(KERNEL_DS);
+   ret = aio_setup_iocb(iocb, false);
+   set_fs(oldfs);
+   if (unlikely(ret))
+   goto out_put_iocb;
+
+   spin_lock_irq(&ioctx->ctx_lock);
+   if (unlikely(ioctx->dead)) {
+   spin_unlock_irq(&ioctx->ctx_lock);
+   ret = -EINVAL;
+   goto out_put_iocb;
+   }
+   aio_run_iocb(iocb);
+   spin_unlock_irq(&ioctx->ctx_lock);
+
+   aio_put_req(iocb);
+
+   blk_finish_plug(&plug);
+   kiocb_batch_free(ioctx, &batch);
+   put_ioctx(ioctx);
+
+   return ret;
+out_put_iocb:
+   aio_put_req(iocb); /* Drop extra ref to req */
+   aio_put_req(iocb); /* Drop I/O ref to req */
+out:
+   put_ioctx(ioctx);
+   return ret;
+}
+


You've duplicated a lot of io_submit_one.  I'd rather see that factored
out than to have to maintain two copies.

Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-17 Thread Jeff Moyer
Asias He  writes:

> vhost-blk is an in-kernel virtio-blk device accelerator.
>
> This patch is based on Liu Yuan's implementation with various
> improvements and bug fixes. Notably, this patch makes guest notify and
> host completion processing in parallel which gives about 60% performance
> improvement compared to Liu Yuan's implementation.

So, first off, some basic questions.  Is it correct to assume that you
tested this with buffered I/O (files opened *without* O_DIRECT)?  I'm
pretty sure that if you used O_DIRECT, you'd run into problems (which
are solved by the patch set posted by Shaggy, based on Zach Brown's work
of many moons ago).  Note that, with buffered I/O, the submission path
is NOT asynchronous.  So, any speedups you've reported are extremely
suspect.  ;-)

Next, did you look at Shaggy's patch set?  I think it would be best to
focus your efforts on testing *that*, and implementing your work on top
of it.

Having said that, I did do some review of this patch, inlined below.

> +static int vhost_blk_setup(struct vhost_blk *blk)
> +{
> + struct kioctx *ctx;
> +
> + if (blk->ioctx)
> + return 0;
> +
> + blk->ioevent_nr = blk->vq.num;
> + ctx = ioctx_alloc(blk->ioevent_nr);
> + if (IS_ERR(ctx)) {
> + pr_err("Failed to ioctx_alloc");
> + return PTR_ERR(ctx);
> + }
> + put_ioctx(ctx);
> + blk->ioctx = ctx;
> +
> + blk->ioevent = kmalloc(sizeof(struct io_event) * blk->ioevent_nr,
> +GFP_KERNEL);
> + if (!blk->ioevent) {
> + pr_err("Failed to allocate memory for io_events");
> + return -ENOMEM;

You've just leaked blk->ioctx.

> + }
> +
> + blk->reqs = kmalloc(sizeof(struct vhost_blk_req) * blk->ioevent_nr,
> + GFP_KERNEL);
> + if (!blk->reqs) {
> + pr_err("Failed to allocate memory for vhost_blk_req");
> + return -ENOMEM;

And here.

> + }
> +
> + return 0;
> +}
> +
[snip]
> +static int vhost_blk_io_submit(struct vhost_blk *blk, struct file *file,
> +struct vhost_blk_req *req,
> +struct iovec *iov, u64 nr_vecs, loff_t offset,
> +int opcode)
> +{
> + struct kioctx *ioctx = blk->ioctx;
> + mm_segment_t oldfs = get_fs();
> + struct kiocb_batch batch;
> + struct blk_plug plug;
> + struct kiocb *iocb;
> + int ret;
> +
> + if (!try_get_ioctx(ioctx)) {
> + pr_info("Failed to get ioctx");
> + return -EAGAIN;
> + }

Using try_get_ioctx directly gives me a slightly uneasy feeling.  I
understand that you don't need to do the lookup, but at least wrap it
and check for ->dead.

> +
> + atomic_long_inc_not_zero(&file->f_count);
> + eventfd_ctx_get(blk->ectx);
> +
> + /* TODO: batch to 1 is not good! */

Agreed.  You should setup the batching in vhost_blk_handle_guest_kick.
The way you've written the code, the batching is not at all helpful.

> + kiocb_batch_init(&batch, 1);
> + blk_start_plug(&plug);
> +
> + iocb = aio_get_req(ioctx, &batch);
> + if (unlikely(!iocb)) {
> + ret = -EAGAIN;
> + goto out;
> + }
> +
> + iocb->ki_filp   = file;
> + iocb->ki_pos= offset;
> + iocb->ki_buf= (void *)iov;
> + iocb->ki_left   = nr_vecs;
> + iocb->ki_nbytes = nr_vecs;
> + iocb->ki_opcode = opcode;
> + iocb->ki_obj.user = req;
> + iocb->ki_eventfd  = blk->ectx;
> +
> + set_fs(KERNEL_DS);
> + ret = aio_setup_iocb(iocb, false);
> + set_fs(oldfs);
> + if (unlikely(ret))
> + goto out_put_iocb;
> +
> + spin_lock_irq(&ioctx->ctx_lock);
> + if (unlikely(ioctx->dead)) {
> + spin_unlock_irq(&ioctx->ctx_lock);
> + ret = -EINVAL;
> + goto out_put_iocb;
> + }
> + aio_run_iocb(iocb);
> + spin_unlock_irq(&ioctx->ctx_lock);
> +
> + aio_put_req(iocb);
> +
> + blk_finish_plug(&plug);
> + kiocb_batch_free(ioctx, &batch);
> + put_ioctx(ioctx);
> +
> + return ret;
> +out_put_iocb:
> + aio_put_req(iocb); /* Drop extra ref to req */
> + aio_put_req(iocb); /* Drop I/O ref to req */
> +out:
> + put_ioctx(ioctx);
> + return ret;
> +}
> +

You've duplicated a lot of io_submit_one.  I'd rather see that factored
out than to have to maintain two copies.

Again, what I'd *really* like to see is you rebase on top of Shaggy's
work.

Cheers,
Jeff


[PATCH RESEND 5/5] vhost-blk: Add vhost-blk support

2012-07-13 Thread Asias He
vhost-blk is an in-kernel virtio-blk device accelerator.

This patch is based on Liu Yuan's implementation with various
improvements and bug fixes. Notably, this patch makes guest notify and
host completion processing in parallel which gives about 60% performance
improvement compared to Liu Yuan's implementation.

Performance evaluation:
-
The comparison is between kvm tool with userspace implementation and kvm
tool with vhost-blk.

1) Fio with libaio ioengine on Fusion IO device
With bio-based IO path, sequential read/write, random read/write
IOPS boost : 8.4%, 15.3%, 10.4%, 14.6%
Latency improvement: 8.5%, 15.4%, 10.4%, 15.1%

2) Fio with vsync ioengine on Fusion IO device
With bio-based IO path, sequential read/write, random read/write
IOPS boost : 10.5%, 4.8%, 5.2%, 5.6%
Latency improvement: 11.4%, 5.0%, 5.2%, 5.8%

Cc: Michael S. Tsirkin 
Cc: linux-kernel@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Asias He 
---
 drivers/vhost/Kconfig  |   10 +
 drivers/vhost/Makefile |2 +
 drivers/vhost/blk.c|  600 
 drivers/vhost/vhost.h  |5 +
 include/linux/vhost.h  |3 +
 5 files changed, 620 insertions(+)
 create mode 100644 drivers/vhost/blk.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index c387067..fa071a8 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -16,4 +16,14 @@ config VHOST_NET
 
  To compile this driver as a module, choose M here: the module will
  be called vhost_net.
+config VHOST_BLK
+   tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
+   depends on VHOST && BLOCK && AIO && EVENTFD && EXPERIMENTAL
+   ---help---
+ This kernel module can be loaded in host kernel to accelerate
+ guest block with virtio_blk. Not to be confused with virtio_blk
+ module itself which needs to be loaded in guest kernel.
+
+ To compile this driver as a module, choose M here: the module will
+ be called vhost_blk.
 
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index cd36885..aa461d5 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,4 +1,6 @@
 obj-$(CONFIG_VHOST)+= vhost.o
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
+obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
 
 vhost_net-y:= net.o
+vhost_blk-y:= blk.o
diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
new file mode 100644
index 000..6a94894
--- /dev/null
+++ b/drivers/vhost/blk.c
@@ -0,0 +1,600 @@
+/*
+ * Copyright (C) 2011 Taobao, Inc.
+ * Author: Liu Yuan 
+ *
+ * Copyright (C) 2012 Red Hat, Inc.
+ * Author: Asias He 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * virtio-blk server in host kernel.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/vhost.h>
+#include <linux/virtio_blk.h>
+#include <linux/eventfd.h>
+#include <linux/mutex.h>
+#include <linux/file.h>
+#include <linux/mmu_context.h>
+#include <linux/anon_inodes.h>
+#include <linux/kthread.h>
+#include <linux/blkdev.h>
+
+#include "vhost.h"
+
+#define BLK_HDR 0
+
+enum {
+   VHOST_BLK_VQ_REQ = 0,
+   VHOST_BLK_VQ_MAX = 1,
+};
+
+struct vhost_blk_req {
+   u16 head;
+   u8 *status;
+};
+
+struct vhost_blk {
+   struct task_struct *worker_host_kick;
+   struct task_struct *worker;
+   struct vhost_blk_req *reqs;
+   struct vhost_virtqueue vq;
+   struct eventfd_ctx *ectx;
+   struct io_event *ioevent;
+   struct kioctx *ioctx;
+   struct vhost_dev dev;
+   struct file *efile;
+   u64 ioevent_nr;
+   bool stop;
+};
+
+static inline int vhost_blk_read_events(struct vhost_blk *blk, long nr)
+{
+   mm_segment_t old_fs = get_fs();
+   int ret;
+
+   set_fs(KERNEL_DS);
+   ret = read_events(blk->ioctx, nr, nr, blk->ioevent, NULL);
+   set_fs(old_fs);
+
+   return ret;
+}
+
+static int vhost_blk_setup(struct vhost_blk *blk)
+{
+   struct kioctx *ctx;
+
+   if (blk->ioctx)
+   return 0;
+
+   blk->ioevent_nr = blk->vq.num;
+   ctx = ioctx_alloc(blk->ioevent_nr);
+   if (IS_ERR(ctx)) {
+   pr_err("Failed to ioctx_alloc");
+   return PTR_ERR(ctx);
+   }
+   put_ioctx(ctx);
+   blk->ioctx = ctx;
+
+   blk->ioevent = kmalloc(sizeof(struct io_event) * blk->ioevent_nr,
+  GFP_KERNEL);
+   if (!blk->ioevent) {
+   pr_err("Failed to allocate memory for io_events");
+   return -ENOMEM;
+   }
+
+   blk->reqs = kmalloc(sizeof(struct vhost_blk_req) * blk->ioevent_nr,
+   GFP_KERNEL);
+   if (!blk->reqs) {
+   pr_err("Failed to allocate memory for vhost_blk_req");
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static inline int vhost_blk_set_status(struct vhost_blk *blk, u8 *statusp,
+  u8 status)
+{
+   if (copy_to_user(statusp, &status, sizeof(status))) {
+   vq_err(&blk->vq, "Failed to 
