Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Avi Kivity

Anthony Liguori wrote:

This is just a first cut.  It needs a fair bit of cleanup before it can be
committed.  I also think we need to fixup the AIO abstractions a bit.

I wanted to share though in case anyone is interested in doing some performance
comparisons.  It seems to work although I haven't exercised it very much.

 
+typedef struct AIOOperations
+{
+    struct qemu_aiocb *(*get_aiocb)(void);
+    void (*put_aiocb)(struct qemu_aiocb *);
+    int (*read)(struct qemu_aiocb *);
+    int (*write)(struct qemu_aiocb *);
+    int (*error)(struct qemu_aiocb *);
+    ssize_t (*get_result)(struct qemu_aiocb *aiocb);
+    int (*cancel)(int fd, struct qemu_aiocb *aiocb);
+} AIOOperations;
+
  



Instead of introducing yet another layer of indirection, you could add 
block-raw-linux-aio, which would be registered before block-raw-posix 
(which is really block-raw-threadpool...), and resist a ->probe() if 
caching is enabled.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Anthony Liguori

Avi Kivity wrote:


Instead of introducing yet another layer of indirection, you could add 
block-raw-linux-aio, which would be registered before block-raw-posix 
(which is really block-raw-threadpool...), and resist a ->probe() if 
caching is enabled.


block-raw-posix needs a major overhaul.  That's why I'm not even 
considering committing the patch as is.


I'd like to see the O_DIRECT bounce buffering removed in favor of the 
DMA API bouncing.  Once that happens, raw_read and raw_pread can 
disappear.  block-raw-posix becomes much simpler.


We would drop the signaling stuff and have the thread pool use an fd to 
signal.  The big problem with that right now is that it'll cause a 
performance regression for certain platforms until we have the IO thread 
in place.


Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 06:17:36PM +0200, Avi Kivity wrote:
 Instead of introducing yet another layer of indirection, you could add
 block-raw-linux-aio, which would be registered before block-raw-posix
 (which is really block-raw-threadpool...), and resist a ->probe() if
 caching is enabled.

Exactly the kind of comment I was about to make, but I need to read a
little deeper to understand all the details.

But my gut feeling is that this abstraction doesn't help us very much,
especially with Avi's aiocb pools in place.



Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
 I'd like to see the O_DIRECT bounce buffering removed in favor of the  
 DMA API bouncing.  Once that happens, raw_read and raw_pread can  
 disappear.  block-raw-posix becomes much simpler.

See my vectored I/O patches for doing the bounce buffering at the
optimal place for the aio path. Note that from my reading of the
qcow/qcow2 code they might send down unaligned requests, which is
something the dma api would not help with.

For the buffered I/O path we will always have to do some sort of buffering
due to all the partition header reading / etc.  And given how that part
isn't performance critical my preference would be to keep doing it in
bdrv_pread/write and guarantee the lowlevel drivers proper alignment.

 We would drop the signaling stuff and have the thread pool use an fd to  
 signal.  The big problem with that right now is that it'll cause a  
 performance regression for certain platforms until we have the IO thread  
 in place.

Talking about signaling, does anyone remember why the Linux signalfd/
eventfd support is only in kvm but not in upstream qemu?



Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
 block-raw-posix needs a major overhaul.  That's why I'm not even  
 considering committing the patch as is.

I have some WIP patches that split out the host device bits into
separate files to get block-raw-posix down to the pure file handling
bits without all the host-specific host device mess.  But it's at the
end of a really large pile, which needs to be rebased once we have the
patches already on the list in some form.


Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Anthony Liguori

Christoph Hellwig wrote:

On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
  
I'd like to see the O_DIRECT bounce buffering removed in favor of the  
DMA API bouncing.  Once that happens, raw_read and raw_pread can  
disappear.  block-raw-posix becomes much simpler.



See my vectored I/O patches for doing the bounce buffering at the
optimal place for the aio path. Note that from my reading of the
qcow/qcow2 code they might send down unaligned requests, which is
something the dma api would not help with.
  


I was going to look today at applying those.


For the buffered I/O path we will always have to do some sort of buffering
due to all the partition header reading / etc.  And given how that part
isn't performance critical my preference would be to keep doing it in
bdrv_pread/write and guarantee the lowlevel drivers proper alignment.
  


I really dislike having so many APIs.  I'd rather have an aio API that 
took byte accesses, or have pread/pwrite always be emulated with a full 
sector read/write.


We would drop the signaling stuff and have the thread pool use an fd to  
signal.  The big problem with that right now is that it'll cause a  
performance regression for certain platforms until we have the IO thread  
in place.



Talking about signaling, does anyone remember why the Linux signalfd/
eventfd support is only in kvm but not in upstream qemu?
  


Because upstream QEMU doesn't yet have an IO thread.

TCG chains together TBs and if you have a tight loop in a VCPU, then the 
only way to break out of the loop is to receive a signal.  The signal 
handler will call cpu_interrupt() which will unchain TBs allowing TCG 
execution to break once you return from the signal handler.


An IO thread solves this in a different way by letting select() always 
run in parallel to TCG VCPU execution.  When select() returns you can 
send a signal to the TCG VCPU thread to break it out of chained TBs.


Not all IO in qemu generates a signal, so this is a potential problem; but in 
practice, if we don't generate a signal for disk IO completion, a number 
of real world guests break (mostly non-x86 boards).


Regards,

Anthony Liguori


Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 01:10:30PM -0500, Anthony Liguori wrote:
 I really dislike having so many APIs.  I'd rather have an aio API that
 took byte accesses, or have pread/pwrite always be emulated with a full
 sector read/write.

I had patches to change the aio API to byte-based access, and to get rid
of the read/write methods to only have the byte-based pread/pwrite
APIs, but they got obsoleted by Avi's patch to kill the pread/pwrite
ops.  We could put in byte-based AIO without byte-based read/write,
though.  In my patches I put a flag into BlockDriverState for whether we
allow byte-based access to this instance or otherwise emulate it in
the block layer.  We still need this, as many of the image formats can't
deal with byte-granularity access without read-modify-write cycles,
and I think we're better off having one read-modify-write handler in
the block layer than one per image format that needs it.


Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Avi Kivity

Christoph Hellwig wrote:

On Mon, Mar 23, 2009 at 01:10:30PM -0500, Anthony Liguori wrote:
  
I really dislike having so many APIs.  I'd rather have an aio API that 
took byte accesses, or have pread/pwrite always be emulated with a full 
sector read/write.



I had patches to change the aio API to byte-based access, and to get rid
of the read/write methods to only have the byte-based pread/pwrite
APIs, but they got obsoleted by Avi's patch to kill the pread/pwrite
ops.  We could put in byte-based AIO without byte-based read/write,
though.  In my patches I put a flag into BlockDriverState for whether we
allow byte-based access to this instance or otherwise emulate it in
the block layer.


I like this approach.  An additional flag could tell us what buffer 
alignment the format driver wants, so we can eliminate the alignment 
bounce from format driver code.  Oh, and a flag to indicate we don't 
support vectors, so the generic layer will bounce and send us a length 
one iovec.


Note the align flag is in the device state, not the format driver, as it 
depends on the cache= settings.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:


Instead of introducing yet another layer of indirection, you could 
add block-raw-linux-aio, which would be registered before 
block-raw-posix (which is really block-raw-threadpool...), and resist 
a ->probe() if caching is enabled.


block-raw-posix needs a major overhaul.  That's why I'm not even 
considering committing the patch as is.


That would suggest block-raw-linux-aio-bork-bork-bork.c even more, no?



I'd like to see the O_DIRECT bounce buffering removed in favor of the 
DMA API bouncing.  Once that happens, raw_read and raw_pread can 
disappear.  block-raw-posix becomes much simpler.


They aren't really related... note that DMA API requests are likely to 
be aligned anyway, since the guest generates them with the expectation 
that alignment is required.  We need to align at a lower level so we can 
take care of non-dma-api callers (mostly qemu internal).




We would drop the signaling stuff and have the thread pool use an fd 
to signal.  The big problem with that right now is that it'll cause a 
performance regression for certain platforms until we have the IO 
thread in place. 


Well, let's merge this after the iothread?




Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Anthony Liguori

Avi Kivity wrote:




We would drop the signaling stuff and have the thread pool use an fd 
to signal.  The big problem with that right now is that it'll cause a 
performance regression for certain platforms until we have the IO 
thread in place. 


Well, let's merge this after the iothread?


Yup.  Just posted that patch in case anyone was interested.  I needed it 
so that we could do some performance testing...


Regards,

Anthony Liguori
