Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Anthony Liguori wrote:
> This is just a first cut. It needs a fair bit of cleanup before it can be committed. I also think we need to fix up the AIO abstractions a bit. I wanted to share it, though, in case anyone is interested in doing some performance comparisons. It seems to work, although I haven't exercised it very much.
>
> +typedef struct AIOOperations
> +{
> +    struct qemu_aiocb *(*get_aiocb)(void);
> +    void (*put_aiocb)(struct qemu_aiocb *);
> +    int (*read)(struct qemu_aiocb *);
> +    int (*write)(struct qemu_aiocb *);
> +    int (*error)(struct qemu_aiocb *);
> +    ssize_t (*get_result)(struct qemu_aiocb *aiocb);
> +    int (*cancel)(int fd, struct qemu_aiocb *aiocb);
> +} AIOOperations;
> +

Instead of introducing yet another layer of indirection, you could add block-raw-linux-aio, which would be registered before block-raw-posix (which is really block-raw-threadpool...), and resist a ->probe() if caching is enabled.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
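[Editor's sketch] The registration-order idea suggested above can be illustrated roughly as follows. This is a minimal sketch, not QEMU code: the `RawDriver` struct, the probe signature, the scores, and the driver names are all hypothetical, and the only behaviour taken from the thread is that the Linux AIO driver is registered first and declines the probe when caching is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified driver descriptor (not QEMU's BlockDriver). */
typedef struct RawDriver {
    const char *name;
    /* Returns > 0 if the driver accepts the file, 0 to decline. */
    int (*probe)(const char *filename, bool cache_enabled);
} RawDriver;

static int linux_aio_probe(const char *filename, bool cache_enabled)
{
    (void)filename;
    /* The patch under discussion targets O_DIRECT only, so decline
     * whenever host caching is enabled. */
    return cache_enabled ? 0 : 100;
}

static int threadpool_probe(const char *filename, bool cache_enabled)
{
    (void)filename;
    (void)cache_enabled;
    return 50; /* generic fallback, accepts everything */
}

/* Registration order matters: the Linux AIO driver is tried first. */
static const RawDriver drivers[] = {
    { "raw-linux-aio", linux_aio_probe },
    { "raw-posix",     threadpool_probe },
};

/* Return the first registered driver whose probe accepts the file. */
static const RawDriver *pick_driver(const char *filename, bool cache_enabled)
{
    assert(filename != NULL);
    for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
        if (drivers[i].probe(filename, cache_enabled) > 0) {
            return &drivers[i];
        }
    }
    return NULL;
}
```

With this layout no extra operations table is needed: `pick_driver("disk.img", false)` selects the AIO driver, and the same call with caching enabled falls through to the thread-pool driver.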
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Avi Kivity wrote:
> Instead of introducing yet another layer of indirection, you could add block-raw-linux-aio, which would be registered before block-raw-posix (which is really block-raw-threadpool...), and resist a ->probe() if caching is enabled.

block-raw-posix needs a major overhaul. That's why I'm not even considering committing the patch as is.

I'd like to see the O_DIRECT bounce buffering removed in favor of the DMA API bouncing. Once that happens, raw_read and raw_pread can disappear. block-raw-posix becomes much simpler.

We would drop the signaling stuff and have the thread pool use an fd to signal. The big problem with that right now is that it'll cause a performance regression for certain platforms until we have the IO thread in place.

Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
On Mon, Mar 23, 2009 at 06:17:36PM +0200, Avi Kivity wrote:
> Instead of introducing yet another layer of indirection, you could add block-raw-linux-aio, which would be registered before block-raw-posix (which is really block-raw-threadpool...), and resist a ->probe() if caching is enabled.

Exactly the kind of comment I was about to make, but I need to read a little deeper to understand all the details. But my gut feeling is that this abstraction doesn't help us very much, especially with Avi's aiocb pools in place.
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
> I'd like to see the O_DIRECT bounce buffering removed in favor of the DMA API bouncing. Once that happens, raw_read and raw_pread can disappear. block-raw-posix becomes much simpler.

See my vectored I/O patches for doing the bounce buffering at the optimal place for the aio path. Note that from my reading of the qcow/qcow2 code they might send down unaligned requests, which is something the DMA API would not help with.

For the buffered I/O path we will always have to do some sort of buffering because of all the partition header reading, etc. And given that that part isn't performance critical, my preference would be to keep doing it in bdrv_pread/write and guarantee the low-level drivers proper alignment.

> We would drop the signaling stuff and have the thread pool use an fd to signal. The big problem with that right now is that it'll cause a performance regression for certain platforms until we have the IO thread in place.

Talking about signaling, does anyone remember why the Linux signalfd/eventfd support is only in kvm but not in upstream qemu?
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
> block-raw-posix needs a major overhaul. That's why I'm not even considering committing the patch as is.

I have some WIP patches that split out the host device bits into separate files to get block-raw-posix down to the pure file handling bits without all the host-specific host device mess. But it's at the end of a really large pile, which needs to be rebased once we have the patches already on the list in, in some form.
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Christoph Hellwig wrote:
> On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
>> I'd like to see the O_DIRECT bounce buffering removed in favor of the DMA API bouncing. Once that happens, raw_read and raw_pread can disappear. block-raw-posix becomes much simpler.
>
> See my vectored I/O patches for doing the bounce buffering at the optimal place for the aio path. Note that from my reading of the qcow/qcow2 code they might send down unaligned requests, which is something the dma api would not help with.

I was going to look today at applying those.

> For the buffered I/O path we will always have to do some sort of buffering due to all the partition header reading / etc. And given how that part isn't performance critical my preference would be to keep doing it in bdrv_pread/write and guarantee the lowlevel drivers proper alignment.

I really dislike having so many APIs. I'd rather have an aio API that took byte accesses, or have pread/pwrite always be emulated with a full sector read/write.

>> We would drop the signaling stuff and have the thread pool use an fd to signal. The big problem with that right now is that it'll cause a performance regression for certain platforms until we have the IO thread in place.
>
> Talking about signaling, does anyone remember why the Linux signalfd/eventfd support is only in kvm but not in upstream qemu?

Because upstream QEMU doesn't yet have an IO thread.

TCG chains together TBs, and if you have a tight loop in a VCPU, then the only way to break out of the loop is to receive a signal. The signal handler will call cpu_interrupt(), which will unchain TBs, allowing TCG execution to break once you return from the signal handler.

An IO thread solves this in a different way by letting select() always run in parallel to TCG VCPU execution. When select() returns, you can send a signal to the TCG VCPU thread to break it out of chained TBs.

Not all IO in qemu generates a signal, so this is a potential problem, but in practice, if we don't generate a signal for disk IO completion, a number of real-world guests break (mostly non-x86 boards).

Regards,

Anthony Liguori
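[Editor's sketch] The "have the thread pool use an fd to signal" idea discussed in this message can be pictured with a plain pipe and select(). This is only an illustrative sketch under stated assumptions: the function names are hypothetical, a pipe stands in for whatever descriptor (eventfd, signalfd, pipe) an implementation would actually pick, and no real worker thread is started here.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Worker side (hypothetical name): called when a request finishes.
 * Writing one byte makes the read end readable and wakes select(). */
static int notify_completion(int wfd)
{
    const char token = 1;
    return write(wfd, &token, 1) == 1 ? 0 : -1;
}

/* Main-loop side (hypothetical name): wait until the fd is readable,
 * then consume one token.
 * Returns 1 if a completion was consumed, 0 on timeout, -1 on error. */
static int wait_for_completion(int rfd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    char token;
    int n;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    FD_ZERO(&rfds);
    FD_SET(rfd, &rfds);
    n = select(rfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0) {
        return n; /* 0 = timeout, -1 = error */
    }
    return read(rfd, &token, 1) == 1 ? 1 : -1;
}
```

A caller would create the descriptor pair with pipe(), hand the write end to the workers, and add the read end to the main loop's select() set. As the message explains, this only interrupts a VCPU stuck in chained TBs if something else (a signal, or an IO thread) breaks it out of the loop.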
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
On Mon, Mar 23, 2009 at 01:10:30PM -0500, Anthony Liguori wrote:
> I really dislike having so many APIs. I'd rather have an aio API that took byte accesses or have pread/pwrite always be emulated with a full sector read/write

I had patches to change the aio API to byte-based access, and to get rid of the read/write methods so that only the byte-based pread/pwrite APIs remain, but they got obsoleted by Avi's patch to kill the pread/pwrite ops. We could put in byte-based AIO without byte-based read/write, though.

In my patches I put a flag into BlockDriverState that says whether we allow byte-based access to this instance or otherwise emulate it in the block layer. We still need this, as many of the image formats can't deal with byte-granularity access without read-modify-write cycles, and I think we're better off having one read-modify-write handler in the block layer than one per image format that needs it.
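[Editor's sketch] A single read-modify-write handler of the kind described above might look roughly like this. It is a sketch under assumptions, not the patch being referred to: the in-memory `disk` array, `sector_read`/`sector_write`, and `emulated_pwrite` are hypothetical stand-ins for a sector-only format driver and the block-layer emulation on top of it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NB_SECTORS  8

/* Hypothetical sector-only backend standing in for an image format. */
static uint8_t disk[NB_SECTORS * SECTOR_SIZE];

static int sector_read(uint64_t sector, uint8_t *buf)
{
    if (sector >= NB_SECTORS) {
        return -1;
    }
    memcpy(buf, disk + sector * SECTOR_SIZE, SECTOR_SIZE);
    return 0;
}

static int sector_write(uint64_t sector, const uint8_t *buf)
{
    if (sector >= NB_SECTORS) {
        return -1;
    }
    memcpy(disk + sector * SECTOR_SIZE, buf, SECTOR_SIZE);
    return 0;
}

/* Byte-granularity write emulated once, above the sector-only driver. */
static int emulated_pwrite(uint64_t offset, const void *data, size_t len)
{
    const uint8_t *src = data;
    uint8_t bounce[SECTOR_SIZE];

    assert(data != NULL);
    while (len > 0) {
        uint64_t sector = offset / SECTOR_SIZE;
        size_t in_sector = offset % SECTOR_SIZE;
        size_t chunk = SECTOR_SIZE - in_sector;

        if (chunk > len) {
            chunk = len;
        }
        /* A full sector needs no read; a partial one must be read first
         * so the bytes outside the written range are preserved. */
        if (chunk != SECTOR_SIZE && sector_read(sector, bounce) < 0) {
            return -1;
        }
        memcpy(bounce + in_sector, src, chunk);
        if (sector_write(sector, bounce) < 0) {
            return -1;
        }
        offset += chunk;
        src += chunk;
        len -= chunk;
    }
    return 0;
}
```

Keeping this loop in one place is the point made in the message: each image format only has to implement whole-sector access, and the per-instance flag decides whether the loop is used at all.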
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Christoph Hellwig wrote:
> On Mon, Mar 23, 2009 at 01:10:30PM -0500, Anthony Liguori wrote:
>> I really dislike having so many APIs. I'd rather have an aio API that took byte accesses or have pread/pwrite always be emulated with a full sector read/write
>
> I had patches to change the aio API to byte based access, and get rid of the read/write methods to only have the byte based pread/pwrite APIs, but they got obsoleted by Avi's patch to kill the pread/pwrite ops. We could put in byte-based AIO without byte-based read/write, though.
>
> In my patches I put a flag into BlockDriverState whether we allow byte-based access to this instance or otherwise emulate it in the block layer.

I like this approach. An additional flag could tell us what buffer alignment the format driver wants, so we can eliminate the alignment bounce from format driver code. Oh, and a flag to indicate we don't support vectors, so the generic layer will bounce and send us a length-one iovec.

Note that the align flag is in the device state, not the format driver, as it depends on the cache= setting.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
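[Editor's sketch] The two extra flags proposed in this message (required buffer alignment, and "no vector support") could drive a generic bounce step along these lines. Everything here is hypothetical: `DeviceCaps`, its field names, and `prepare_write` are invented for illustration, and the sketch only covers the write direction.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical per-device capabilities; field names are invented. */
typedef struct DeviceCaps {
    size_t buffer_align;  /* required alignment (power of two), 0 = none */
    int supports_vectors; /* 0: driver wants a length-one iovec */
} DeviceCaps;

static int iov_is_aligned(const struct iovec *iov, int iovcnt, size_t align)
{
    if (align <= 1) {
        return 1;
    }
    for (int i = 0; i < iovcnt; i++) {
        if ((uintptr_t)iov[i].iov_base % align || iov[i].iov_len % align) {
            return 0;
        }
    }
    return 1;
}

/* Returns 0 if the caller's iovec can be passed through unchanged,
 * 1 if the data was copied into one aligned bounce buffer described by
 * *out (the caller frees out->iov_base after the I/O), -1 on error. */
static int prepare_write(const DeviceCaps *caps, const struct iovec *iov,
                         int iovcnt, struct iovec *out)
{
    size_t total = 0;
    size_t align;
    void *buf;
    char *p;

    assert(caps != NULL && iov != NULL && out != NULL);
    if ((caps->supports_vectors || iovcnt == 1) &&
        iov_is_aligned(iov, iovcnt, caps->buffer_align)) {
        return 0;
    }
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    /* posix_memalign needs a power of two that is >= sizeof(void *). */
    align = caps->buffer_align > sizeof(void *) ? caps->buffer_align
                                                : sizeof(void *);
    if (posix_memalign(&buf, align, total ? total : 1) != 0) {
        return -1;
    }
    p = buf;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(p, iov[i].iov_base, iov[i].iov_len);
        p += iov[i].iov_len;
    }
    out->iov_base = buf;
    out->iov_len = total;
    return 1;
}
```

Because the capabilities live with the device instance rather than the format driver, the same driver can report a 512-byte alignment when opened for O_DIRECT and no alignment otherwise, which matches the remark about cache= above. The sketch leaves rounding the total length to the alignment to the caller.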
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Instead of introducing yet another layer of indirection, you could add block-raw-linux-aio, which would be registered before block-raw-posix (which is really block-raw-threadpool...), and resist a ->probe() if caching is enabled.
>
> block-raw-posix needs a major overhaul. That's why I'm not even considering committing the patch as is.

That would suggest block-raw-linux-aio-bork-bork-bork.c even more, no?

> I'd like to see the O_DIRECT bounce buffering removed in favor of the DMA API bouncing. Once that happens, raw_read and raw_pread can disappear. block-raw-posix becomes much simpler.

They aren't really related... note that DMA API requests are likely to be aligned anyway, since the guest generates them with the expectation that alignment is required. We need to align at a lower level so we can take care of non-dma-api callers (mostly qemu internal).

> We would drop the signaling stuff and have the thread pool use an fd to signal. The big problem with that right now is that it'll cause a performance regression for certain platforms until we have the IO thread in place.

Well, let's merge this after the iothread?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Avi Kivity wrote:
>> We would drop the signaling stuff and have the thread pool use an fd to signal. The big problem with that right now is that it'll cause a performance regression for certain platforms until we have the IO thread in place.
>
> Well, let's merge this after the iothread?

Yup. Just posted that patch in case anyone was interested. I needed it so that we could do some performance testing...

Regards,

Anthony Liguori