Re: [RFC] vhost-blk implementation

2010-03-24 Thread Christoph Hellwig
Inspired by the vhost-net implementation, I did an initial prototype of vhost-blk to see if it provides any benefits over QEMU virtio-blk. I haven't handled all the error cases, fixed naming conventions etc., but the implementation is stable enough to play with. I tried not to deviate from vhost-net

Re: [RFC] vhost-blk implementation

2010-03-24 Thread Christoph Hellwig
On Tue, Mar 23, 2010 at 12:03:14PM +0200, Avi Kivity wrote: I also think it should be done at the bio layer. File I/O is going to be slower, if we do vhost-blk we should concentrate on maximum performance. The block layer also exposes more functionality we can use (asynchronous

Re: [RFC] vhost-blk implementation

2010-03-25 Thread Christoph Hellwig
On Thu, Mar 25, 2010 at 08:29:03AM +0200, Avi Kivity wrote: We still have a virtio implementation in userspace for file-based images. In any case, the file APIs are not asynchronous so we'll need a thread pool. That will probably minimize the difference in performance between the

Re: [RFC] vhost-blk implementation

2010-03-25 Thread Christoph Hellwig
On Wed, Mar 24, 2010 at 01:22:37PM -0700, Badari Pulavarty wrote: Yes. This is with the default (writeback) cache model. As mentioned earlier, readahead is helping here and in most cases the data would be ready in the pagecache. Ok. cache=writeback performance is something I haven't bothered looking

Re: [RFC] vhost-blk implementation

2010-04-05 Thread Christoph Hellwig
that the actual buffers are userspace pointers, but the iovecs in the virtqueue are kernel level pointers, so you would need some annotations. While we're at it here is a patch fixing the remaining sparse warnings in vhost-blk: Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/drivers/vhost

[PATCH] vhost: fix sparse warnings

2010-04-05 Thread Christoph Hellwig
Index: linux-2.6/drivers/vhost/net.c === --- linux-2.6.orig/drivers/vhost/net.c 2010-04-05 21:13:24.196004388 +0200 +++ linux-2.6/drivers/vhost/net.c 2010-04-05 21:13:32.726004109 +0200 @@ -641,7 +641,7 @@ static struct

Re: [RFC] vhost-blk implementation

2010-04-05 Thread Christoph Hellwig
On Thu, Mar 25, 2010 at 04:00:56PM +0100, Asdo wrote: Would the loop device provide the features of a block device? I recall barrier support at least has been added recently. It does, but not in a very efficient way. Is it recommended to run kvm on a loopback mounted file compared to on a

Re: Shouldn't cache=none be the default for drives?

2010-04-08 Thread Christoph Hellwig
On Thu, Apr 08, 2010 at 10:05:09AM +0400, Michael Tokarev wrote: LVM volumes. This is because with cache=none, the virtual disk image is opened with O_DIRECT flag, which means all I/O bypasses host scheduler and buffer cache. O_DIRECT does not bypass the I/O scheduler, only the page cache.

Re: [PATCH] vhost: fix sparse warnings

2010-04-13 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/drivers/vhost/net.c === --- linux-2.6.orig/drivers/vhost/net.c 2010-04-05 21:13:24.196004388 +0200 +++ linux-2.6/drivers/vhost/net.c 2010-04-05 21:13:32.726004109

Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Christoph Hellwig
On Tue, May 04, 2010 at 02:08:24PM +0930, Rusty Russell wrote: On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to

Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion

2010-05-04 Thread Christoph Hellwig
bdrv_aio_cancel has returned. In fact we cannot cancel a request more often than we can, so there's a fairly high chance it will complete. Reviewed-by: Christoph Hellwig h...@lst.de -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord

Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Christoph Hellwig
On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. The

Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Christoph Hellwig
On Tue, Apr 20, 2010 at 02:46:35AM +0100, Jamie Lokier wrote: Does this mean that virtio-blk supports all three combinations? 1. FLUSH that isn't a barrier 2. FLUSH that is also a barrier 3. Barrier that is not a flush 1 is good for fsync-like operations; 2 is good for

Re: [Qemu-devel] [RFC PATCH 1/2] close all the block drivers before the qemu process exits

2010-05-12 Thread Christoph Hellwig
On Wed, May 12, 2010 at 07:46:52PM +0900, MORITA Kazutaka wrote: This patch calls the close handler of the block driver before the qemu process exits. This is necessary because the sheepdog block driver releases the lock of VM images in the close handler. Signed-off-by: MORITA Kazutaka

Re: [PATCH 1/2] [block]: Fix scsi-generic breakage in find_image_format()

2010-05-16 Thread Christoph Hellwig
On Sat, May 15, 2010 at 06:30:52AM -0700, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a special BlockDriverState->sg check in block.c:find_image_format() after bdrv_file_open() -> block/raw-posix.c:hdev_open() has been called to determine if

Re: [PATCH 2/2] [block]: Skip refresh_total_sectors() for scsi-generic devices

2010-05-16 Thread Christoph Hellwig
On Sat, May 15, 2010 at 06:30:59AM -0700, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a BlockDriverState->sg check in block.c:bdrv_common_open() to skip the new refresh_total_sectors() call once we know we are working with a scsi-generic device.

Re: qemu-kvm hangs if multipath device is queing

2010-05-19 Thread Christoph Hellwig
On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote: I think it's stuck here in an endless loop: while (laiocb->ret == -EINPROGRESS) qemu_laio_completion_cb(laiocb->ctx); Can you verify this by single-stepping one or two loop iterations? ret and errno after the read call

Re: KVM call agenda for May 18

2010-05-19 Thread Christoph Hellwig
On Tue, May 18, 2010 at 08:52:36AM -0500, Anthony Liguori wrote: This should be filed in launchpad as a qemu bug and it should be tested against the latest git. This bug sounds like we're using an int to represent sector offset somewhere but there's not enough info in the bug report to

Re: the 1Tb block issue

2010-05-19 Thread Christoph Hellwig
On Tue, May 18, 2010 at 08:38:22PM +0300, Avi Kivity wrote: Yes. Why would Linux post overlapping requests? makes 0x sense. There may be a guest bug in here too. Christoph? Overlapping writes are entirely fine from the guest POV, although they should be rather unusual. We

Re: [PATCH +stable] block: don't attempt to merge overlapping requests

2010-05-19 Thread Christoph Hellwig
On Wed, May 19, 2010 at 10:23:44AM +0100, Stefan Hajnoczi wrote: On Wed, May 19, 2010 at 10:06 AM, Avi Kivity a...@redhat.com wrote: In the cache=writeback case the virtio-blk guest driver does: blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, ...) I don't follow. What's the

Re: [Qemu-devel] [PATCH 1/2] trace: Add simple tracing support

2010-05-21 Thread Christoph Hellwig
On Fri, May 21, 2010 at 09:49:56PM +0100, Stefan Hajnoczi wrote: http://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps Requires kernel support - not sure if enough of utrace is in mainline for this to work out-of-the-box across distros. Nothing of utrace is in mainline, nevermind

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Christoph Hellwig
On Tue, May 25, 2010 at 02:25:53PM +0300, Avi Kivity wrote: Currently if someone wants to add a new block format, they have to upstream it and wait for a new qemu to be released. With a plugin API, they can add a new block format to an existing, supported qemu. So? Unless we want a

Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-05-29 Thread Christoph Hellwig
On Sat, May 29, 2010 at 04:42:59PM +0700, Antoine Martin wrote: Can someone explain the aio options? All I can find is this: # qemu-system-x86_64 -h | grep -i aio [,addr=A][,id=name][,aio=threads|native] I assume it means that aio=threads emulates the kernel's aio with separate

Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-05-29 Thread Christoph Hellwig
On Sat, May 29, 2010 at 10:55:18AM +0100, Stefan Hajnoczi wrote: I would expect that aio=native is faster but benchmarks show that this isn't true for all workloads. In what benchmark do you see worse results for aio=native compared to aio=threads?

Re: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default

2009-09-17 Thread Christoph Hellwig
Err, I'll take this one back for now pending some more discussion. What we need more urgently is the writeback cache flag, which is now implemented in qemu, patch following ASAP.

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-09-22 Thread Christoph Hellwig
Btw, what's the state of getting compatfd upstream? It's a pretty annoying difference between qemu upstream and qemu-kvm.

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-09-22 Thread Christoph Hellwig
On Tue, Sep 22, 2009 at 03:25:13PM +0200, Juan Quintela wrote: Christoph Hellwig h...@infradead.org wrote: Btw, what's the state of getting compatfd upstream? It's a pretty annoying difference between qemu upstream and qemu-kvm. I haven't tried. I can try to send a patch. Do you have

Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Christoph Hellwig
I might sound like a broken record, but why isn't the full GSO support for virtio-net upstream in qemu?

Re: kvm tuning guide

2009-09-30 Thread Christoph Hellwig
On Wed, Sep 30, 2009 at 08:20:35AM +0200, Avi Kivity wrote: On 09/30/2009 07:09 AM, Nikola Ciprich wrote: The default, IDE, is highly supported by guests but may be slow, especially with disk arrays. If your guest supports it, use the virtio interface: Avi, what is the status of data

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Christoph Hellwig
On Thu, Oct 01, 2009 at 01:58:10PM +0200, Juan Quintela wrote: Discused with Anthony about it. signalfd is complicated for qemu upstream (too difficult to use properly), and eventfd ... The current eventfd emulation is worse than the pipe code that it substitutes. His suggestion here was

Re: sync guest calls made async on host - SQLite performance

2009-10-13 Thread Christoph Hellwig
On Sun, Oct 11, 2009 at 11:16:42AM +0200, Avi Kivity wrote: if scsi is used, you incur the cost of virtualization, if virtio is used, your guest's fsyncs incur less cost. So back to the question to the kvm team. It appears that with the stock KVM setup customers who need higher data

Re: sync guest calls made async on host - SQLite performance

2009-10-14 Thread Christoph Hellwig
On Wed, Oct 14, 2009 at 08:03:41PM +0900, Avi Kivity wrote: Can't remember anything like that. The bug was the complete lack of cache flush infrastructure for virtio, and the lack of advertising a volatile write cache on ide. By complete flush infrastructure, you mean host-side and

Re: [PATCH] virtio-blk: fallback to draining the queue if barrier ops are not supported

2009-10-14 Thread Christoph Hellwig
On Wed, Oct 14, 2009 at 07:38:45PM +0400, Michael Tokarev wrote: Avi Kivity wrote: Early implementations of virtio devices did not support barrier operations, but did commit the data to disk. In such cases, drain the queue to emulate barrier operations. Are there any implementation

Re: sync guest calls made async on host - SQLite performance

2009-10-14 Thread Christoph Hellwig
On Thu, Oct 15, 2009 at 01:56:40AM +0900, Avi Kivity wrote: Does virtio say it has a write cache or not (and how does one say it?)? Historically it didn't and the only safe way to use virtio was in cache=writethrough mode. Since qemu git as of 4th September and Linux 2.6.32-rc there is a

Re: sync guest calls made async on host - SQLite performance

2009-10-15 Thread Christoph Hellwig
On Wed, Oct 14, 2009 at 05:54:23PM -0500, Anthony Liguori wrote: Historically it didn't and the only safe way to use virtio was in cache=writethrough mode. Which should be the default on Ubuntu's kvm that this report is concerned with so I'm a bit confused. So can we please get the

Re: sync guest calls made async on host - SQLite performance

2009-10-15 Thread Christoph Hellwig
On Thu, Oct 15, 2009 at 02:17:02PM +0200, Christoph Hellwig wrote: On Wed, Oct 14, 2009 at 05:54:23PM -0500, Anthony Liguori wrote: Historically it didn't and the only safe way to use virtio was in cache=writethrough mode. Which should be the default on Ubuntu's kvm that this report

Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-10-28 Thread Christoph Hellwig
On Wed, Oct 28, 2009 at 09:11:29AM +0100, Hannes Reinecke wrote: The problem is I don't have any documentation for the LSI parallel SCSI controller. So I don't know if and in what shape I/O is passed down, nor anything else. And as the SCSI disk emulation is really tied into the LSI parallel

Re: [Qemu-devel] [PATCH 3/4] scsi-disk: Factor out SCSI command emulation

2009-10-28 Thread Christoph Hellwig
On Tue, Oct 27, 2009 at 04:28:59PM +0100, Hannes Reinecke wrote: Other drivers might want to use SCSI command emulation without going through the SCSI disk abstraction, as this imposes too many limits on the emulation. Might be a good idea to move something like this first into the series and

Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-10-28 Thread Christoph Hellwig
On Wed, Oct 28, 2009 at 08:25:22PM +0100, Hannes Reinecke wrote: I don't think we really need two modes. My preferred interface here is to pass scatter-gather lists down with every xfer; this way it'll be the responsibility of the driver to create the lists in the first place. If it has

Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-10-29 Thread Christoph Hellwig
On Thu, Oct 29, 2009 at 01:57:40PM +0100, Gerd Hoffmann wrote: Trying to go forward in review+bisect friendly baby steps. Here is what I have now: http://repo.or.cz/w/qemu/kraxel.git?a=shortlog;h=refs/heads/scsi.v1 It is far from being completed, will continue tomorrow. Should give a

Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-10-29 Thread Christoph Hellwig
On Thu, Oct 29, 2009 at 10:14:19AM -0500, Anthony Liguori wrote: Which patches are those? http://repo.or.cz/w/qemu/kraxel.git?a=commitdiff;h=1ee5ee08e4427c3db7e1322d30cc0e58e5ca48b9 and http://repo.or.cz/w/qemu/kraxel.git?a=commitdiff;h=a6e6178185786c582141f993272e00521d3f125a

Re: [patch 4/4] KVM-trace port to tracepoints

2008-07-23 Thread Christoph Hellwig
On Wed, Jul 23, 2008 at 01:08:41PM +0300, Avi Kivity wrote: trace_mark() is used to implement kvmtrace, which is propagated to userspace. So while trace_mark() itself is not a userspace interface, one of its users is. It's an unstable interface. But so is dmesg; that's the nature of tracing.

Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool

2008-12-11 Thread Christoph Hellwig
On Thu, Dec 11, 2008 at 05:49:47PM +0100, Andrea Arcangeli wrote: On Thu, Dec 11, 2008 at 05:11:08PM +0100, Gerd Hoffmann wrote: Yes. But kernel aio requires O_DIRECT, so aio users are affected nevertheless. Are you sure? It surely wasn't the case... Mainline kernel aio only implements

Re: KVM call minutes for Feb 1

2011-02-01 Thread Christoph Hellwig
On Tue, Feb 01, 2011 at 05:36:13PM +0100, Jan Kiszka wrote: kvm_cpu_exec/kvm_run, and start wondering What needs to be done to upstream so that qemu-kvm could use that implementation?. If they differ, the reasons need to be understood and patched away, either by fixing/enhancing upstream or

Re: [PATCH] kvm tools: Use mmap for working with disk image V2

2011-04-11 Thread Christoph Hellwig
How do you plan to handle I/O errors or ENOSPC conditions? Note that shared writeable mappings are by far the feature in the VM/FS code that is most error prone, including the impossibility of doing sensible error handling. The version that accidentally used MAP_PRIVATE actually makes a lot of

Re: [Qemu-devel] [qemu-iotests][PATCH] Update rbd support

2011-04-12 Thread Christoph Hellwig
@@ -43,6 +43,10 @@ _supported_fmt raw _supported_proto generic _supported_os Linux +# rbd images are not growable +if [ "$IMGPROTO" = "rbd" ]; then +_notrun "image protocol $IMGPROTO does not support growable images" +fi I suspect we only support the weird writing past size for the file

Re: [PATCH] kvm tool: add QCOW verions 1 read/write support

2011-04-14 Thread Christoph Hellwig
On Wed, Apr 13, 2011 at 08:01:58PM +0100, Prasad Joshi wrote: The patch only implements the basic read write support for QCOW version 1 images. Many of the QCOW features are not implemented, for example What's the point? Qcow1 has been deprecated for a long time.

Re: [PATCH 3/5] [block]: Add paio_submit_len() non sector sized AIO

2010-06-14 Thread Christoph Hellwig
On Mon, Jun 14, 2010 at 02:44:31AM -0700, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch adds posix-aio-compat.c:paio_submit_len(), which is identical to paio_submit() except that it expects nb_len instead of nb_sectors (* 512) so that it can be used

[PATCH] virtio_blk: support barriers without FLUSH feature

2010-06-15 Thread Christoph Hellwig
imply working barriers on old qemu versions or other hypervisors that actually have a volatile write cache this is only a cosmetic issue - these hypervisors don't guarantee any data integrity with or without this patch, but with the patch we at least provide data ordering. Signed-off-by: Christoph

Re: KVM call minutes for June 15

2010-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2010 at 08:18:12AM -0700, Chris Wright wrote: KVM/qemu patches - patch rate is high, documentation is low, review is low - patches need to include better descriptions and documentation - will slow down patch writers - will make it easier for patch reviewers What is the

Re: [Qemu-devel] [PATCH 1/2] Add 'serial' attribute to virtio-blk devices

2010-06-21 Thread Christoph Hellwig
On Fri, Jun 18, 2010 at 01:38:02PM -0500, Ryan Harper wrote: Create a new attribute for virtio-blk devices that will fetch the serial number of the block device. This attribute can be used by udev to create disk/by-id symlinks for devices that don't have a UUID (filesystem) associated with

Re: Graphical virtualisation management system

2010-06-25 Thread Christoph Hellwig
On Thu, Jun 24, 2010 at 02:01:52PM -0500, Javier Guerra Giraldez wrote: On Thu, Jun 24, 2010 at 1:32 PM, Freddie Cash fjwc...@gmail.com wrote: * virt-manager which requires X and seems to be more desktop-oriented; don't know about the others, but virt-manager runs only on the admin

Re: JFYI: ext4 bug triggerable by kvm

2010-08-16 Thread Christoph Hellwig
On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote: Also, ext4 is _very_ slow on O_SYNC writes (which is used in kvm with default cache). Yeah, we probably need to switch to sync_file_range() to avoid the journal commit on every write. No, we don't. sync_file_range does not

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Mon, Aug 16, 2010 at 03:34:12PM -0500, Anthony Liguori wrote: On 08/16/2010 01:42 PM, Christoph Hellwig wrote: On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote: Also, ext4 is _very_ slow on O_SYNC writes (which is used in kvm with default cache). Yeah, we probably need

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 12:23:01PM +0300, Avi Kivity wrote: On 08/17/2010 12:07 PM, Christoph Hellwig wrote: In short it's completely worthless for any real filesystem. The documentation should be updated then. It suggests that it is usable for data integrity. The manpage has

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 07:56:04AM -0500, Anthony Liguori wrote: But assuming that you had a preallocated disk image, it would effectively flush the page cache so it sounds like the only real issue is sparse and growable files. For preallocated as in using fallocate() we are still converting

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote: On 08/17/2010 08:07 AM, Christoph Hellwig wrote: The point is that we don't want to flush the disk write cache. The intention of writethrough is not to make the disk cache writethrough but to treat the host's cache

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:39:15AM -0500, Anthony Liguori wrote: The type of cache we present to the guest only should relate to how the hypervisor caches the storage. It should be independent of how data is cached by the disk. It is. There can be many levels of caching in a storage

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote: I think the real issue is we're mixing host configuration with guest visible state. The last time I proposed to decouple the two you and Avi were heavily opposed to it. With O_SYNC, we're causing cache=writethrough to do

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:54:07AM -0500, Anthony Liguori wrote: This is simply unrealistic. O_SYNC might force data to be on a platter when using a directly attached disk but many NAS's actually do writeback caching and rely on having a UPS to preserve data integrity. There's really no

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 05:59:07PM +0300, Avi Kivity wrote: I agree, but there's another case: tell the guest that we have a write cache, use O_DSYNC, but only flush the disk cache on guest flushes. O_DSYNC flushes the disk write cache and any filesystem that supports non-volatile cache. The

Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size

2010-01-04 Thread Christoph Hellwig
On Mon, Jan 04, 2010 at 01:38:51PM +1030, Rusty Russell wrote: I thought this was what I was doing, but I have shown over and over that I have no idea about block devices. Our current driver treats BLK_SIZE as the logical and physical size (see blk_queue_logical_block_size). I have no

Re: [Qemu-devel] [PATCH RFC] Advertise IDE physical block size as 4K

2010-01-04 Thread Christoph Hellwig
On Tue, Dec 29, 2009 at 02:39:38PM +0100, Luca Tettamanti wrote: Linux tools put the first partition at sector 63 (512-byte) to retain compatibility with Windows; Well, some of them, and depending on the exact disks. It's all rather complicated. It has been discussed for hardware disk

Re: [Qemu-devel] [PATCH RFC] Advertise IDE physical block size as 4K

2010-01-04 Thread Christoph Hellwig
On Tue, Dec 29, 2009 at 12:07:58PM +0200, Avi Kivity wrote: Guests use this number as a hint for alignment and I/O request sizes. Given that modern disks have 4K block sizes, and cached file-backed images also have 4K block sizes, this hint can improve guest performance. We probably need to

Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size

2010-01-08 Thread Christoph Hellwig
On Tue, Jan 05, 2010 at 08:16:15PM +, Jamie Lokier wrote: It would be good if virtio relayed the backing device's basic topology hints, so: - If the backing dev is a real disk with 512-byte sectors, virtio should indicate 512-byte blocks to the guest. - If the backing

Re: [PATCHv2 3/4] Add HYPER-V apic access MSRs.

2010-01-17 Thread Christoph Hellwig
On Sun, Jan 17, 2010 at 02:20:32PM +0200, Avi Kivity wrote: +addr = kmap_atomic(page, KM_USER0); +clear_user_page(addr, vaddr, page); +kunmap_atomic(addr, KM_USER0); Surprising that clear_user_page needs kmap_atomic() (but true). There's a

Re: [PATCHv2 3/4] Add HYPER-V apic access MSRs.

2010-01-17 Thread Christoph Hellwig
On Sun, Jan 17, 2010 at 02:41:42PM +0200, Gleb Natapov wrote: I copied code from the instead of using a helper function, for some reason unknown to me. Anyway, if I can't get a struct page from a user virtual address I can't use it. Actually I am not sure the page should be zeroed at all. The spec only

Re: [PATCH v3 06/12] Add get_user_pages() variant that fails if major fault is required.

2010-01-17 Thread Christoph Hellwig
On Tue, Jan 05, 2010 at 04:12:48PM +0200, Gleb Natapov wrote: This patch adds a get_user_pages() variant that only succeeds if getting a reference to a page doesn't require a major fault. +int get_user_pages_noio(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start,

Re: [PATCH v3 05/12] Export __get_user_pages_fast.

2010-01-17 Thread Christoph Hellwig
On Tue, Jan 05, 2010 at 04:12:47PM +0200, Gleb Natapov wrote: KVM will use it to try and find a page without falling back to slow gup. That is why get_user_pages_fast() is not enough. Btw, it seems like it is currently declared unconditionally in linux/mm.h but only implemented by x86, and you

Re: Seabios incompatible with Linux 2.6.26 host?

2010-02-05 Thread Christoph Hellwig
On Thu, Feb 04, 2010 at 03:34:24PM +0100, Pierre Riteau wrote: I think I traced back the issue to the switch from Bochs BIOS to Seabios. By forcing the usage of Bochs BIOS 5f08bb45861f54be478b25075b90d2406a0f8bb3 works, while it dies without the -bios override. Unfortunately, newer versions

Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-03-07 Thread Christoph Hellwig
where the problem is. Actually it is, and the bug has been fixed long ago in: commit e2a305fb13ff0f5cf6ff80aaa90a5ed5954c Author: Christoph Hellwig h...@lst.de Date: Tue Jan 26 14:49:08 2010 +0100 block: avoid creating too large iovecs in multiwrite_merge I've asked for it to be added

Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-03-07 Thread Christoph Hellwig
On Sun, Mar 07, 2010 at 07:30:06PM +0200, Avi Kivity wrote: It may also be that glibc is emulating preadv, incorrectly. I've done a quick audit of all paths leading to pread and all seem to align correctly. So either a broken glibc emulation or something else outside the block layer seems

Re: linux-aio usable?

2010-03-08 Thread Christoph Hellwig
On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote: Are there any potential pitfalls? It won't work well unless running on a block device (partition or LVM). It will actually work well on pre-allocated filesystem images, at least on XFS and NFS. The real pitfall is that cache=none

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Christoph Hellwig
On Mon, Mar 15, 2010 at 06:43:06PM -0500, Anthony Liguori wrote: I knew someone would do this... This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption. While we are correct in advertising a write-cache,

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Christoph Hellwig
On Mon, Mar 15, 2010 at 08:27:25PM -0500, Anthony Liguori wrote: Actually cache=writeback is as safe as any normal host is with a volatile disk cache, except that in this case the disk cache is actually a lot larger. With a properly implemented filesystem this will never cause corruption.

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Christoph Hellwig
Avi, cache=writeback can be faster than cache=none for the same reasons a disk cache speeds up access. As long as the I/O mix contains more asynchronous than synchronous writes it allows the host to do much more reordering, only limited by the cache size (which can be quite huge when using the

Re: KVM call agenda for Mar 16

2010-03-16 Thread Christoph Hellwig
On Tue, Mar 16, 2010 at 12:38:02PM +0200, Avi Kivity wrote: On 03/16/2010 12:31 PM, Daniel P. Berrange wrote: Polling loops are an indication that something is wrong. Except when people suggest they are the right answer, qcow high watermark ;-P I liked Anthony's suggestion of an

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Christoph Hellwig
On Tue, Mar 16, 2010 at 12:36:31PM +0200, Avi Kivity wrote: Are you talking about direct volume access or qcow2? Doesn't matter. For direct volume access, I still don't get it. The number of barriers issues by the host must equal (or exceed, but that's pointless) the number of barriers

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Tue, Mar 16, 2010 at 01:08:28PM +0200, Avi Kivity wrote: If the batch size is larger than the virtio queue size, or if there are no flushes at all, then yes the huge write cache gives more opportunity for reordering. But we're already talking hundreds of requests here. Yes. And

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote: They should be reorderable. Otherwise host filesystems on several volumes would suffer the same problems. They are reorderable, just not as extremely as the page cache. Remember that the request queue really is just a relatively

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote: Chris, can you carry out an experiment? Write a program that pwrite()s a byte to a file at the same location repeatedly, with the file opened using O_SYNC. Measure the write rate, and run blktrace on the host to see what the disk

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote: Meanwhile I looked at the code, and it looks bad. There is an IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before issuing it. In any case, qemu doesn't use it as far as I could tell, and even if it did,

Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-11 Thread Christoph Hellwig
On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote: Some virtio devices are known to have guest drivers which expect a notify to be processed synchronously and spin waiting for completion. Only enable ioeventfd for virtio-blk and virtio-net for now. Who guarantees that less

Re: virtio block drivers not working

2009-03-22 Thread Christoph Hellwig
I do use virtio block in a very similar setup to yours (fully static kernel, -kernel option to kvm/qemu) successfully for quite a while. Can you post your kernel .config and the contents of /proc/devices and /proc/partitions to debug this further?

Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 06:17:36PM +0200, Avi Kivity wrote: Instead of introducing yet another layer of indirection, you could add block-raw-linux-aio, which would be registered before block-raw-posix (which is really block-raw-threadpool...), and resist a ->probe() if caching is enabled.

Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote: I'd like to see the O_DIRECT bounce buffering removed in favor of the DMA API bouncing. Once that happens, raw_read and raw_pread can disappear. block-raw-posix becomes much simpler. See my vectored I/O patches for doing

Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote: block-raw-posix needs a major overhaul. That's why I'm not even considering committing the patch as is. I have some WIP patches that split out the host device bits into separate files to get block-raw-posix down to the pure

Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT

2009-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2009 at 01:10:30PM -0500, Anthony Liguori wrote: I really dislike having so many APIs. I'd rather have an aio API that took byte accesses or have pread/pwrite always be emulated with a full sector read/write I had patches to change the aio API to byte based access, and get

Re: Split kvm source tarballs

2009-03-25 Thread Christoph Hellwig
On Wed, Mar 25, 2009 at 08:44:58AM -0500, Anthony Liguori wrote: That's what I figured. FWIW, the split tarballs work just fine for me. It may be worth waiting to do step 2 until the IO thread is merged. I think once that happens, we could probably do a sprint to get rid of libkvm in

Re: Split kvm source tarballs

2009-03-25 Thread Christoph Hellwig
On Wed, Mar 25, 2009 at 08:02:48PM +0200, Avi Kivity wrote: So how about this: - keep copies of the headers in the qemu repository. 'make sync' becomes a maintainer tool rather than a developer tool Yeah. That's similar to how we maintain the headers and some shared source files for XFS and

Re: Split kvm source tarballs

2009-03-26 Thread Christoph Hellwig
On Wed, Mar 25, 2009 at 04:34:31PM -0500, Anthony Liguori wrote: But if you created a qemu-svn-stable branch that followed the QEMU stable tree in kvm-userspace, like the qemu-cvs branch follows trunk, then it would be pretty easy to create and maintain a kvm_stable_0_10 branch of

Re: Split kvm source tarballs

2009-03-26 Thread Christoph Hellwig
On Thu, Mar 26, 2009 at 11:09:07AM +0200, Avi Kivity wrote: If the repo contains only the kit (external-module.h and the hack scripts) we'll be left with dual repositories with their confusion and unbisectability. If the repo contains both the kit and the code, I'll need to commit every

Re: Split kvm source tarballs

2009-03-26 Thread Christoph Hellwig
On Thu, Mar 26, 2009 at 01:19:46PM -0500, Anthony Liguori wrote: Slightly offtopic, but I always wondered why qemu is hosted in svn. For all projects having semi-forks of qemu that they try to keep in sync or even merge back a distributed scm would work so much better. I'm going to switch

Re: Split kvm source tarballs

2009-04-16 Thread Christoph Hellwig
Any idea when the split will happen?

Re: [PATCH] qemu-kvm: build system Add link to qemu

2009-04-26 Thread Christoph Hellwig
On Sun, Apr 26, 2009 at 01:33:37PM +0300, Avi Kivity wrote: Jan Kiszka wrote: I'm getting closer to a working qemu-kvm, but there are still a few messy parts. The magic dance goes like this: Try a fresh fetch. ./configure make ought to work. Works for me now.

[PATCH] virtio_blk: SG_IO passthru support

2009-04-27 Thread Christoph Hellwig
, some comments and tested the whole beast] Signed-off-by: Hannes Reinecke h...@suse.de Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/drivers/block/virtio_blk.c === --- linux-2.6.orig/drivers/block/virtio_blk.c 2009

[PATCH] virtio-blk: add SGI_IO passthru support

2009-04-27 Thread Christoph Hellwig
to async I/O, but that would require either bloating the VirtIOBlockReq by the size of struct sg_io_hdr or an additional memory allocation for each SG_IO request. Signed-off-by: Christoph Hellwig h...@lst.de Index: qemu/hw/virtio-blk.h

Re: [ANNOUNCE] kvm-85 release

2009-04-27 Thread Christoph Hellwig
On Mon, Apr 27, 2009 at 04:45:23PM +0200, Alexander Graf wrote: - Disabled pwritev() support until glibc stops writing random junk. See https://bugzilla.redhat.com/497429 Wouldn't it be useful to have it disabled upstream then? No glibc has been released with the broken one yet, I guess

Re: [Qemu-devel] [PATCH] virtio-blk: add SGI_IO passthru support

2009-04-28 Thread Christoph Hellwig
On Mon, Apr 27, 2009 at 12:15:31PM +0300, Avi Kivity wrote: I think that's worthwhile. The extra bloat is trivial (especially as the number of inflight virtio requests is tightly bounded), and stalling the vcpu for requests is a pain. Ok, new patch will follow ASAP. -- To unsubscribe from

[PATCH 2/2] virtio-blk: add SG_IO passthru support

2009-04-28 Thread Christoph Hellwig
Add support for SG_IO passthru (packet commands) to the virtio-blk backend. Conceptually based on an older patch from Hannes Reinecke but largely rewritten to match the code structure and layering in virtio-blk as well as doing asynchronous I/O. Signed-off-by: Christoph Hellwig h...@lst.de
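For context on how such passthru requests are framed on the wire: every legacy virtio-blk request starts with a fixed 16-byte header (struct virtio_blk_outhdr: little-endian u32 type, u32 ioprio, u64 sector), and SG_IO passthru requests use the VIRTIO_BLK_T_SCSI_CMD type, with the CDB and sense buffer carried in the descriptor chain rather than in the header. A small sketch of packing that header (the helper name is invented for illustration; the constants match the legacy virtio-blk ABI):

```python
import struct

# Legacy virtio-blk request types from the virtio spec /
# include/uapi/linux/virtio_blk.h.
VIRTIO_BLK_T_IN = 0        # read
VIRTIO_BLK_T_OUT = 1       # write
VIRTIO_BLK_T_SCSI_CMD = 2  # SG_IO / packet-command passthru

def pack_outhdr(req_type, ioprio=0, sector=0):
    """Pack a struct virtio_blk_outhdr: le32 type, le32 ioprio, le64 sector.
    For SCSI_CMD requests the sector field is unused; the CDB, data buffers
    and sense buffer follow as further entries in the descriptor chain."""
    return struct.pack("<IIQ", req_type, ioprio, sector)

hdr = pack_outhdr(VIRTIO_BLK_T_SCSI_CMD)
```

The backend dispatches on the type field, so plain reads/writes and passthru commands share the same request header layout.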
