Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Mon, Aug 16, 2010 at 03:34:12PM -0500, Anthony Liguori wrote: On 08/16/2010 01:42 PM, Christoph Hellwig wrote: On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote: Also, ext4 is _very_ slow on O_SYNC writes (which is used in kvm with default cache). Yeah, we probably need to

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity
On 08/17/2010 12:07 PM, Christoph Hellwig wrote: In short it's completely worthless for any real filesystem. The documentation should be updated then. It suggests that it is usable for data integrity. (or maybe, it should be fixed?)

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 12:23:01PM +0300, Avi Kivity wrote: On 08/17/2010 12:07 PM, Christoph Hellwig wrote: In short it's completely worthless for any real filesystem. The documentation should be updated then. It suggests that it is usable for data integrity. The manpage has a

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori
On 08/17/2010 04:07 AM, Christoph Hellwig wrote: On Mon, Aug 16, 2010 at 03:34:12PM -0500, Anthony Liguori wrote: On 08/16/2010 01:42 PM, Christoph Hellwig wrote: On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote: Also, ext4 is _very_ slow on O_SYNC writes

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 07:56:04AM -0500, Anthony Liguori wrote: But assuming that you had a preallocated disk image, it would effectively flush the page cache so it sounds like the only real issue is sparse and growable files. For preallocated as in using fallocate() we still converting

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori
On 08/17/2010 08:07 AM, Christoph Hellwig wrote: The point is that we don't want to flush the disk write cache. The intention of writethrough is not to make the disk cache writethrough but to treat the host's cache as writethrough. We need to make sure data is not in the disk write

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote: On 08/17/2010 08:07 AM, Christoph Hellwig wrote: The point is that we don't want to flush the disk write cache. The intention of writethrough is not to make the disk cache writethrough but to treat the host's cache as

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori
On 08/17/2010 09:28 AM, Christoph Hellwig wrote: On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote: On 08/17/2010 08:07 AM, Christoph Hellwig wrote: The point is that we don't want to flush the disk write cache. The intention of writethrough is not to make the disk

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Michael Tokarev
17.08.2010 18:28, Christoph Hellwig wrote: On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote: [] For normal writes from a guest, we don't need to follow the write with an fsync(). We should only need to issue an fsync() given an explicit flush from the guest. Define normal

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori
On 08/17/2010 09:40 AM, Michael Tokarev wrote: fsync() being slow is orthogonal to my point. I don't see why we need to do an fsync() on *every* write. It should only be necessary when a guest injects an actual barrier. We don't do sync on every write, but O_SYNC implies that. And
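
The distinction argued over in this message can be sketched as follows. This is only an illustration of the two host-side write patterns, not qemu's actual code; the function names `write_through` and `write_then_flush` are hypothetical, and the fallback from `O_DSYNC` to `O_SYNC` is an assumption for platforms where Python does not expose `os.O_DSYNC`.

```python
import os


def write_through(path, data):
    """Synchronous-per-write pattern: the sync flag on open() makes each
    write() return only after the kernel reports the data as stable."""
    # Assumption: prefer O_DSYNC, fall back to O_SYNC, else no flag at all.
    sync_flag = getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | sync_flag, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_then_flush(path, chunks):
    """Buffered pattern: plain writes go to the page cache, and a single
    fsync() is issued only at the explicit flush point."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        total = 0
        for chunk in chunks:
            total += os.write(fd, chunk)
        os.fsync(fd)  # stands in for the guest's explicit flush/barrier
        return total
    finally:
        os.close(fd)
```

The first function pays the synchronization cost on every write, which is the behaviour attributed to O_SYNC above; the second pays it once per flush. Whether either actually reaches stable storage depends on the filesystem and the disk write cache, which is exactly what the thread disputes.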

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:39:15AM -0500, Anthony Liguori wrote: The type of cache we present to the guest only should relate to how the hypervisor caches the storage. It should be independent of how data is cached by the disk. It is. There can be many levels of caching in a storage

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote: I think the real issue is we're mixing host configuration with guest visible state. The last time I proposed to decouple the two you and Avi were heavily opposed to it.. With O_SYNC, we're causing cache=writethrough to do

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity
On 08/17/2010 05:45 PM, Christoph Hellwig wrote: The user doesn't know or have to care about the caching. The user uses O_SYNC/fsync to say it wants data on disk, and it's the operating system's job to make that happen. The situation with qemu is the same - if we tell the guest that we do

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori
On 08/17/2010 09:45 AM, Christoph Hellwig wrote: On Tue, Aug 17, 2010 at 09:39:15AM -0500, Anthony Liguori wrote: The type of cache we present to the guest only should relate to how the hypervisor caches the storage. It should be independent of how data is cached by the disk. It is.

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori
On 08/17/2010 09:46 AM, Christoph Hellwig wrote: On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote: I think the real issue is we're mixing host configuration with guest visible state. The last time I proposed to decouple the two you and Avi were heavily opposed to it..

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity
On 08/17/2010 05:46 PM, Christoph Hellwig wrote: On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote: I think the real issue is we're mixing host configuration with guest visible state. The last time I proposed to decouple the two you and Avi were heavily opposed to it.. I

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity
On 08/17/2010 05:54 PM, Anthony Liguori wrote: This is simply unrealistic. O_SYNC might force data to be on a platter when using a directly attached disk but many NAS devices actually do writeback caching and rely on having a UPS to preserve data integrity. There's really no way in the

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:54:07AM -0500, Anthony Liguori wrote: This is simply unrealistic. O_SYNC might force data to be on a platter when using a directly attached disk but many NAS devices actually do writeback caching and rely on having a UPS to preserve data integrity. There's really no

Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 05:59:07PM +0300, Avi Kivity wrote: I agree, but there's another case: tell the guest that we have a write cache, use O_DSYNC, but only flush the disk cache on guest flushes. O_DSYNC flushes the disk write cache and any filesystem that supports non-volatile cache. The

JFYI: ext4 bug triggerable by kvm

2010-08-16 Thread Michael Tokarev
https://bugzilla.kernel.org/show_bug.cgi?id=16165 When a (raw) guest image is placed on an ext4 filesystem, it is possible to get data corruption, now due to ext4 bug, not kvm bug. Also, ext4 is _very_ slow on O_SYNC writes (which is used in kvm with default cache). JFYI. /mjt

Re: JFYI: ext4 bug triggerable by kvm

2010-08-16 Thread Anthony Liguori
On 08/16/2010 09:00 AM, Michael Tokarev wrote: https://bugzilla.kernel.org/show_bug.cgi?id=16165 When a (raw) guest image is placed on an ext4 filesystem, it is possible to get data corruption, now due to ext4 bug, not kvm bug. Yeah, there appear to be a few O_DIRECT-related issues with

Re: JFYI: ext4 bug triggerable by kvm

2010-08-16 Thread Christoph Hellwig
On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote: Also, ext4 is _very_ slow on O_SYNC writes (which is used in kvm with default cache). Yeah, we probably need to switch to sync_file_range() to avoid the journal commit on every write. No, we don't. sync_file_range does not

Re: JFYI: ext4 bug triggerable by kvm

2010-08-16 Thread Anthony Liguori
On 08/16/2010 01:42 PM, Christoph Hellwig wrote: On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote: Also, ext4 is _very_ slow on O_SYNC writes (which is used in kvm with default cache). Yeah, we probably need to switch to sync_file_range() to avoid the journal commit