Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-22 Thread Chris Webb
Chris Webb ch...@arachsys.com writes: Okay. What I was driving at in describing these systems as 'already broken' is that they will already lose data (in this sense) if they're run on bare metal with normal commodity SATA disks with their 32MB write caches on. That configuration surely

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-22 Thread Avi Kivity
On 03/22/2010 11:04 PM, Chris Webb wrote: Chris Webb ch...@arachsys.com writes: Okay. What I was driving at in describing these systems as 'already broken' is that they will already lose data (in this sense) if they're run on bare metal with normal commodity SATA disks with their 32MB

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-22 Thread Chris Webb
Avi Kivity a...@redhat.com writes: On 03/22/2010 11:04 PM, Chris Webb wrote: Unless I'm missing something, the risk to guest OSes in this configuration should therefore be exactly the same as the risk from running on normal commodity hardware with such drives and no expensive battery-backed

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-19 Thread Dave Hansen
On Tue, 2010-03-16 at 11:05 +0200, Avi Kivity wrote: Not really. In many cloud environments, there's a set of common images that are instantiated on each node. Usually this is because you're running a horizontally scalable application or because you're supporting an ephemeral storage

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Tue, Mar 16, 2010 at 01:08:28PM +0200, Avi Kivity wrote: If the batch size is larger than the virtio queue size, or if there are no flushes at all, then yes the huge write cache gives more opportunity for reordering. But we're already talking hundreds of requests here. Yes. And

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity
On 03/17/2010 10:49 AM, Christoph Hellwig wrote: On Tue, Mar 16, 2010 at 01:08:28PM +0200, Avi Kivity wrote: If the batch size is larger than the virtio queue size, or if there are no flushes at all, then yes the huge write cache gives more opportunity for reordering. But we're already

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Anthony Liguori anth...@codemonkey.ws writes: This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption. While we are correct in advertising a write-cache, write-caches are volatile and should a drive lose

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Avi Kivity a...@redhat.com writes: On 03/15/2010 10:23 PM, Chris Webb wrote: Wasteful duplication of page cache between guest and host notwithstanding, turning on cache=writeback is a spectacular performance win for our guests. Is this with qcow2, raw file, or direct volume access? This

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Anthony Liguori
On 03/17/2010 10:14 AM, Chris Webb wrote: Anthony Liguori anth...@codemonkey.ws writes: This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption. While we are correct in advertising a write-cache, write-caches

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity
On 03/17/2010 05:24 PM, Chris Webb wrote: Avi Kivity a...@redhat.com writes: On 03/15/2010 10:23 PM, Chris Webb wrote: Wasteful duplication of page cache between guest and host notwithstanding, turning on cache=writeback is a spectacular performance win for our guests. Is

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Balbir Singh
* Anthony Liguori anth...@codemonkey.ws [2010-03-17 10:55:47]: On 03/17/2010 10:14 AM, Chris Webb wrote: Anthony Liguori anth...@codemonkey.ws writes: This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption.

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Anthony Liguori anth...@codemonkey.ws writes: On 03/17/2010 10:14 AM, Chris Webb wrote: (c) installations that are already broken and lose data with a physical drive with a write-cache can lose much more in this case because the write cache is much bigger? This is the

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity
On 03/17/2010 06:22 PM, Avi Kivity wrote: Also, if my guest kernel issues (say) three small writes, one at the start of the disk, one in the middle, one at the end, and then does a flush, can virtio really express this as one non-contiguous O_DIRECT write (the three components of which can be
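The guest-side access pattern Chris describes can be reproduced with a minimal sketch like the one below. It is only an illustration under stated assumptions: it uses ordinary buffered pwrite() calls on a scratch file standing in for the disk (not O_DIRECT, and not virtio itself), and the function name, file size and block size are hypothetical. The three writes cannot be folded into a single pwritev() call, because pwritev() writes all of its buffers contiguously from one offset; they go out as separate requests, and until the final fdatasync() returns nothing constrains the order in which they reach stable storage.

```python
import os
import tempfile

def scattered_writes_then_flush(path=None, disk_size=4 * 1024 * 1024, block=512):
    """Write one small block at the start, middle and end of a scratch file
    (a hypothetical stand-in for the guest disk), then issue a single flush.
    Returns (path, offsets)."""
    if path is None:
        fd_tmp, path = tempfile.mkstemp(prefix="scattered-")
        os.close(fd_tmp)
    offsets = [0, disk_size // 2, disk_size - block]
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, disk_size)  # sparse file standing in for the disk
        for i, off in enumerate(offsets):
            # Three independent requests: pwritev() could not combine them,
            # since it writes all of its buffers contiguously from one offset.
            os.pwrite(fd, bytes([i + 1]) * block, off)
        # A single flush covers all three writes; until it returns, their
        # order on stable storage is not constrained.
        os.fdatasync(fd)
    finally:
        os.close(fd)
    return path, offsets
```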

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Avi Kivity a...@redhat.com writes: Chris, can you carry out an experiment? Write a program that pwrite()s a byte to a file at the same location repeatedly, with the file opened using O_SYNC. Measure the write rate, and run blktrace on the host to see what the disk (/dev/sda, not the volume)
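The experiment Avi asks for can be sketched in a few lines. This is a hypothetical version (the function name, scratch file and iteration count are assumptions, not from the thread): it opens a file with O_SYNC, which makes each write behave as though it were followed by fsync(), pwrite()s one byte at offset 0 repeatedly, and reports writes per second. blktrace would be run separately on the host against the physical disk while this runs.

```python
import os
import tempfile
import time

def osync_write_rate(path=None, iterations=1000):
    """pwrite() one byte at the same offset `iterations` times through an
    O_SYNC descriptor; return (writes_done, writes_per_second)."""
    created = path is None
    if created:
        fd_tmp, path = tempfile.mkstemp(prefix="osync-")
        os.close(fd_tmp)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            # With O_SYNC, pwrite() only returns once the data has been
            # handed to the underlying storage, as if fsync() followed it.
            if os.pwrite(fd, b"x", 0) != 1:
                raise OSError("short write")
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        if created:
            os.unlink(path)
    rate = iterations / elapsed if elapsed > 0 else float("inf")
    return iterations, rate
```

A rate far above what the spindle can physically sustain would suggest the writes are being absorbed by a volatile cache somewhere below the guest; the blktrace output on the host shows what actually reaches the disk.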

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote: They should be reorderable. Otherwise host filesystems on several volumes would suffer the same problems. They are reorderable, just not as extremely as the page cache. Remember that the request queue really is just a relatively

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity
On 03/17/2010 06:47 PM, Chris Webb wrote: Avi Kivity a...@redhat.com writes: Chris, can you carry out an experiment? Write a program that pwrite()s a byte to a file at the same location repeatedly, with the file opened using O_SYNC. Measure the write rate, and run blktrace on the host to

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote: Chris, can you carry out an experiment? Write a program that pwrite()s a byte to a file at the same location repeatedly, with the file opened using O_SYNC. Measure the write rate, and run blktrace on the host to see what the disk

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote: Meanwhile I looked at the code, and it looks bad. There is an IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before issuing it. In any case, qemu doesn't use it as far as I could tell, and even if it did,

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity
On 03/17/2010 06:58 PM, Christoph Hellwig wrote: On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote: Meanwhile I looked at the code, and it looks bad. There is an IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before issuing it. In any case, qemu doesn't use it

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity
On 03/17/2010 06:52 PM, Christoph Hellwig wrote: On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote: They should be reorderable. Otherwise host filesystems on several volumes would suffer the same problems. They are reorderable, just not as extremely as the page cache.

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity
On 03/17/2010 06:57 PM, Christoph Hellwig wrote: On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote: Chris, can you carry out an experiment? Write a program that pwrite()s a byte to a file at the same location repeatedly, with the file opened using O_SYNC. Measure the write rate,

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Vivek Goyal
On Wed, Mar 17, 2010 at 03:14:10PM +, Chris Webb wrote: Anthony Liguori anth...@codemonkey.ws writes: This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption. While we are correct in advertising a

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Vivek Goyal vgo...@redhat.com writes: Are you using CFQ in the host? What is the host kernel version? I am not sure what is the problem here but you might want to play with IO controller and put these guests in individual cgroups and see if you get better throughput even with

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Christoph Hellwig
On Mon, Mar 15, 2010 at 08:27:25PM -0500, Anthony Liguori wrote: Actually cache=writeback is as safe as any normal host is with a volatile disk cache, except that in this case the disk cache is actually a lot larger. With a properly implemented filesystem this will never cause corruption.
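Christoph's claim rests on the filesystem issuing cache flushes at the right moments. A minimal sketch of the ordering a journaling filesystem relies on, assuming a hypothetical layout (data at one offset, an 8-byte commit marker at another, in a scratch file); this is an illustration, not any real filesystem's code. The data is written and flushed before the commit record is written, and the commit record is flushed in turn, so provided flushes are honoured, a volatile cache of any size can lose the update but cannot leave a commit record on disk without its data.

```python
import os
import tempfile

COMMIT_MAGIC = b"COMMIT01"  # hypothetical 8-byte commit marker

def ordered_commit(data, data_off=0, commit_off=4096, path=None):
    """Write `data`, flush, then write and flush a commit record.
    If the commit record is present after a crash, the data is durable.
    Returns the path written to."""
    if path is None:
        fd_tmp, path = tempfile.mkstemp(prefix="journal-")
        os.close(fd_tmp)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, data_off)
        os.fdatasync(fd)                        # flush 1: data reaches stable storage
        os.pwrite(fd, COMMIT_MAGIC, commit_off)
        os.fdatasync(fd)                        # flush 2: then the commit record
    finally:
        os.close(fd)
    return path

def is_committed(path, commit_off):
    """Check whether the hypothetical commit marker is present."""
    with open(path, "rb") as f:
        f.seek(commit_off)
        return f.read(len(COMMIT_MAGIC)) == COMMIT_MAGIC
```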

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Avi Kivity
On 03/15/2010 08:48 PM, Anthony Liguori wrote: On 03/15/2010 04:27 AM, Avi Kivity wrote: That's only beneficial if the cache is shared. Otherwise, you could use the balloon to evict cache when memory is tight. Shared cache is mostly a desktop thing where users run similar workloads. For

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Avi Kivity
On 03/15/2010 10:23 PM, Chris Webb wrote: Avi Kivity a...@redhat.com writes: On 03/15/2010 10:07 AM, Balbir Singh wrote: Yes, it is a virtio call away, but is the cost of paying twice in terms of memory acceptable? Usually, it isn't, which is why I recommend cache=off.

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Kevin Wolf
On 16.03.2010 10:17, Avi Kivity wrote: On 03/15/2010 10:23 PM, Chris Webb wrote: Avi Kivity a...@redhat.com writes: On 03/15/2010 10:07 AM, Balbir Singh wrote: Yes, it is a virtio call away, but is the cost of paying twice in terms of memory acceptable? Usually, it

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Avi Kivity
On 03/16/2010 11:54 AM, Kevin Wolf wrote: Is this with qcow2, raw file, or direct volume access? I can understand it for qcow2, but for direct volume access this shouldn't happen. The guest schedules as many writes as it can, followed by a sync. The host (and disk) can then reschedule them

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Christoph Hellwig
Avi, cache=writeback can be faster than cache=none for the same reasons a disk cache speeds up access. As long as the I/O mix contains more asynchronous than synchronous writes it allows the host to do much more reordering, only limited by the cache size (which can be quite huge when using the
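The "only limited by the cache size" argument can be illustrated with a toy model. It is hypothetical and far simpler than a real I/O scheduler or disk: it assumes seek cost is proportional to sector distance. Pending writes are reordered by sector only within a window of `window` requests; a window of 1 is strict arrival order, as with no cache, and a larger window stands in for a larger write cache.

```python
def head_travel(sectors, start=0):
    """Total head movement, in sectors, to service `sectors` in order."""
    pos, total = start, 0
    for s in sectors:
        total += abs(s - pos)
        pos = s
    return total

def reorder_within_window(requests, window):
    """Sort requests by sector inside consecutive chunks of `window` requests,
    a crude stand-in for a cache that can only reorder what it holds."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for i in range(0, len(requests), window):
        out.extend(sorted(requests[i:i + window]))
    return out
```

With the pending list [900, 100, 800, 200, 700, 300] and the head starting at sector 0, total travel drops from 3900 sectors at window 1 to 3100, 2100 and 900 at windows 2, 3 and 6.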

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Avi Kivity
On 03/16/2010 12:26 PM, Christoph Hellwig wrote: Avi, cache=writeback can be faster than cache=none for the same reasons a disk cache speeds up access. As long as the I/O mix contains more asynchronous than synchronous writes it allows the host to do much more reordering, only limited by the

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Christoph Hellwig
On Tue, Mar 16, 2010 at 12:36:31PM +0200, Avi Kivity wrote: Are you talking about direct volume access or qcow2? Doesn't matter. For direct volume access, I still don't get it. The number of barriers issued by the host must equal (or exceed, but that's pointless) the number of barriers

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Avi Kivity
On 03/16/2010 12:44 PM, Christoph Hellwig wrote: On Tue, Mar 16, 2010 at 12:36:31PM +0200, Avi Kivity wrote: Are you talking about direct volume access or qcow2? Doesn't matter. For direct volume access, I still don't get it. The number of barriers issued by the host must

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2010-03-16 13:08:28]: On 03/16/2010 12:44 PM, Christoph Hellwig wrote: On Tue, Mar 16, 2010 at 12:36:31PM +0200, Avi Kivity wrote: Are you talking about direct volume access or qcow2? Doesn't matter. For direct volume access, I still don't get it. The number

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-16 Thread Avi Kivity
On 03/16/2010 04:27 PM, Balbir Singh wrote: Let's assume the guest has virtio (I agree with IDE we need reordering on the host). The guest sends batches of I/O separated by cache flushes. If the batches are smaller than the virtio queue length, ideally things look like: io_submit(...,
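The batching Avi describes can be modelled as below. This is a hypothetical sketch, not QEMU code and not a reconstruction of the truncated io_submit() example: guest writes between flushes form a batch, each batch is submitted in chunks of at most `queue_len` requests, and the flush is issued only after the whole batch has been submitted. When every batch fits in the queue, each batch becomes exactly one submission followed by one flush.

```python
FLUSH = "flush"  # hypothetical marker for a guest cache-flush request

def plan_host_io(guest_stream, queue_len):
    """Group a guest request stream into host-side steps: each step is either
    ("submit", [writes...]) with at most queue_len writes, or ("flush",)."""
    if queue_len < 1:
        raise ValueError("queue_len must be >= 1")
    steps, batch = [], []

    def drain():
        # Submit everything queued since the last flush, queue_len at a time.
        for i in range(0, len(batch), queue_len):
            steps.append(("submit", batch[i:i + queue_len]))
        batch.clear()

    for op in guest_stream:
        if op == FLUSH:
            drain()              # the whole batch goes out before the flush
            steps.append(("flush",))
        else:
            batch.append(op)
    drain()                      # trailing writes with no flush after them
    return steps
```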

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Avi Kivity
On 03/15/2010 09:22 AM, Balbir Singh wrote: Selectively control Unmapped Page Cache (nospam version) From: Balbir Singh bal...@linux.vnet.ibm.com This patch implements unmapped page cache control via preferred page cache reclaim. The current patch hooks into kswapd and reclaims page cache if

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2010-03-15 09:48:05]: On 03/15/2010 09:22 AM, Balbir Singh wrote: Selectively control Unmapped Page Cache (nospam version) From: Balbir Singh bal...@linux.vnet.ibm.com This patch implements unmapped page cache control via preferred page cache reclaim. The

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Avi Kivity
On 03/15/2010 10:07 AM, Balbir Singh wrote: * Avi Kivity a...@redhat.com [2010-03-15 09:48:05]: On 03/15/2010 09:22 AM, Balbir Singh wrote: Selectively control Unmapped Page Cache (nospam version) From: Balbir Singh bal...@linux.vnet.ibm.com This patch implements unmapped page

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2010-03-15 10:27:45]: On 03/15/2010 10:07 AM, Balbir Singh wrote: * Avi Kivity a...@redhat.com [2010-03-15 09:48:05]: On 03/15/2010 09:22 AM, Balbir Singh wrote: Selectively control Unmapped Page Cache (nospam version) From: Balbir

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Avi Kivity
On 03/15/2010 11:17 AM, Balbir Singh wrote: * Avi Kivity a...@redhat.com [2010-03-15 10:27:45]: On 03/15/2010 10:07 AM, Balbir Singh wrote: * Avi Kivity a...@redhat.com [2010-03-15 09:48:05]: On 03/15/2010 09:22 AM, Balbir Singh wrote: Selectively control

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2010-03-15 11:27:56]: The knobs are for 1. Selective enablement 2. Selective control of the % of unmapped pages An alternative path is to enable KSM for page cache. Then we have direct read-only guest access to host page cache, without any guest
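The two knobs Balbir lists can be pictured with a hypothetical helper. The function name, parameter names and the exact threshold rule are assumptions; the excerpt does not show how the patch computes its limit. When the feature is enabled, any unmapped page cache beyond a configured percentage of memory becomes a reclaim target for kswapd.

```python
def unmapped_reclaim_target(enabled, total_pages, unmapped_pages, max_unmapped_pct):
    """Return how many unmapped page-cache pages to reclaim under a
    hypothetical 'keep unmapped cache below N% of memory' rule."""
    if not enabled:                        # knob 1: selective enablement
        return 0
    if not 0 <= max_unmapped_pct <= 100:
        raise ValueError("max_unmapped_pct must be between 0 and 100")
    limit = total_pages * max_unmapped_pct // 100   # knob 2: % of unmapped pages
    return max(0, unmapped_pages - limit)
```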

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Randy Dunlap
On Mon, 15 Mar 2010 12:52:15 +0530 Balbir Singh wrote: Selectively control Unmapped Page Cache (nospam version) From: Balbir Singh bal...@linux.vnet.ibm.com This patch implements unmapped page cache control via preferred page cache reclaim. The current patch hooks into kswapd and reclaims

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Anthony Liguori
On 03/15/2010 04:27 AM, Avi Kivity wrote: That's only beneficial if the cache is shared. Otherwise, you could use the balloon to evict cache when memory is tight. Shared cache is mostly a desktop thing where users run similar workloads. For servers, it's much less likely. So a

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Chris Webb
Avi Kivity a...@redhat.com writes: On 03/15/2010 10:07 AM, Balbir Singh wrote: Yes, it is a virtio call away, but is the cost of paying twice in terms of memory acceptable? Usually, it isn't, which is why I recommend cache=off. Hi Avi. One observation about your recommendation for

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Anthony Liguori
On 03/15/2010 03:23 PM, Chris Webb wrote: Avi Kivity a...@redhat.com writes: On 03/15/2010 10:07 AM, Balbir Singh wrote: Yes, it is a virtio call away, but is the cost of paying twice in terms of memory acceptable? Usually, it isn't, which is why I recommend cache=off.

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Christoph Hellwig
On Mon, Mar 15, 2010 at 06:43:06PM -0500, Anthony Liguori wrote: I knew someone would do this... This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption. While we are correct in advertising a write-cache,

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Anthony Liguori
On 03/15/2010 07:43 PM, Christoph Hellwig wrote: On Mon, Mar 15, 2010 at 06:43:06PM -0500, Anthony Liguori wrote: I knew someone would do this... This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption. While

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Chris Webb ch...@arachsys.com [2010-03-15 20:23:54]: Avi Kivity a...@redhat.com writes: On 03/15/2010 10:07 AM, Balbir Singh wrote: Yes, it is a virtio call away, but is the cost of paying twice in terms of memory acceptable? Usually, it isn't, which is why I recommend

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Randy Dunlap randy.dun...@oracle.com [2010-03-15 08:46:31]: On Mon, 15 Mar 2010 12:52:15 +0530 Balbir Singh wrote: Hi, If you go ahead with this, please add the boot parameter and its description to Documentation/kernel-parameters.txt. I certainly will, thanks for keeping a watch.