Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-16 14:39:02]:

> > We're talking about an environment which we're always trying to optimize. Imagine that we're always trying to consolidate guests onto smaller numbers of hosts. We're effectively in a state where we _always_ want new guests.
>
> If this came at no cost to the guests, you'd be right. But at some point guest performance will be hit by this, so the advantage gained from freeing memory will be balanced by the disadvantage. Also, memory is not the only resource. At some point you become cpu bound; at that point freeing memory doesn't help and in fact may increase your cpu load.

We'll probably need control over other resources as well, but IMHO memory is the most precious because it is non-renewable.

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/15/2010 05:47 PM, Dave Hansen wrote:

> > That's a bug that needs to be fixed. Eventually the host will come under pressure and will balloon the guest. If that kills the guest, the ballooning is not effective as a host memory management technique.
>
> I'm not convinced that it's just a bug that can be fixed. Consider a case where a host sees a guest with 100MB of free memory at the exact moment that a database app sees that memory. The host tries to balloon that memory away at the same time that the app goes and allocates it. That can certainly lead to an OOM very quickly, even for very small amounts of memory (much less than 100MB). Where's the bug? I think the issues are really fundamental to ballooning.

There are two issues involved. One is, can the kernel accurately determine the amount of memory it needs to work? We have resources such as RAM and swap. We have liabilities in the form of swappable userspace memory, mlocked userspace memory, kernel memory to support these, and various reclaimable and non-reclaimable kernel caches. Can we determine the minimum amount of RAM to support our workload at a point in time? If we had this, we could modify the balloon to refuse to balloon if it takes the kernel beneath the minimum amount of RAM needed.

In fact, this is similar to allocating memory with overcommit_memory = 0. The difference is that the balloon allocates mlocked memory, while normal allocations can be charged against swap. But fundamentally it's the same.

> > > If all the guests do this, then it leaves that much more free memory on the host, which can be used flexibly for extra host page cache, new guests, etc...
> >
> > If the host detects lots of pagecache misses it can balloon guests down. If pagecache is quiet, why change anything?
>
> Page cache misses alone are not really sufficient. This is the classic problem where we try to differentiate streaming I/O (which we can't effectively cache) from I/O which can be effectively cached.

True. Random I/O across a very large dataset is also difficult to cache.

> > If the host wants to start new guests, it can balloon guests down. If no new guests are wanted, why change anything?
>
> We're talking about an environment which we're always trying to optimize. Imagine that we're always trying to consolidate guests onto smaller numbers of hosts. We're effectively in a state where we _always_ want new guests.

If this came at no cost to the guests, you'd be right. But at some point guest performance will be hit by this, so the advantage gained from freeing memory will be balanced by the disadvantage. Also, memory is not the only resource. At some point you become cpu bound; at that point freeing memory doesn't help and in fact may increase your cpu load.

--
error compiling committee.c: too many arguments to function
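Avi's question above — can the kernel compute the minimum RAM the current workload needs, and can the balloon refuse to go below it — can be sketched from meminfo-style counters. This is a hypothetical heuristic for illustration, not anything in the patch: the choice of fields and the idea of a "balloon floor" are assumptions.

```python
# Sketch: estimate a working-set floor from meminfo-style counters (kB)
# and reject balloon targets beneath it. Field choice is illustrative.

def min_ram_floor_kb(meminfo):
    """Lower bound on RAM the guest needs (assumed heuristic).
    Counts memory that cannot be reclaimed or swapped out."""
    # Anonymous memory that remaining swap cannot absorb.
    unswappable_anon = max(meminfo["AnonPages"] - meminfo["SwapFree"], 0)
    return (meminfo["Mlocked"]          # pinned userspace pages
            + meminfo["SUnreclaim"]     # non-reclaimable slab
            + meminfo["KernelStack"]    # kernel stacks
            + unswappable_anon)

def balloon_target_ok(meminfo, target_kb):
    """Accept a balloon target only if it leaves the floor intact."""
    return target_kb >= min_ram_floor_kb(meminfo)

sample = {"AnonPages": 400_000, "SwapFree": 100_000,
          "Mlocked": 50_000, "SUnreclaim": 30_000, "KernelStack": 5_000}
```

On a live guest these values would come from /proc/meminfo; the sample numbers here are made up.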
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 08:45 PM, Balbir Singh wrote:

> > There are two decisions that need to be made:
> >
> > - how much memory a guest should be given
> > - given some guest memory, what's the best use for it
> >
> > The first question can perhaps be answered by looking at guest I/O rates and giving more memory to more active guests. The second question is hard, but not any different than running non-virtualized - except if we can detect sharing or duplication. In this case, dropping a duplicated page is worthwhile, while dropping a shared page provides no benefit.
>
> I think there is another way of looking at it, give some free memory
>
> 1. Can the guest run more applications or run faster

That's my second question: how to best use this memory. More applications == drop the page from cache, faster == keep page in cache. All we need is to select the right page to drop.

> 2. Can the host potentially get this memory via ballooning or some other means to start newer guest instances

Well, we already have ballooning. The question is can we improve the eviction algorithm.

> I think the answer to 1 and 2 is yes.

How the patch helps answer either question, I'm not sure.

> > I don't think preferential dropping of unmapped page cache is the answer.
>
> Preferential dropping as selected by the host, that knows about the setup and if there is duplication involved. While we use the term preferential dropping, remember it is still via the LRU and we don't always succeed. It is a best-effort (if you can, and the unmapped pages are not highly referenced) scenario.

How can the host tell if there is duplication? It may know it has some pagecache, but it has no idea whether or to what extent guest pagecache duplicates host pagecache.

> > Those tell you how to balance going after the different classes of things that we can reclaim. Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure.

If there is no memory pressure on the host, there is no reason for the guest to pretend it is under pressure. If there is memory pressure on the host, it should share the pain among its guests by applying the balloon. So I don't think voluntarily dropping cache is a good direction.

> There are two situations
>
> 1. Voluntarily drop cache, if it was setup to do so (the host knows that it caches that information anyway)

It doesn't, really. The host only has aggregate information about itself, and no information about the guest. Dropping duplicate pages would be good if we could identify them. Even then, it's better to drop the page from the host, not the guest, unless we know the same page is cached by multiple guests.

But why would the guest voluntarily drop the cache? If there is no memory pressure, dropping caches increases cpu overhead and latency even if the data is still cached on the host.

> 2. Drop the cache on either a special balloon option, again the host knows it caches that very same information, so it prefers to free that up first.

Dropping in response to pressure is good. I'm just not convinced the patch helps in selecting the correct page to drop.

--
error compiling committee.c: too many arguments to function
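For concreteness, the "unmapped page cache" the thread keeps referring to can be roughly approximated from /proc/meminfo as Cached minus Mapped. A sketch — the subtraction is only an estimate, since Cached also counts shmem and Mapped counts file pages with at least one mapping:

```python
def unmapped_page_cache_kb(meminfo_text):
    """Approximate unmapped page cache (kB) as Cached - Mapped.
    Rough proxy only: Cached includes shmem pages too."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, rest = line.split(":", 1)
        fields[key.strip()] = int(rest.split()[0])   # value is in kB
    return max(fields["Cached"] - fields["Mapped"], 0)

# Made-up sample; a real tool would read open("/proc/meminfo").read().
sample = """MemTotal:      2048000 kB
Cached:         600000 kB
Mapped:         150000 kB"""
```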
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 08:58 PM, Dave Hansen wrote:

> On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote:
> > > Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure.
> >
> > If there is no memory pressure on the host, there is no reason for the guest to pretend it is under pressure.
>
> I can think of quite a few places where this would be beneficial. Ballooning is dangerous. I've OOMed quite a few guests by over-ballooning them. Anything that's voluntary like this is safer than things imposed by the host, although you do trade off effectiveness.

That's a bug that needs to be fixed. Eventually the host will come under pressure and will balloon the guest. If that kills the guest, the ballooning is not effective as a host memory management technique. Trying to defer ballooning by voluntarily dropping cache is simply trying to defer being bitten by the bug.

> If all the guests do this, then it leaves that much more free memory on the host, which can be used flexibly for extra host page cache, new guests, etc...

If the host detects lots of pagecache misses it can balloon guests down. If pagecache is quiet, why change anything? If the host wants to start new guests, it can balloon guests down. If no new guests are wanted, why change anything? etc...

> A system in this state where everyone is proactively keeping their footprints down is more likely to be able to handle load spikes.

That is true. But from the guest's point of view, voluntarily giving up memory means dropping the guest's cushion against load spikes.

> Reclaim is an expensive, costly activity, and this ensures that we don't have to do that when we're busy doing other things like handling load spikes.

The guest doesn't want to reclaim memory from the host when it's under a load spike either.

> This was one of the concepts behind CMM2: reduce the overhead during peak periods.

Ah, but CMM2 actually reduced work being done by sharing information between guest and host.

> It's also handy for planning. Guests exhibiting this behavior will _act_ as if they're under pressure. That's a good thing to approximate how a guest will act when it _is_ under pressure.

If a guest acts as if it is under pressure, then it will be slower and consume more cpu. Bad for both guest and host.

> > If there is memory pressure on the host, it should share the pain among its guests by applying the balloon. So I don't think voluntarily dropping cache is a good direction.
>
> I think we're trying to consider things slightly outside of ballooning at this point. If ballooning was the end-all solution, I'm fairly sure Balbir wouldn't be looking at this stuff. Just trying to keep options open. :)

I see this as an extension to ballooning - perhaps I'm missing the big picture. I would dearly love to have CMM2, where decisions are made on a per-page basis instead of using heuristics.

--
error compiling committee.c: too many arguments to function
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 08:40 PM, Balbir Singh wrote:

> * Avi Kivity a...@redhat.com [2010-06-14 18:34:58]:
> > On 06/14/2010 06:12 PM, Dave Hansen wrote:
> > > On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
> > > > 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd, so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added.
> > >
> > > You don't have to be freeing entire slab pages for the reclaim to have been useful. You could just be making space so that _future_ allocations fill in the slab holes you just created. You may not be freeing pages, but you're reducing future system pressure.
> >
> > Depends. If you've evicted something that will be referenced soon, you're increasing system pressure.
>
> I don't think slab pages care about being referenced soon, they are either allocated or freed.

A page is just a storage unit for the data structure; a new one can be allocated on demand. If we're talking just about slab pages, I agree. If we're applying pressure on the shrinkers, then you are removing live objects which can be costly to reinstantiate.

--
error compiling committee.c: too many arguments to function
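The slab point being argued — a page is only freeable once every object on it is freed, yet partially empty pages still absorb future allocations — can be shown with a toy model. This is purely illustrative; it is not the kernel's slab allocator:

```python
class SlabPage:
    """Toy slab page: fixed object slots; freeable only when all are free."""
    def __init__(self, nr_objects):
        self.free_slots = set()                  # holes available for reuse
        self.in_use = set(range(nr_objects))     # allocated objects

    def kfree(self, obj):
        """Free one object; the page itself may still be pinned."""
        self.in_use.discard(obj)
        self.free_slots.add(obj)

    def kmalloc(self):
        """Future allocations fill holes first - Dave's point that reclaim
        helps even when it frees no whole page."""
        obj = self.free_slots.pop()
        self.in_use.add(obj)
        return obj

    def reclaimable(self):
        """The whole page can be returned only when every slot is free."""
        return not self.in_use
```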
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 08:16 PM, Balbir Singh wrote:

> * Dave Hansen d...@linux.vnet.ibm.com [2010-06-14 10:09:31]:
> > On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
> > > If you've got duplicate pages and you know that they are duplicated and can be retrieved at a lower cost, why wouldn't we go after them first?
> >
> > I agree with this in theory. But, the guest lacks the information about what is truly duplicated and what the costs are for itself and/or the host to recreate it. Unmapped page cache may be the best proxy that we have at the moment for easy-to-recreate, but I think it's still too poor a match to make these patches useful.
>
> That is why the policy (in the next set) will come from the host. As to whether the data is truly duplicated, my experiments show up to 60% of the page cache is duplicated.

Isn't that incredibly workload dependent? We can't expect the host admin to know whether duplication will occur or not.

--
error compiling committee.c: too many arguments to function
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-15 09:58:33]:

> On 06/14/2010 08:45 PM, Balbir Singh wrote:
> > > There are two decisions that need to be made:
> > >
> > > - how much memory a guest should be given
> > > - given some guest memory, what's the best use for it
> > >
> > > The first question can perhaps be answered by looking at guest I/O rates and giving more memory to more active guests. The second question is hard, but not any different than running non-virtualized - except if we can detect sharing or duplication. In this case, dropping a duplicated page is worthwhile, while dropping a shared page provides no benefit.
> >
> > I think there is another way of looking at it, give some free memory
> >
> > 1. Can the guest run more applications or run faster
>
> That's my second question: how to best use this memory. More applications == drop the page from cache, faster == keep page in cache. All we need is to select the right page to drop.

Do we need to go down to the granularity of the individual page to drop? I think figuring out the class of pages, and making sure that we don't write our own reclaim logic but work with what we have to identify that class of pages, is a good start.

> > 2. Can the host potentially get this memory via ballooning or some other means to start newer guest instances
>
> Well, we already have ballooning. The question is can we improve the eviction algorithm.
>
> > I think the answer to 1 and 2 is yes.
>
> How the patch helps answer either question, I'm not sure.
>
> > > I don't think preferential dropping of unmapped page cache is the answer.
> >
> > Preferential dropping as selected by the host, that knows about the setup and if there is duplication involved. While we use the term preferential dropping, remember it is still via the LRU and we don't always succeed. It is a best-effort (if you can, and the unmapped pages are not highly referenced) scenario.
>
> How can the host tell if there is duplication? It may know it has some pagecache, but it has no idea whether or to what extent guest pagecache duplicates host pagecache.

Well, it is possible in host user space. I, for example, use the memory cgroup, and through the stats I have a good idea of how much is duplicated. I am of course making an assumption, with my setup of the cached mode, that the data in the guest page cache and the page cache in the cgroup will be duplicated to a large extent. I did some trivial experiments, like dropping the data from the guest and looking at the cost of bringing it in, and dropping the data from both guest and host and looking at the cost. I could see a difference. Unfortunately, I did not save the data, so I'll need to redo the experiment.

> > Those tell you how to balance going after the different classes of things that we can reclaim. Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure.
>
> If there is no memory pressure on the host, there is no reason for the guest to pretend it is under pressure. If there is memory pressure on the host, it should share the pain among its guests by applying the balloon. So I don't think voluntarily dropping cache is a good direction.

> > There are two situations
> >
> > 1. Voluntarily drop cache, if it was setup to do so (the host knows that it caches that information anyway)
>
> It doesn't, really. The host only has aggregate information about itself, and no information about the guest. Dropping duplicate pages would be good if we could identify them. Even then, it's better to drop the page from the host, not the guest, unless we know the same page is cached by multiple guests.

On the exact pages to drop, please see my comments above on the class of pages to drop. There are reasons for wanting to get the host to cache the data:

1. Unless the guest is using cache=none, the data will still hit the host page cache
2. The host can do a better job of optimizing the writeouts

> But why would the guest voluntarily drop the cache? If there is no memory pressure, dropping caches increases cpu overhead and latency even if the data is still cached on the host.

So, there are basically two approaches:

1. First patch, proactive - enabled by a boot option
2. When ballooned, we try to (please NOTE, try to) reclaim cached pages first. Failing that, we go after regular pages in the alloc_page() call in the balloon driver.

> > 2. Drop the cache on either a special balloon option, again the host knows it caches that very same information, so it prefers to free that up first.
>
> Dropping in response to pressure is good. I'm just not convinced the patch helps in selecting the correct page to drop.

That is why I've presented data on the experiments I've run and provided more arguments to back up the approach.

--
Three Cheers,
Balbir
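Balbir's approach 2 — on balloon inflation, try unmapped page cache first and fall back to regular reclaim — might be sketched as follows. This is illustrative pseudologic, not the balloon driver's actual code; the page dictionaries and field names are invented:

```python
def pick_balloon_victim(pages):
    """Prefer unmapped page-cache pages; fall back to any reclaimable page.
    Within a class, approximate LRU by evicting the oldest last_use first."""
    unmapped_cache = [p for p in pages
                      if p["pagecache"] and not p["mapped"]]
    candidates = unmapped_cache or [p for p in pages if p["reclaimable"]]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p["last_use"])

# Hypothetical pages: A is unmapped cache but recently used; B is mapped
# cache and old; C is an old mapped anonymous page.
pages = [
    {"id": "A", "pagecache": True,  "mapped": False, "reclaimable": True, "last_use": 90},
    {"id": "B", "pagecache": True,  "mapped": True,  "reclaimable": True, "last_use": 10},
    {"id": "C", "pagecache": False, "mapped": True,  "reclaimable": True, "last_use": 50},
]
```

Note that the sketch picks A even though B is far older — which is exactly the objection raised later in the thread about evicting a recently used unmapped page ahead of an LRU mapped one.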
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-15 10:12:44]:

> On 06/14/2010 08:16 PM, Balbir Singh wrote:
> > * Dave Hansen d...@linux.vnet.ibm.com [2010-06-14 10:09:31]:
> > > On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
> > > > If you've got duplicate pages and you know that they are duplicated and can be retrieved at a lower cost, why wouldn't we go after them first?
> > >
> > > I agree with this in theory. But, the guest lacks the information about what is truly duplicated and what the costs are for itself and/or the host to recreate it. Unmapped page cache may be the best proxy that we have at the moment for easy-to-recreate, but I think it's still too poor a match to make these patches useful.
> >
> > That is why the policy (in the next set) will come from the host. As to whether the data is truly duplicated, my experiments show up to 60% of the page cache is duplicated.
>
> Isn't that incredibly workload dependent? We can't expect the host admin to know whether duplication will occur or not.

I was referring to the cache = (policy) we use based on the setup. I don't think the duplication is too workload specific. Moreover, we could use aggressive policies and restrict page cache usage, or do it selectively on ballooning. We could also add other options to make the ballooning option truly optional, so that the system management software decides.

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/15/2010 10:49 AM, Balbir Singh wrote:

> > All we need is to select the right page to drop.
>
> Do we need to go down to the granularity of the individual page to drop? I think figuring out the class of pages, and making sure that we don't write our own reclaim logic but work with what we have to identify that class of pages, is a good start.

Well, the class of pages is 'pages that are duplicated on the host'. Unmapped page cache pages are 'pages that might be duplicated on the host'. IMO, that's not close enough.

> > How can the host tell if there is duplication? It may know it has some pagecache, but it has no idea whether or to what extent guest pagecache duplicates host pagecache.
>
> Well, it is possible in host user space. I, for example, use the memory cgroup, and through the stats I have a good idea of how much is duplicated. I am of course making an assumption, with my setup of the cached mode, that the data in the guest page cache and the page cache in the cgroup will be duplicated to a large extent. I did some trivial experiments, like dropping the data from the guest and looking at the cost of bringing it in, and dropping the data from both guest and host and looking at the cost. I could see a difference. Unfortunately, I did not save the data, so I'll need to redo the experiment.

I'm sure we can detect it experimentally, but how do we do it programmatically at run time (without dropping all the pages)? Situations change, and I don't think we can infer from a few experiments that we'll have a similar amount of sharing. The cost of an incorrect decision is too high IMO (not that I think the kernel always chooses the right pages now, but I'd like to avoid regressions from the unvirtualized state).

btw, when running with a disk controller that has a very large cache, we might also see duplication between guest and host. So, if this is a good idea, it shouldn't be enabled just for virtualization, but for any situation where we have a sizeable cache behind us.

> > It doesn't, really. The host only has aggregate information about itself, and no information about the guest. Dropping duplicate pages would be good if we could identify them. Even then, it's better to drop the page from the host, not the guest, unless we know the same page is cached by multiple guests.
>
> On the exact pages to drop, please see my comments above on the class of pages to drop.

Well, we disagree about that. There is some value in dropping duplicated pages (not always), but that's not what the patch does. It drops unmapped pagecache pages, which may or may not be duplicated.

> There are reasons for wanting to get the host to cache the data:

There are also reasons to get the guest to cache the data - it's more efficient to access it in the guest.

> 1. Unless the guest is using cache=none, the data will still hit the host page cache
> 2. The host can do a better job of optimizing the writeouts

True, especially for non-raw storage. But even there we have to fsync all the time to keep the metadata right.

> > But why would the guest voluntarily drop the cache? If there is no memory pressure, dropping caches increases cpu overhead and latency even if the data is still cached on the host.
>
> So, there are basically two approaches:
>
> 1. First patch, proactive - enabled by a boot option
> 2. When ballooned, we try to (please NOTE, try to) reclaim cached pages first. Failing that, we go after regular pages in the alloc_page() call in the balloon driver.

Doesn't that mean you may evict a recently used unmapped page ahead of an LRU mapped page, just in the hope that it is double-cached? Maybe we need the guest and host to talk to each other about which pages to keep.

> > > 2. Drop the cache on either a special balloon option, again the host knows it caches that very same information, so it prefers to free that up first.
> >
> > Dropping in response to pressure is good. I'm just not convinced the patch helps in selecting the correct page to drop.
>
> That is why I've presented data on the experiments I've run and provided more arguments to back up the approach.

I'm still unconvinced, sorry.

--
error compiling committee.c: too many arguments to function
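Balbir's memcg-based estimate illustrates the limit Avi points at: from aggregate stats, the best one can say programmatically is that duplicated cache cannot exceed the smaller of the two sides. A sketch (the memory.stat `cache` field, in bytes, is from the cgroup v1 memory controller; the min() is an upper bound, not a measurement):

```python
def memcg_cache_kb(memory_stat_text):
    """Pull the page-cache charge out of a memcg memory.stat dump.
    cgroup v1 reports 'cache' in bytes; convert to kB."""
    for line in memory_stat_text.splitlines():
        key, value = line.split()
        if key == "cache":
            return int(value) // 1024
    return 0

def duplication_upper_bound_kb(guest_cached_kb, memory_stat_text):
    """Duplicated cache can't exceed either the guest's page cache or the
    host-side cache charged to the guest's cgroup. This bounds, but does
    not measure, actual duplication."""
    return min(guest_cached_kb, memcg_cache_kb(memory_stat_text))

# Made-up sample; on a host one would read the guest cgroup's memory.stat.
stat = "cache 614400000\nrss 104857600"
```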
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/15/2010 10:52 AM, Balbir Singh wrote:

> > > That is why the policy (in the next set) will come from the host. As to whether the data is truly duplicated, my experiments show up to 60% of the page cache is duplicated.
> >
> > Isn't that incredibly workload dependent? We can't expect the host admin to know whether duplication will occur or not.
>
> I was referring to the cache = (policy) we use based on the setup. I don't think the duplication is too workload specific. Moreover, we could use aggressive policies and restrict page cache usage, or do it selectively on ballooning. We could also add other options to make the ballooning option truly optional, so that the system management software decides.

Consider a read-only workload that exactly fits in the guest cache. Without trimming, the guest will keep hitting its own cache, and the host will see no access to the cache at all. So the host (assuming it is under even low pressure) will evict those pages, and the guest will happily use its own cache. If we start to trim, the guest will have to go to disk. That's the best case.

Now for the worst case: a random access workload that misses the cache on both guest and host. Now every page is duplicated, and trimming guest pages allows the host to increase its cache and potentially reduce misses. In this case trimming duplicated pages works.

Real life will see a mix of this. Often-used pages won't be duplicated, and less often used pages may see some duplication, especially if the host cache portion dedicated to the guest is bigger than the guest cache.

I can see that trimming duplicate pages helps, but (a) I'd like to be sure they are duplicates, and (b) often trimming them from the host is better than trimming them from the guest. Trimming from the guest is worthwhile if the pages are not used very often (but enough that caching them in the host is worth it) and if the host cache can serve more than one guest. If we can identify those pages, we don't risk degrading best-case workloads (as defined above). (Note that ksm to some extent identifies those pages, though it is a bit expensive, and doesn't share with the host pagecache.)

--
error compiling committee.c: too many arguments to function
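Avi's best and worst cases can be made concrete with a toy expected-cost model. The latencies are invented round numbers; only the structure of the argument matters:

```python
def avg_read_cost_us(hit_guest, hit_host,
                     guest_us=1.0, host_us=30.0, disk_us=5000.0):
    """Expected cost of one read given guest/host cache hit rates.
    A guest miss falls through to the host cache, then to disk."""
    miss = 1.0 - hit_guest
    return (hit_guest * guest_us
            + miss * (hit_host * host_us + (1.0 - hit_host) * disk_us))

# Best case for the guest: a read-only set that exactly fits guest cache.
# Trimming can only make things worse, even if the host caches everything.
fits_untrimmed = avg_read_cost_us(hit_guest=1.0, hit_host=0.0)   # 1.0 us
fits_trimmed   = avg_read_cost_us(hit_guest=0.5, hit_host=1.0)   # 15.5 us

# Worst case: both caches miss. Freeing guest pages lets the host cache
# grow, raising hit_host and cutting the expected cost.
```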
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-15 12:44:31]:

> On 06/15/2010 10:49 AM, Balbir Singh wrote:
> > > All we need is to select the right page to drop.
> >
> > Do we need to go down to the granularity of the individual page to drop? I think figuring out the class of pages, and making sure that we don't write our own reclaim logic but work with what we have to identify that class of pages, is a good start.
>
> Well, the class of pages is 'pages that are duplicated on the host'. Unmapped page cache pages are 'pages that might be duplicated on the host'. IMO, that's not close enough.

Agreed, but what happens in reality with the code is that it drops not-so-frequently-used cache (still reusing the reclaim mechanism), while prioritizing cached memory.

> > > How can the host tell if there is duplication? It may know it has some pagecache, but it has no idea whether or to what extent guest pagecache duplicates host pagecache.
> >
> > Well, it is possible in host user space. I, for example, use the memory cgroup, and through the stats I have a good idea of how much is duplicated. I am of course making an assumption, with my setup of the cached mode, that the data in the guest page cache and the page cache in the cgroup will be duplicated to a large extent. I did some trivial experiments, like dropping the data from the guest and looking at the cost of bringing it in, and dropping the data from both guest and host and looking at the cost. I could see a difference. Unfortunately, I did not save the data, so I'll need to redo the experiment.
>
> I'm sure we can detect it experimentally, but how do we do it programmatically at run time (without dropping all the pages)? Situations change, and I don't think we can infer from a few experiments that we'll have a similar amount of sharing. The cost of an incorrect decision is too high IMO (not that I think the kernel always chooses the right pages now, but I'd like to avoid regressions from the unvirtualized state).
>
> btw, when running with a disk controller that has a very large cache, we might also see duplication between guest and host. So, if this is a good idea, it shouldn't be enabled just for virtualization, but for any situation where we have a sizeable cache behind us.

It depends; once the disk controller has the cache and the pages in the guest are not-so-frequently-used, we can drop them. Please remember we still use the LRU to identify these pages.

> > > It doesn't, really. The host only has aggregate information about itself, and no information about the guest. Dropping duplicate pages would be good if we could identify them. Even then, it's better to drop the page from the host, not the guest, unless we know the same page is cached by multiple guests.
> >
> > On the exact pages to drop, please see my comments above on the class of pages to drop.
>
> Well, we disagree about that. There is some value in dropping duplicated pages (not always), but that's not what the patch does. It drops unmapped pagecache pages, which may or may not be duplicated.
>
> > There are reasons for wanting to get the host to cache the data:
>
> There are also reasons to get the guest to cache the data - it's more efficient to access it in the guest.
>
> > 1. Unless the guest is using cache=none, the data will still hit the host page cache
> > 2. The host can do a better job of optimizing the writeouts
>
> True, especially for non-raw storage. But even there we have to fsync all the time to keep the metadata right.
>
> > > But why would the guest voluntarily drop the cache? If there is no memory pressure, dropping caches increases cpu overhead and latency even if the data is still cached on the host.
> >
> > So, there are basically two approaches:
> >
> > 1. First patch, proactive - enabled by a boot option
> > 2. When ballooned, we try to (please NOTE, try to) reclaim cached pages first. Failing that, we go after regular pages in the alloc_page() call in the balloon driver.
>
> Doesn't that mean you may evict a recently used unmapped page ahead of an LRU mapped page, just in the hope that it is double-cached? Maybe we need the guest and host to talk to each other about which pages to keep.

Yeah.. I guess that falls into the domain of CMM.

> > > > 2. Drop the cache on either a special balloon option, again the host knows it caches that very same information, so it prefers to free that up first.
> > >
> > > Dropping in response to pressure is good. I'm just not convinced the patch helps in selecting the correct page to drop.
> >
> > That is why I've presented data on the experiments I've run and provided more arguments to back up the approach.
>
> I'm still unconvinced, sorry.

The reason for making this optional is to let the administrators decide how they want to use the memory in the system. In some situations it might be a big no-no to waste memory; in some cases it might be acceptable.

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-15 12:54:31]: On 06/15/2010 10:52 AM, Balbir Singh wrote: That is why the policy (in the next set) will come from the host. As to whether the data is truly duplicated, my experiments show up to 60% of the page cache is duplicated. Isn't that incredibly workload dependent? We can't expect the host admin to know whether duplication will occur or not. I was referring to cache = (policy) we use based on the setup. I don't think the duplication is too workload specific. Moreover, we could use aggressive policies and restrict page cache usage or do it selectively on ballooning. We could also add other options to make the ballooning option truly optional, so that the system management software decides. Consider a read-only workload that exactly fits in guest cache. Without trimming, the guest will keep hitting its own cache, and the host will see no access to the cache at all. So the host (assuming it is under even low pressure) will evict those pages, and the guest will happily use its own cache. If we start to trim, the guest will have to go to disk. That's the best case. Now for the worst case. A random access workload that misses the cache on both guest and host. Now every page is duplicated, and trimming guest pages allows the host to increase its cache, and potentially reduce misses. In this case trimming duplicated pages works. Real life will see a mix of this. Often used pages won't be duplicated, and less often used pages may see some duplication, especially if the host cache portion dedicated to the guest is bigger than the guest cache. I can see that trimming duplicate pages helps, but (a) I'd like to be sure they are duplicates and (b) often trimming them from the host is better than trimming them from the guest. Lets see the behaviour with these patches The first patch is a proactive approach to keep more memory around. Enabling the parameter implies we are OK paying the cost of some overhead. 
My data shows that this leaves a significant amount of free memory with a small 5% (in my case) overhead. This brings us back to what you can do with free memory. The second patch shows no overhead and selectively tries to give free cache back on memory pressure (as indicated by the balloon driver). We've discussed the reasons for doing this: 1. In the situations where cache is duplicated this should benefit us. Your contention is that we need to be specific about the duplication. That falls under the realm of CMM. 2. In the case of slab cache, duplication does not matter; it is a free page that should ideally be reclaimed ahead of mapped pages. If the slab grows, it will get another new page. What is the cost of (1)? In the worst case, we select a non-duplicated page, but for us to select it, it should be inactive, and in that case we do I/O to bring back the page. Trimming from the guest is worthwhile if the pages are not used very often (but enough that caching them in the host is worth it) and if the host cache can serve more than one guest. If we can identify those pages, we don't risk degrading best-case workloads (as defined above). (Note: KSM to some extent identifies those pages, though it is a bit expensive, and doesn't share with the host pagecache.) I see that you are hinting towards finding exact duplicates; I don't know if the cost and complexity justify it. I hope more users can try the patches with and without the boot parameter and provide additional feedback. -- Three Cheers, Balbir -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Tue, 2010-06-15 at 10:07 +0300, Avi Kivity wrote: On 06/14/2010 08:58 PM, Dave Hansen wrote: On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote: Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure. If there is no memory pressure on the host, there is no reason for the guest to pretend it is under pressure. I can think of quite a few places where this would be beneficial. Ballooning is dangerous. I've OOMed quite a few guests by over-ballooning them. Anything that's voluntary like this is safer than things imposed by the host, although you do trade off effectiveness. That's a bug that needs to be fixed. Eventually the host will come under pressure and will balloon the guest. If that kills the guest, the ballooning is not effective as a host memory management technique. I'm not convinced that it's just a bug that can be fixed. Consider a case where a host sees a guest with 100MB of free memory at the exact moment that a database app sees that memory. The host tries to balloon that memory away at the same time that the app goes and allocates it. That can certainly lead to an OOM very quickly, even for very small amounts of memory (much less than 100MB). Where's the bug? I think the issues are really fundamental to ballooning. If all the guests do this, then it leaves that much more free memory on the host, which can be used flexibly for extra host page cache, new guests, etc... If the host detects lots of pagecache misses it can balloon guests down. If pagecache is quiet, why change anything? Page cache misses alone are not really sufficient. This is the classic problem where we try to differentiate streaming I/O (which we can't effectively cache) from I/O which can be effectively cached. If the host wants to start new guests, it can balloon guests down. If no new guests are wanted, why change anything? 
We're talking about an environment which we're always trying to optimize. Imagine that we're always trying to consolidate guests on to smaller numbers of hosts. We're effectively in a state where we _always_ want new guests. -- Dave
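The host-side trigger debated above (balloon guests down when host pagecache misses rise; if pagecache is quiet, change nothing) can be sketched roughly as follows. This is purely illustrative; the function and parameter names are my own assumptions, not anything from the patches. The cap at half of reported free memory is a guard against the over-balloon OOM race Dave describes.

```python
# Illustrative host-side policy sketch (not from the patch set): balloon
# guests down only when the observed host pagecache miss rate is high.

def plan_balloon(guests, miss_rate, miss_threshold=0.2, step_mb=64):
    """Return a per-guest deflation plan in MB.

    guests: dict of guest name -> free_mb as reported by the guest.
    miss_rate: observed host pagecache miss ratio (0.0 - 1.0).
    Guests are never asked for more than half of what they report free,
    to reduce the risk of racing an in-guest allocation into an OOM.
    """
    if miss_rate < miss_threshold:
        return {}                      # pagecache quiet: change nothing
    plan = {}
    for name, free_mb in guests.items():
        take = min(step_mb, free_mb // 2)
        if take > 0:
            plan[name] = take
    return plan
```

As the thread notes, miss rate alone cannot distinguish streaming I/O from cacheable I/O, so a real policy would need more signals than this.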
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/11/2010 07:56 AM, Balbir Singh wrote: Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd. Which page will be preferred for eviction with this patch set? In this case the order is as follows: 1. First we pick free pages if any 2. If we don't have free pages, we go after unmapped page cache and slab cache 3. If that fails as well, we go after regular memory In the scenario that you describe, we'll not be able to easily free up the frequently referenced page from /etc/*. The code will move on to step 3 and do its regular reclaim. Still it seems to me you are subverting the normal order of reclaim. I don't see why an unmapped page cache or slab cache item should be evicted before a mapped page. Certainly the cost of rebuilding a dentry, compared to the gain from evicting it, is much higher than that of reestablishing a mapped page. -- error compiling committee.c: too many arguments to function
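The three-step order described above can be modeled as a toy priority sort. This is a sketch of the stated policy only, not the actual patch code; the class names are my own:

```python
# Toy model of the reclaim preference from the thread: free pages first,
# then unmapped page cache and empty slab pages, then regular reclaim
# of mapped memory.

def reclaim(pages, goal):
    """pages: list of dicts with 'kind' in {'free', 'unmapped_cache',
    'empty_slab', 'mapped'}; goal: number of pages to release.
    Returns the pages chosen for release, cheapest class first."""
    order = {'free': 0, 'unmapped_cache': 1, 'empty_slab': 1, 'mapped': 2}
    return sorted(pages, key=lambda p: order[p['kind']])[:goal]
```

Avi's objection is precisely that this flat ordering ignores per-object refill cost: a dentry backing an empty-looking slab page may be far more expensive to rebuild than a mapped page is to re-fault.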
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-14 11:09:44]: On 06/11/2010 07:56 AM, Balbir Singh wrote: Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd. Which page will be preferred for eviction with this patch set? In this case the order is as follows: 1. First we pick free pages if any 2. If we don't have free pages, we go after unmapped page cache and slab cache 3. If that fails as well, we go after regular memory In the scenario that you describe, we'll not be able to easily free up the frequently referenced page from /etc/*. The code will move on to step 3 and do its regular reclaim. Still it seems to me you are subverting the normal order of reclaim. I don't see why an unmapped page cache or slab cache item should be evicted before a mapped page. Certainly the cost of rebuilding a dentry compared to the gain from evicting it, is much higher than that of reestablishing a mapped page. Subverting to avoid memory duplication; the word subverting is overloaded, so let me try to reason a bit. First, let me explain the problem. Memory is a precious resource in a consolidated environment. We don't want to waste memory via page cache duplication (cache=writethrough and cache=writeback mode). Now here is what we are trying to do: 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd, so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added. 2. In the case of page cache (specifically unmapped page cache), there is duplication already, so why not go after unmapped page cache when the system is under memory pressure? 
In the case of 1, we don't force a dentry to be freed, but rather a freed page in the slab cache to be reclaimed ahead of forcing reclaim of mapped pages. Does the problem statement make sense? If so, do you agree with 1 and 2? Is there major concern about subverting regular reclaim? Does subverting it make sense in the duplicated scenario? -- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 11:48 AM, Balbir Singh wrote: In this case the order is as follows: 1. First we pick free pages if any 2. If we don't have free pages, we go after unmapped page cache and slab cache 3. If that fails as well, we go after regular memory In the scenario that you describe, we'll not be able to easily free up the frequently referenced page from /etc/*. The code will move on to step 3 and do its regular reclaim. Still it seems to me you are subverting the normal order of reclaim. I don't see why an unmapped page cache or slab cache item should be evicted before a mapped page. Certainly the cost of rebuilding a dentry compared to the gain from evicting it, is much higher than that of reestablishing a mapped page. Subverting to avoid memory duplication; the word subverting is overloaded, Right, should have used a different one. Let me try to reason a bit. First, let me explain the problem. Memory is a precious resource in a consolidated environment. We don't want to waste memory via page cache duplication (cache=writethrough and cache=writeback mode). Now here is what we are trying to do: 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd, so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added. 2. In the case of page cache (specifically unmapped page cache), there is duplication already, so why not go after unmapped page cache when the system is under memory pressure? In the case of 1, we don't force a dentry to be freed, but rather a freed page in the slab cache to be reclaimed ahead of forcing reclaim of mapped pages. Sounds like this should be done unconditionally, then. An empty slab page is worth less than an unmapped pagecache page at all times, no? Does the problem statement make sense? If so, do you agree with 1 and 2? Is there major concern about subverting regular reclaim? 
Does subverting it make sense in the duplicated scenario? In the case of 2, how do you know there is duplication? You know the guest caches the page, but you have no information about the host. Since the page is cached in the guest, the host doesn't see it referenced, and is likely to drop it. If there is no duplication, then you may have dropped a recently-used page and will likely cause a major fault soon. -- error compiling committee.c: too many arguments to function
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-14 15:40:28]: On 06/14/2010 11:48 AM, Balbir Singh wrote: In this case the order is as follows: 1. First we pick free pages if any 2. If we don't have free pages, we go after unmapped page cache and slab cache 3. If that fails as well, we go after regular memory In the scenario that you describe, we'll not be able to easily free up the frequently referenced page from /etc/*. The code will move on to step 3 and do its regular reclaim. Still it seems to me you are subverting the normal order of reclaim. I don't see why an unmapped page cache or slab cache item should be evicted before a mapped page. Certainly the cost of rebuilding a dentry compared to the gain from evicting it, is much higher than that of reestablishing a mapped page. Subverting to avoid memory duplication; the word subverting is overloaded, Right, should have used a different one. Let me try to reason a bit. First, let me explain the problem. Memory is a precious resource in a consolidated environment. We don't want to waste memory via page cache duplication (cache=writethrough and cache=writeback mode). Now here is what we are trying to do: 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd, so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added. 2. In the case of page cache (specifically unmapped page cache), there is duplication already, so why not go after unmapped page cache when the system is under memory pressure? In the case of 1, we don't force a dentry to be freed, but rather a freed page in the slab cache to be reclaimed ahead of forcing reclaim of mapped pages. Sounds like this should be done unconditionally, then. An empty slab page is worth less than an unmapped pagecache page at all times, no? In a consolidated environment, even at the cost of some CPU to run shrinkers, I think potentially yes. 
Does the problem statement make sense? If so, do you agree with 1 and 2? Is there major concern about subverting regular reclaim? Does subverting it make sense in the duplicated scenario? In the case of 2, how do you know there is duplication? You know the guest caches the page, but you have no information about the host. Since the page is cached in the guest, the host doesn't see it referenced, and is likely to drop it. True, that is why the first patch is controlled via a boot parameter that the host can pass. For the second patch, I think we'll need something like a 'balloon size [cache]' command, with the cache argument being optional. If there is no duplication, then you may have dropped a recently-used page and will likely cause a major fault soon. Yes, agreed. -- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 03:50 PM, Balbir Singh wrote: Let me try to reason a bit. First, let me explain the problem. Memory is a precious resource in a consolidated environment. We don't want to waste memory via page cache duplication (cache=writethrough and cache=writeback mode). Now here is what we are trying to do: 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd, so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added. 2. In the case of page cache (specifically unmapped page cache), there is duplication already, so why not go after unmapped page cache when the system is under memory pressure? In the case of 1, we don't force a dentry to be freed, but rather a freed page in the slab cache to be reclaimed ahead of forcing reclaim of mapped pages. Sounds like this should be done unconditionally, then. An empty slab page is worth less than an unmapped pagecache page at all times, no? In a consolidated environment, even at the cost of some CPU to run shrinkers, I think potentially yes. I don't understand. If you're running the shrinkers then you're evicting live entries, which could cost you an I/O each. That's expensive, consolidated or not. If you're not running the shrinkers, why does it matter if you're consolidated or not? Drop that page unconditionally. Does the problem statement make sense? If so, do you agree with 1 and 2? Is there major concern about subverting regular reclaim? Does subverting it make sense in the duplicated scenario? In the case of 2, how do you know there is duplication? You know the guest caches the page, but you have no information about the host. Since the page is cached in the guest, the host doesn't see it referenced, and is likely to drop it. True, that is why the first patch is controlled via a boot parameter that the host can pass. 
For the second patch, I think we'll need something like a 'balloon size [cache]' command, with the cache argument being optional. Whether a page is duplicated on the host or not is per-page; it cannot be a boot parameter. If we drop unmapped pagecache pages, we need to be sure they can be backed by the host, and that depends on the amount of sharing. Overall, I don't see how a user can tune this. If I were a guest admin, I'd play it safe by not assuming the host will back me, and disabling the feature. To get something like this to work, we need to reward cooperating guests somehow. If there is no duplication, then you may have dropped a recently-used page and will likely cause a major fault soon. Yes, agreed. So how do we deal with this? -- error compiling committee.c: too many arguments to function
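The "reward cooperating guests" idea is left open in the thread. One possible shape for it, sketched purely as my own assumption (nothing like this exists in the patches): the host keeps a ledger of voluntary cache give-backs and, when pressure eases, deflates the most cooperative guests first.

```python
# Hypothetical host-side bookkeeping: guests that voluntarily return
# pagecache earn credit, and when host pressure eases the balloon is
# deflated for the most cooperative guests first. Names are illustrative.

class BalloonLedger:
    def __init__(self):
        self.credit = {}               # guest name -> MB returned voluntarily

    def record_giveback(self, guest, mb):
        self.credit[guest] = self.credit.get(guest, 0) + mb

    def deflate_order(self):
        # Most cooperative guests get their memory back first.
        return sorted(self.credit, key=self.credit.get, reverse=True)

ledger = BalloonLedger()
ledger.record_giveback("web", 32)
ledger.record_giveback("db", 128)
ledger.record_giveback("web", 16)
```

A guest admin who can expect memory back sooner has an incentive not to disable the feature, which is the trust problem Avi raises.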
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Mon, 2010-06-14 at 16:01 +0300, Avi Kivity wrote: If we drop unmapped pagecache pages, we need to be sure they can be backed by the host, and that depends on the amount of sharing. You also have to set the host up properly, and continue to maintain it in a way that finds and eliminates duplicates. I saw some benchmarks where KSM was doing great, finding lots of duplicate pages. Then, the host filled up, and guests started reclaiming. As memory pressure got worse, so did KSM's ability to find duplicates. At the same time, I see what you're trying to do with this. It really can be an alternative to ballooning if we do it right, since ballooning would probably evict similar pages. Although it would only work in idle guests, what about a knob that the host can turn to just get the guest to start running reclaim? -- Dave
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 06:12 PM, Dave Hansen wrote: On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote: 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd, so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added. You don't have to be freeing entire slab pages for the reclaim to have been useful. You could just be making space so that _future_ allocations fill in the slab holes you just created. You may not be freeing pages, but you're reducing future system pressure. Depends. If you've evicted something that will be referenced soon, you're increasing system pressure. If unmapped page cache is the easiest thing to evict, then it should be the first thing that goes when a balloon request comes in, which is the case this patch is trying to handle. If it isn't the easiest thing to evict, then we _shouldn't_ evict it. Easy to evict is just one measure. There's benefit (size of data evicted), cost to refill (seeks, cpu), and likelihood that the cost to refill will be incurred (recency). It's all very complicated. We need better information to make these decisions. For one thing, I'd like to see age information tied to objects. We may have two pages that were referenced at wildly different times be next to each other in LRU order. We have many LRUs, but no idea of the relative recency of the tails of those LRUs. If each page or object had an age, we could scale those ages by the benefit from reclaim and the cost to refill, and make a better decision as to what to evict first. But of course page-age means increasing the size of struct page, and we can only approximate its value by scanning the accessed bit, not determine it accurately (unlike the other objects managed by the cache). 
-- error compiling committee.c: too many arguments to function
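The age-scaled eviction idea above can be sketched as a simple scoring function. The formula and names are illustrative assumptions layered on Avi's description (age scaled by reclaim benefit and refill cost), not anything implemented in the kernel:

```python
# Sketch: rank eviction candidates by age scaled by reclaim benefit
# (bytes freed) and refill cost. Older, bigger, cheaper-to-refill
# objects score higher and are evicted first.

def eviction_score(age_s, benefit_bytes, refill_cost_ms):
    return age_s * benefit_bytes / (1.0 + refill_cost_ms)

def pick_victim(objs):
    """objs: list of (name, age_s, benefit_bytes, refill_cost_ms)."""
    return max(objs, key=lambda o: eviction_score(o[1], o[2], o[3]))[0]
```

With equal ages and refill costs, a 4KB unmapped page outscores a small dentry; with equal sizes, a long-idle page outscores a recently touched one, which is the cross-LRU comparison the thread says is missing today.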
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 06:33 PM, Dave Hansen wrote: On Mon, 2010-06-14 at 16:01 +0300, Avi Kivity wrote: If we drop unmapped pagecache pages, we need to be sure they can be backed by the host, and that depends on the amount of sharing. You also have to set the host up properly, and continue to maintain it in a way that finds and eliminates duplicates. I saw some benchmarks where KSM was doing great, finding lots of duplicate pages. Then, the host filled up, and guests started reclaiming. As memory pressure got worse, so did KSM's ability to find duplicates. Yup. KSM needs to be backed up by ballooning, swap, and live migration. At the same time, I see what you're trying to do with this. It really can be an alternative to ballooning if we do it right, since ballooning would probably evict similar pages. Although it would only work in idle guests, what about a knob that the host can turn to just get the guest to start running reclaim? Isn't the knob in this proposal the balloon? AFAICT, the idea here is to change how the guest reacts to being ballooned, but the trigger itself would not change. My issue is that changing the type of object being preferentially reclaimed just changes the type of workload that would prematurely suffer from reclaim. In this case, workloads that use a lot of unmapped pagecache would suffer. btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs? -- error compiling committee.c: too many arguments to function
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote: On 06/14/2010 06:33 PM, Dave Hansen wrote: At the same time, I see what you're trying to do with this. It really can be an alternative to ballooning if we do it right, since ballooning would probably evict similar pages. Although it would only work in idle guests, what about a knob that the host can turn to just get the guest to start running reclaim? Isn't the knob in this proposal the balloon? AFAICT, the idea here is to change how the guest reacts to being ballooned, but the trigger itself would not change. I think the patch was made on the following assumptions: 1. Guests will keep filling their memory with relatively worthless page cache that they don't really need. 2. When they do this, it hurts the overall system with no real gain for anyone. In the case of a ballooned guest, they _won't_ keep filling memory. The balloon will prevent them. So, I guess I was just going down the path of considering if this would be useful without ballooning in place. To me, it's really hard to justify _with_ ballooning in place. My issue is that changing the type of object being preferentially reclaimed just changes the type of workload that would prematurely suffer from reclaim. In this case, workloads that use a lot of unmapped pagecache would suffer. btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs? Those tell you how to balance going after the different classes of things that we can reclaim. Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure. Balbir, can you elaborate a bit on why you would need these patches on a guest that is being ballooned? -- Dave
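For reference, vm.swappiness does roughly what Dave says: it balances how eagerly reclaim goes after anon versus file pages. A simplified model of the kernel's scan-balance split (the real code also weights by recent LRU rotation counts, so treat this as an approximation):

```python
# Simplified model of the kernel's swappiness split: anon pages are
# weighted by swappiness, file pages by 200 - swappiness, so the two
# priorities always sum to 200.

def scan_split(swappiness):
    anon_prio = swappiness
    file_prio = 200 - swappiness
    total = anon_prio + file_prio        # always 200
    return anon_prio / total, file_prio / total
```

At the default swappiness of 60, file pages are scanned more than twice as eagerly as anon pages; at 0, reclaim avoids swapping anon entirely. Neither knob distinguishes host-duplicated pagecache from unique pagecache, which is Avi's point.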
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Dave Hansen d...@linux.vnet.ibm.com [2010-06-14 08:12:56]: On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote: 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added. You don't have to be freeing entire slab pages for the reclaim to have been useful. You could just be making space so that _future_ allocations fill in the slab holes you just created. You may not be freeing pages, but you're reducing future system pressure. If unmapped page cache is the easiest thing to evict, then it should be the first thing that goes when a balloon request comes in, which is the case this patch is trying to handle. If it isn't the easiest thing to evict, then we _shouldn't_ evict it. Like I said earlier, a lot of that works correctly as you said, but it is also an idealization. If you've got duplicate pages and you know that they are duplicated and can be retrieved at a lower cost, why wouldn't we go after them first? -- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote: If you've got duplicate pages and you know that they are duplicated and can be retrieved at a lower cost, why wouldn't we go after them first? I agree with this in theory. But, the guest lacks the information about what is truly duplicated and what the costs are for itself and/or the host to recreate it. Unmapped page cache may be the best proxy that we have at the moment for easy to recreate, but I think it's still too poor a match to make these patches useful. -- Dave
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Dave Hansen d...@linux.vnet.ibm.com [2010-06-14 10:09:31]: On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote: If you've got duplicate pages and you know that they are duplicated and can be retrieved at a lower cost, why wouldn't we go after them first? I agree with this in theory. But, the guest lacks the information about what is truly duplicated and what the costs are for itself and/or the host to recreate it. Unmapped page cache may be the best proxy that we have at the moment for easy to recreate, but I think it's still too poor a match to make these patches useful. That is why the policy (in the next set) will come from the host. As to whether the data is truly duplicated, my experiments show up to 60% of the page cache is duplicated. The first patch today is again enabled by the host. Both of them are expected to be useful in the cache != none case. The data I have shows more details including the performance and overhead. -- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/14/2010 06:55 PM, Dave Hansen wrote: On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote: On 06/14/2010 06:33 PM, Dave Hansen wrote: At the same time, I see what you're trying to do with this. It really can be an alternative to ballooning if we do it right, since ballooning would probably evict similar pages. Although it would only work in idle guests, what about a knob that the host can turn to just get the guest to start running reclaim? Isn't the knob in this proposal the balloon? AFAICT, the idea here is to change how the guest reacts to being ballooned, but the trigger itself would not change. I think the patch was made on the following assumptions: 1. Guests will keep filling their memory with relatively worthless page cache that they don't really need. 2. When they do this, it hurts the overall system with no real gain for anyone. In the case of a ballooned guest, they _won't_ keep filling memory. The balloon will prevent them. So, I guess I was just going down the path of considering if this would be useful without ballooning in place. To me, it's really hard to justify _with_ ballooning in place. There are two decisions that need to be made: - how much memory a guest should be given - given some guest memory, what's the best use for it The first question can perhaps be answered by looking at guest I/O rates and giving more memory to more active guests. The second question is hard, but not any different than running non-virtualized - except if we can detect sharing or duplication. In this case, dropping a duplicated page is worthwhile, while dropping a shared page provides no benefit. How the patch helps answer either question, I'm not sure. I don't think preferential dropping of unmapped page cache is the answer. My issue is that changing the type of object being preferentially reclaimed just changes the type of workload that would prematurely suffer from reclaim. In this case, workloads that use a lot of unmapped pagecache would suffer. 
btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs? Those tell you how to balance going after the different classes of things that we can reclaim. Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure. If there is no memory pressure on the host, there is no reason for the guest to pretend it is under pressure. If there is memory pressure on the host, it should share the pain among its guests by applying the balloon. So I don't think voluntarily dropping cache is a good direction. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-14 18:34:58]: On 06/14/2010 06:12 PM, Dave Hansen wrote: On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote: 1. A slab page will not be freed until the entire page is free (all slabs have been kfree'd so to speak). Normal reclaim will definitely free this page, but a lot of it depends on how frequently we are scanning the LRU list and when this page got added. You don't have to be freeing entire slab pages for the reclaim to have been useful. You could just be making space so that _future_ allocations fill in the slab holes you just created. You may not be freeing pages, but you're reducing future system pressure. Depends. If you've evicted something that will be referenced soon, you're increasing system pressure. I don't think slab pages care about being referenced soon, they are either allocated or freed. A page is just a storage unit for the data structure. A new one can be allocated on demand. -- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-14 19:34:00]: On 06/14/2010 06:55 PM, Dave Hansen wrote: On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote: On 06/14/2010 06:33 PM, Dave Hansen wrote: At the same time, I see what you're trying to do with this. It really can be an alternative to ballooning if we do it right, since ballooning would probably evict similar pages. Although it would only work in idle guests, what about a knob that the host can turn to just get the guest to start running reclaim? Isn't the knob in this proposal the balloon? AFAICT, the idea here is to change how the guest reacts to being ballooned, but the trigger itself would not change. I think the patch was made on the following assumptions: 1. Guests will keep filling their memory with relatively worthless page cache that they don't really need. 2. When they do this, it hurts the overall system with no real gain for anyone. In the case of a ballooned guest, they _won't_ keep filling memory. The balloon will prevent them. So, I guess I was just going down the path of considering if this would be useful without ballooning in place. To me, it's really hard to justify _with_ ballooning in place. There are two decisions that need to be made: - how much memory a guest should be given - given some guest memory, what's the best use for it The first question can perhaps be answered by looking at guest I/O rates and giving more memory to more active guests. The second question is hard, but not any different than running non-virtualized - except if we can detect sharing or duplication. In this case, dropping a duplicated page is worthwhile, while dropping a shared page provides no benefit. I think there is another way of looking at it: given some free memory, 1. Can the guest run more applications or run faster? 2. Can the host potentially get this memory via ballooning or some other means to start newer guest instances? I think the answer to both 1 and 2 is yes. 
How the patch helps answer either question, I'm not sure. I don't think preferential dropping of unmapped page cache is the answer.

Preferential dropping as selected by the host, which knows about the setup and whether there is duplication involved. While we use the term preferential dropping, remember it is still via the LRU and we don't always succeed. It is a best-effort scenario (we drop only if we can and the unmapped pages are not highly referenced).

My issue is that changing the type of object being preferentially reclaimed just changes the type of workload that would prematurely suffer from reclaim. In this case, workloads that use a lot of unmapped pagecache would suffer. btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs? Those tell you how to balance going after the different classes of things that we can reclaim.

Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure.

If there is no memory pressure on the host, there is no reason for the guest to pretend it is under pressure. If there is memory pressure on the host, it should share the pain among its guests by applying the balloon. So I don't think voluntarily dropping cache is a good direction.

There are two situations:

1. Voluntarily drop cache, if it was set up to do so (the host knows that it caches that information anyway).
2. Drop the cache on a special balloon option; again, the host knows it caches that very same information, so it prefers to free that up first.

-- Three Cheers, Balbir
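Those two sysctls bias the reclaim scan between classes of reclaimable pages. As a rough userspace illustration of the idea, here is a toy model loosely patterned on how the kernel's get_scan_count() weighs anonymous versus file-backed LRUs; the function names are made up, swappiness is simplified to 0..100, and the real kernel additionally factors in LRU sizes and rotation history:

```c
#include <assert.h>

/* Toy model of a swappiness-style balance knob: split a reclaim scan of
 * `nr_to_scan` pages between the anonymous and file-backed LRU lists.
 * swappiness = 0 scans only file pages; 100 scans both classes equally
 * in this simplified model. Hypothetical sketch, not kernel code. */
struct scan_split { unsigned long anon; unsigned long file; };

struct scan_split split_scan(unsigned long nr_to_scan, unsigned swappiness)
{
    unsigned anon_prio = swappiness;        /* 0..100 */
    unsigned file_prio = 200 - swappiness;  /* echoes the kernel's 200 base */
    struct scan_split s;

    s.anon = nr_to_scan * anon_prio / (anon_prio + file_prio);
    s.file = nr_to_scan - s.anon;
    return s;
}
```

The point of the analogy: like this patch, such knobs don't target individual pages; they only tilt which class of pages reclaim visits first.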
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote:

Again, this is useless when ballooning is being used. But, I'm thinking of a more general mechanism to force the system to both have MemFree _and_ be acting as if it is under memory pressure. If there is no memory pressure on the host, there is no reason for the guest to pretend it is under pressure.

I can think of quite a few places where this would be beneficial. Ballooning is dangerous. I've OOMed quite a few guests by over-ballooning them. Anything that's voluntary like this is safer than things imposed by the host, although you do trade off some effectiveness.

If all the guests do this, then it leaves that much more free memory on the host, which can be used flexibly for extra host page cache, new guests, etc... A system in this state where everyone is proactively keeping their footprints down is more likely to be able to handle load spikes. Reclaim is an expensive activity, and this ensures that we don't have to do it while we're busy doing other things like handling load spikes. This was one of the concepts behind CMM2: reduce the overhead during peak periods.

It's also handy for planning. Guests exhibiting this behavior will _act_ as if they're under pressure. That's a good way to approximate how a guest will act when it _is_ under pressure.

If there is memory pressure on the host, it should share the pain among its guests by applying the balloon. So I don't think voluntarily dropping cache is a good direction.

I think we're trying to consider things slightly outside of ballooning at this point. If ballooning was the end-all solution, I'm fairly sure Balbir wouldn't be looking at this stuff. Just trying to keep options open. :)

-- Dave
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [2010-06-11 14:05:53]:

On Fri, 11 Jun 2010 10:16:32 +0530 Balbir Singh bal...@linux.vnet.ibm.com wrote:
* KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [2010-06-11 10:54:41]:
On Thu, 10 Jun 2010 17:07:32 -0700 Dave Hansen d...@linux.vnet.ibm.com wrote:
On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten.

We victimize the unmapped cache only if it is unused (in LRU order). We don't force the issue too much. We also have free slab cache to go after.

Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd.

Hmm. I'm not a fan of estimating working set size by calculation based on some numbers without considering history or feedback. Can't we use some kind of feedback algorithm such as hi-low watermarks, random walk or GA (or something smarter) to detect the size?

Could you please clarify at what level you are suggesting size detection? I assume it is outside the OS, right?

OS includes kernel and system programs ;) I can think of both ways, in kernel and in user approach, and they should complement each other. An example of a kernel-based approach:

1. add a shrinker callback (A) for balloon-driver-for-guest as guest kswapd.
2. add a shrinker callback (B) for balloon-driver-for-host as host kswapd. (I guess current balloon driver is only for host. Please imagine.)

(A) increases free memory in Guest. (B) increases free memory in Host.
This is an example of feedback based memory resizing between host and guest. I think (B) is necessary at least before considering complicated things.

B is left to the hypervisor and the memory policy running on it. My patches address Linux running as a guest, with a Linux hypervisor at the moment, but that can be extended to other balloon drivers as well.

To implement something clever, (A) and (B) should take into account how frequently memory reclaim in the guest (which requires some I/O) happens.

Yes, I think the policy in the hypervisor needs to look at those details as well.

If doing it outside the kernel, I think using memcg is better than depending on the balloon driver. But co-operative balloon and memcg may show us something good.

Yes, agreed. Co-operative is better; if there is no co-operation then memcg might be used for enforcement.

-- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On 06/08/2010 06:51 PM, Balbir Singh wrote:

Balloon unmapped page cache pages first

From: Balbir Singh bal...@linux.vnet.ibm.com

This patch builds on the ballooning infrastructure by ballooning unmapped page cache pages first. It looks for low hanging fruit first and tries to reclaim clean unmapped pages first.

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten. Many workloads have many unmapped cache pages, for example static web serving and the all-important kernel build.

The key advantage was that it resulted in lesser RSS usage in the host and more cached usage, indicating that the caching had been pushed towards the host. The guest cached memory usage was lower and free memory in the guest was also higher.

Caching in the host is only helpful if the cache can be shared, otherwise it's better to cache in the guest.

-- error compiling committee.c: too many arguments to function
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Avi Kivity a...@redhat.com [2010-06-10 12:43:11]:

On 06/08/2010 06:51 PM, Balbir Singh wrote:

Balloon unmapped page cache pages first

From: Balbir Singh bal...@linux.vnet.ibm.com

This patch builds on the ballooning infrastructure by ballooning unmapped page cache pages first. It looks for low hanging fruit first and tries to reclaim clean unmapped pages first.

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten.

We victimize the unmapped cache only if it is unused (in LRU order). We don't force the issue too much. We also have free slab cache to go after.

Many workloads have many unmapped cache pages, for example static web serving and the all-important kernel build.

I've tested kernbench; you can see the results in the original posting, and there is no observable overhead as a result of the patch in my run.

The key advantage was that it resulted in lesser RSS usage in the host and more cached usage, indicating that the caching had been pushed towards the host. The guest cached memory usage was lower and free memory in the guest was also higher.

Caching in the host is only helpful if the cache can be shared, otherwise it's better to cache in the guest.

Hmm.. so we would need a balloon cache hint from the monitor, so that it is not unconditional? Overall my results show the following:

1. No drastic reduction of guest unmapped cache, just sufficient to show lower RSS in the host. More freeable memory (as in cached memory + free memory) visible on the host.
2. No significant impact on the benchmark (numbers) running in the guest.
-- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten.

We victimize the unmapped cache only if it is unused (in LRU order). We don't force the issue too much. We also have free slab cache to go after.

Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd. Which page will be preferred for eviction with this patch set?

-- Dave
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Thu, 10 Jun 2010 17:07:32 -0700 Dave Hansen d...@linux.vnet.ibm.com wrote:
On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten.

We victimize the unmapped cache only if it is unused (in LRU order). We don't force the issue too much. We also have free slab cache to go after.

Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd.

Hmm. I'm not a fan of estimating working set size by calculation based on some numbers without considering history or feedback. Can't we use some kind of feedback algorithm such as hi-low watermarks, random walk or GA (or something smarter) to detect the size?

Thanks, -Kame
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [2010-06-11 10:54:41]:

On Thu, 10 Jun 2010 17:07:32 -0700 Dave Hansen d...@linux.vnet.ibm.com wrote:
On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten.

We victimize the unmapped cache only if it is unused (in LRU order). We don't force the issue too much. We also have free slab cache to go after.

Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd.

Hmm. I'm not a fan of estimating working set size by calculation based on some numbers without considering history or feedback. Can't we use some kind of feedback algorithm such as hi-low watermarks, random walk or GA (or something smarter) to detect the size?

Could you please clarify at what level you are suggesting size detection? I assume it is outside the OS, right?

-- Three Cheers, Balbir
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
* Dave Hansen d...@linux.vnet.ibm.com [2010-06-10 17:07:32]:

On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten.

We victimize the unmapped cache only if it is unused (in LRU order). We don't force the issue too much. We also have free slab cache to go after.

Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd. Which page will be preferred for eviction with this patch set?

In this case the order is as follows:

1. First we pick free pages, if any.
2. If we don't have free pages, we go after unmapped page cache and slab cache.
3. If that fails as well, we go after regular memory.

In the scenario that you describe, we'll not be able to easily free up the frequently referenced page from /etc/*. The code will move on to step 3 and do its regular reclaim.

-- Three Cheers, Balbir
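The three-step fallback described here can be sketched as a small userspace model; the struct and function names are invented for illustration, and "cold_unmapped" lumps together the unused unmapped page cache and freeable slab of step 2 (hot unmapped pages are deliberately excluded, matching the best-effort behavior described):

```c
#include <assert.h>

/* Hypothetical model of the eviction order when the balloon asks the
 * guest for `demand` pages: free pages first, then cold unmapped page
 * cache + slab, and only then the regular reclaim path. */
struct guest_mem {
    long free_pages;    /* step 1: already-free pages */
    long cold_unmapped; /* step 2: unused unmapped cache + free slab */
    long other;         /* step 3: everything regular reclaim can take */
};

long balloon_take(struct guest_mem *m, long demand)
{
    long got = 0, n;

    n = demand - got;                         /* step 1: free pages */
    if (n > m->free_pages) n = m->free_pages;
    m->free_pages -= n; got += n;

    n = demand - got;                         /* step 2: cold unmapped/slab */
    if (n > m->cold_unmapped) n = m->cold_unmapped;
    m->cold_unmapped -= n; got += n;

    n = demand - got;                         /* step 3: regular reclaim */
    if (n > m->other) n = m->other;
    m->other -= n; got += n;

    return got;
}
```

With 10 free pages, 20 cold unmapped pages and a 25-page demand, the model satisfies the request from steps 1 and 2 and never touches step 3, which is the behavior the thread is debating.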
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Fri, 11 Jun 2010 10:16:32 +0530 Balbir Singh bal...@linux.vnet.ibm.com wrote:
* KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [2010-06-11 10:54:41]:
On Thu, 10 Jun 2010 17:07:32 -0700 Dave Hansen d...@linux.vnet.ibm.com wrote:
On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:

I'm not sure victimizing unmapped cache pages is a good idea. Shouldn't page selection use the LRU for recency information instead of the cost of guest reclaim? Dropping a frequently used unmapped cache page can be more expensive than dropping an unused text page that was loaded as part of some executable's initialization and forgotten.

We victimize the unmapped cache only if it is unused (in LRU order). We don't force the issue too much. We also have free slab cache to go after.

Just to be clear, let's say we have a mapped page (say of /sbin/init) that's been unreferenced since _just_ after the system booted. We also have an unmapped page cache page of a file often used at runtime, say one from /etc/resolv.conf or /etc/passwd.

Hmm. I'm not a fan of estimating working set size by calculation based on some numbers without considering history or feedback. Can't we use some kind of feedback algorithm such as hi-low watermarks, random walk or GA (or something smarter) to detect the size?

Could you please clarify at what level you are suggesting size detection? I assume it is outside the OS, right?

OS includes kernel and system programs ;) I can think of both ways, in kernel and in user approach, and they should complement each other. An example of a kernel-based approach:

1. add a shrinker callback (A) for balloon-driver-for-guest as guest kswapd.
2. add a shrinker callback (B) for balloon-driver-for-host as host kswapd. (I guess current balloon driver is only for host. Please imagine.)

(A) increases free memory in Guest. (B) increases free memory in Host. This is an example of feedback based memory resizing between host and guest.
I think (B) is necessary at least before considering complicated things. To implement something clever, (A) and (B) should take into account how frequently memory reclaim in the guest (which requires some I/O) happens. If doing it outside the kernel, I think using memcg is better than depending on the balloon driver. But co-operative balloon and memcg may show us something good.

Thanks, -Kame
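The hi-low watermark feedback suggested earlier in the thread could look roughly like this in a userspace model: inflate the balloon (return pages to the host) when guest free memory has slack above a high watermark, deflate when it drops below a low watermark. All names and the page accounting are hypothetical, and a real driver would ratelimit and hysteresis-bound these adjustments:

```c
#include <assert.h>

struct balloon { long size; };  /* pages currently held by the balloon */

/* One feedback step. Returns the change in guest free memory:
 * negative when pages are handed to the host, positive when the
 * balloon deflates and returns pages to the guest. */
long balloon_adjust(struct balloon *b, long free, long low, long high)
{
    if (free > high) {                 /* guest has slack: inflate */
        long delta = free - high;
        b->size += delta;
        return -delta;
    }
    if (free < low && b->size > 0) {   /* guest under pressure: deflate */
        long want = low - free;
        long delta = b->size < want ? b->size : want;
        b->size -= delta;
        return delta;
    }
    return 0;                          /* between watermarks: steady state */
}
```

The appeal of this scheme over a one-shot working-set estimate is exactly Kame's point: it self-corrects from observed behavior instead of trusting a single calculated number.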
Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
On Fri, 11 Jun 2010 14:05:53 +0900 KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:

I can think of both ways, in kernel and in user approach, and they should complement each other. An example of a kernel-based approach:

1. add a shrinker callback (A) for balloon-driver-for-guest as guest kswapd.
2. add a shrinker callback (B) for balloon-driver-for-host as host kswapd. (I guess current balloon driver is only for host. Please imagine.)

guest. Sorry.

-Kame