On Sat, Feb 12, 2005 at 10:29:14PM +0100, Andi Kleen wrote:
> On Sat, Feb 12, 2005 at 01:54:26PM -0200, Marcelo Tosatti wrote:
> > On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote:
> > > Ray Bryant <[EMAIL PROTECTED]> writes:
> > > > set of pages associated with a particular process need to be moved.
> > > > The kernel interface that we are proposing is the following:
> > > >
> > > > page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes);
> > > 
> > > [Only commenting on the interface, haven't read your patches at all]
> > > 
> > > This is basically mbind() with MPOL_F_STRICT, except that it has a pid 
> > > argument. I assume that's for the benefit of your batch scheduler.
> > 
> > As far as I understand mbind() is used to set policies to given memory 
> > regions, not move memory regions?
> 
> There is a MPOL_F_STRICT flag. Currently it fails when the memory
> is not on the right node(s) and the flag is set, but it could as well move. 
> 
> In fact Steve Longerbeam already did a patch to move in this case,
> but it hasn't been merged yet for some reasons.
> 
> 
> > > mmap in parallel. The only way I can think of to do this would be to
> > > check for changes in maps after a full move and loop, but then you risk
> > > livelock.
> > 
> > True. 
> > 
> > There is no problem, however, if all threads beloging to the process are 
> > stopped, 
> > as Ray mentions. 
> > 
> > So, there wont be memory mapping changes happening at the same time. 
> 
> Ok. But it's still quite ugly to read /proc/*/maps for this.
> 
> > 
> > > And you cannot also just specify va_start=0, va_end=~0UL because that
> > > would make the node arrays grow infinitely. 
> > > 
> > > Also is there a good use case why the batch scheduler should only
> > > move individual areas in a process around, not the full process?
> > 
> > Quoting him:
> > 
> > "In addition to its use by batch schedulers, we also envision that
> > this facility could be used by a program to re-arrange the allocation
> > of its own pages on various nodes of the NUMA system, most likely
> > to optimize performance of the application during different phases
> > of its computation."
> > 
> > Seems doable. 
> 
> That is what mbind() already supports, just someone needs to hook up
> the page moving code with MPOL_F_STRICT.

But how do you use mbind() to change the memory placement for an anonymous
private mapping used by a vendor provided executable with mbind()?

Robin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to