On Fri, 2008-08-29 at 15:42 +1000, David Gibson wrote:
> On Wed, Aug 27, 2008 at 06:34:41PM +0000, Adam Litke wrote:
> > An upcoming release of the Linux kernel will support simultaneous use of
> > multiple huge page sizes.  Each size will be accessed through its own
> > specially-mounted hugetlbfs filesystem.  The first step in enabling
> > libhugetlbfs to support multiple simultaneous page sizes is enabling the
> > support of multiple simultaneous hugetlbfs mount points.
> > 
> > This patch adds basic support for multiple mount points while preserving
> > backwards-compatibility.  Mount points can be added via the HUGETLB_PATH
> > environment variable which has been extended in the normal way to allow
> > multiple paths to be specified (using a colon separator).  Mounts will
> > also be discovered by reading /proc/mounts or /etc/mtab.  Up to 10 mount
> > points are allowed to co-exist but only one mount per page size is
> > allowed.  If HUGETLB_PATH is specified, only mount points listed in that
> > variable will be added.  Otherwise, paths in /proc/mounts or /etc/mtab
> > will be added in order of appearance.  The first mount point of a given
> > size is used and subsequent mounts of that page size are skipped.
> > 
> > For compatibility and ease of use, a default mount point is selected.
> > When multiple mount points have been added, /proc/meminfo is read to
> > determine the system's default huge page size and the mount point having
> > that size is selected as the default.  If a mount point for the default
> > page size cannot be found, the first mount point found becomes the
> > default.  The gethugepagesize() call has been modified to return the
> > default huge page size as determined by the method just described.
> 
> Hrm.  Something about the structure of all this bothers me, but I'm
> going to have to think some more on how I think it should be done.  It
> seems to me like this draft has too much of a dichotomy between the
> default / non-default pagesize.
> 
> I'd envisage instead, something where the available mountpoints and
> pagesizes can be queried.  The functions for explicitly allocating
> hugepages (unlinked_fd() and so forth) would have new versions which
> take an explicit pagesize / mountpoint (not sure which).  Obviously
> the ones that just use a default pagesize would be kept too, for
> compatibility but they'd just be a wrapper around the more general
> version.  Possibly a function to change the default pagesize (from
> amongst the available ones) at runtime too.  Like I say, need to
> think about this some more.

All of the features you suggest can be easily added by an already
planned follow-on patch series.  For example, the default size could be
changed through specification of an environment variable.  Adding the
explicit page size selection functions (for unlinked_fd, et al) is also
trivial.

The complexity you refer to as "a dichotomy between the default /
non-default pagesize" is deliberate.  To be compatible
with older kernels (those with only one page size) the default page size
must be handled in a compatibility mode.  For example, the counters need
to be read from /proc/meminfo because they may not be available in
sysfs.  Also, to preserve compatibility with any applications that
aren't accustomed to specifying a page size, we must ensure that the
default page size in libhugetlbfs is also the kernel default size.
Unfortunately, choosing the default size isn't as simple as querying
meminfo for the system default page size.  If the user hasn't mounted a
filesystem with a size matching the meminfo size, we must choose a
default arbitrarily.
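
The selection rule described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the code in
the patch: the struct and function names (hpage_mount, mount_table,
add_mount, meminfo_hpage_size, pick_default) are hypothetical, and the
meminfo text is passed in as a string rather than read from
/proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

#define MAX_MOUNTS 10   /* the patch description allows up to 10 mounts */

/* Hypothetical record for one hugetlbfs mount (not the library's type). */
struct hpage_mount {
    long pagesize;      /* huge page size in bytes */
    const char *path;   /* mount point path */
};

/* Hypothetical table of discovered mounts, in order of discovery. */
struct mount_table {
    struct hpage_mount m[MAX_MOUNTS];
    int count;
};

/* Add a mount, keeping only the first mount seen for each page size.
 * Returns 1 if the mount was added, 0 if it was skipped. */
static int add_mount(struct mount_table *t, long pagesize, const char *path)
{
    int i;

    if (t->count >= MAX_MOUNTS)
        return 0;
    for (i = 0; i < t->count; i++)
        if (t->m[i].pagesize == pagesize)
            return 0;   /* a mount of this size already exists */
    t->m[t->count].pagesize = pagesize;
    t->m[t->count].path = path;
    t->count++;
    return 1;
}

/* Extract the "Hugepagesize:" value from meminfo-style text.
 * Returns the size in bytes, or -1 if the field is missing. */
static long meminfo_hpage_size(const char *meminfo)
{
    const char *p = strstr(meminfo, "Hugepagesize:");
    long kb;

    if (!p || sscanf(p, "Hugepagesize: %ld kB", &kb) != 1)
        return -1;
    return kb * 1024;
}

/* Return the index of the default mount: the one matching the kernel
 * default size if present, otherwise the first mount found, or -1 if
 * the table is empty. */
static int pick_default(const struct mount_table *t, long kernel_default)
{
    int i;

    for (i = 0; i < t->count; i++)
        if (t->m[i].pagesize == kernel_default)
            return i;
    return t->count > 0 ? 0 : -1;
}
```

The fallback to index 0 is the "arbitrary" choice mentioned above: with
no mount of the meminfo size, the first mount discovered wins.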

Some of this "default size" handling will be worked out of the code as
I add functions for requesting specific sizes, i.e.
hugetlbfs_unlinked_fd().  The current plan is to have a separate
library/executable (akin to hugectl) that will be used to query page
sizes, mount points, and pool counter values.  Since we have adopted the
limitation that only one mountpoint can exist per page size, I feel it
is much more user-friendly to use the page size as a handle rather than
the mount point.  The page size is more important to the user than a
filesystem mount point that we are actually trying to abstract away from
them.
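
To illustrate why the page size works as a handle: with at most one
mount per size, a lookup keyed on size is unambiguous, and the
default-size entry point can be a thin wrapper around the general one.
The sketch below uses hypothetical names (size_entry, path_for_size,
default_path) and is not the library's actual interface.

```c
#include <stddef.h>

/* Hypothetical mapping from a huge page size to its single mount. */
struct size_entry {
    long pagesize;      /* huge page size in bytes */
    const char *path;   /* the one mount point for that size */
};

/* General version: the caller names the page size it wants.
 * Returns the mount path, or NULL if no mount has that size. */
static const char *path_for_size(const struct size_entry *e, int n,
                                 long pagesize)
{
    int i;

    for (i = 0; i < n; i++)
        if (e[i].pagesize == pagesize)
            return e[i].path;
    return NULL;
}

/* Compatibility version: callers that never specify a size get the
 * default entry, resolved through the general lookup. */
static const char *default_path(const struct size_entry *e, int n,
                                int default_idx)
{
    if (default_idx < 0 || default_idx >= n)
        return NULL;
    return path_for_size(e, n, e[default_idx].pagesize);
}
```

The caller never needs to know the mount path unless it asks for it; it
only has to name a size (or nothing at all, for the default).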

Your questions reflect the fact that I have neglected to detail my plan
to complete the multiple page size support.  Hopefully my responses
above have helped in that regard.  I feel as if we are basically on the
same page design-wise.  I have been thinking about this interface since
June and I am confident that it is fundamentally designed to solve the
intricacies of multiple page sizes _and_ backwards compatibility as
logically and as simply as is possible.  I have no doubt that
improvements are possible to my implementation, but I don't think a
redesign is necessary.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel
