On Fri, 2008-08-29 at 15:42 +1000, David Gibson wrote:
> On Wed, Aug 27, 2008 at 06:34:41PM +0000, Adam Litke wrote:
> > An upcoming release of the Linux kernel will support simultaneous
> > use of multiple huge page sizes.  Each size will be accessed through
> > its own specially-mounted hugetlbfs filesystem.  The first step in
> > enabling libhugetlbfs to support multiple simultaneous page sizes is
> > enabling the support of multiple simultaneous hugetlbfs mount points.
> >
> > This patch adds basic support for multiple mount points while
> > preserving backwards-compatibility.  Mount points can be added via
> > the HUGETLB_PATH environment variable which has been extended in the
> > normal way to allow multiple paths to be specified (using a colon
> > separator).  Mounts will also be discovered by reading /proc/mounts
> > or /etc/mtab.  Up to 10 mount points are allowed to co-exist but only
> > one mount per page size is allowed.  If HUGETLB_PATH is specified,
> > only mount points listed in that variable will be added.  Otherwise,
> > paths in /proc/mounts or /etc/mtab will be added in order of
> > appearance.  The first mount point of a given size is used and
> > subsequent mounts of that page size are skipped.
> >
> > For compatibility and ease of use, a default mount point is selected.
> > When multiple mount points have been added, /proc/meminfo is read to
> > determine the system's default huge page size and the mount point
> > having that size is selected as the default.  If a mount point for
> > the default page size cannot be found, the first mount point found
> > becomes the default.  The gethugepagesize() call has been modified to
> > return the default huge page size as determined the method just
> > described.
>
> Hrm.  Something about the structure of all this bothers me, but I'm
> going to have to think some more on how I think it should be done.  It
> seems to me like this draft has too much of a dichotomy between the
> default / non-default pagesize.
>
> I'd envisage instead, something where the available mountpoints and
> pagesizes can be queried.  The functions for explicitly allocating
> hugepages (unlinked_fd() and so forth) would have new versions which
> take an explicit pagesize / mountpoint (not sure which).  Obviously
> the ones that just use a default pagesize would be kept too, for
> compatibility but they'd just be a wrapper around the more general
> version.  Possibly a function to change the default pagesize (from
> amongst the available ones) at runtime too.  Like I say, need to
> think about this some more.
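
For reference, the mount point rules from the quoted patch description (colon-separated HUGETLB_PATH, at most 10 mounts, first mount of a given page size wins) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: struct mount_table, table_add(), table_add_path_list(), MAX_PATH_LEN and the probe callback are hypothetical names, and demo_probe() is a stand-in that guesses a page size from the path so the sketch can run without a real hugetlbfs mount.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MOUNTS   10    /* limit stated in the patch description */
#define MAX_PATH_LEN 256   /* assumption made for this sketch only */

struct hpage_mount {       /* hypothetical: one entry per page size */
    long pagesize;         /* huge page size in bytes */
    char path[MAX_PATH_LEN];
};

struct mount_table {       /* hypothetical container */
    struct hpage_mount mounts[MAX_MOUNTS];
    int count;
};

/* Hypothetical callback: page size of a mount in bytes, <= 0 on failure. */
typedef long (*pagesize_fn)(const char *path);

/* Add one mount. Returns 1 if added, 0 if skipped (bad size, table
 * full, path too long, or a mount of this page size already exists). */
static int table_add(struct mount_table *t, const char *path, long pagesize)
{
    int i;

    assert(t != NULL && path != NULL);
    if (pagesize <= 0 || t->count >= MAX_MOUNTS)
        return 0;
    if (strlen(path) >= MAX_PATH_LEN)
        return 0;
    for (i = 0; i < t->count; i++)
        if (t->mounts[i].pagesize == pagesize)
            return 0;   /* first mount of a given size wins */
    t->mounts[t->count].pagesize = pagesize;
    strcpy(t->mounts[t->count].path, path);
    t->count++;
    return 1;
}

/* Walk a colon-separated list (the HUGETLB_PATH format) in order and
 * add each entry. Returns the number of mounts actually added. */
static int table_add_path_list(struct mount_table *t, const char *list,
                               pagesize_fn probe)
{
    const char *start = list;
    int added = 0;

    while (*start) {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len > 0 && len < MAX_PATH_LEN) {
            char path[MAX_PATH_LEN];

            memcpy(path, start, len);
            path[len] = '\0';
            added += table_add(t, path, probe(path));
        }
        if (!end)
            break;
        start = end + 1;
    }
    return added;
}

/* Stand-in probe for demonstration only: derives a size from the name. */
static long demo_probe(const char *path)
{
    if (strstr(path, "16M"))
        return 16L * 1024 * 1024;
    if (strstr(path, "2M"))
        return 2L * 1024 * 1024;
    return -1;
}
```

Given the list "/mnt/huge2M:/mnt/huge16M:/mnt/other2M" and the stand-in probe, the third entry is skipped because a 2 MB mount is already in the table. The same table_add() would serve for entries read from /proc/mounts or /etc/mtab when HUGETLB_PATH is unset.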
All of the features you suggest can easily be added by an already
planned follow-on patch series.  For example, the default size could be
changed through an environment variable.  Adding the explicit page size
selection functions (for unlinked_fd, et al.) is also trivial.

The complexity you refer to as "a dichotomy between the default /
non-default pagesize" is deliberate.  To be compatible with older
kernels (those with only one page size), the default page size must be
handled in a compatibility mode.  For example, the counters need to be
read from /proc/meminfo because they may not be available in sysfs.
Also, to preserve compatibility with any applications that aren't
accustomed to specifying a page size, we must ensure that the default
page size in libhugetlbfs is also the kernel default size.

Unfortunately, choosing the default size isn't as simple as querying
meminfo for the system default page size.  If the user hasn't mounted a
filesystem with a size matching the meminfo size, we must choose a
default arbitrarily.  Some of this "default size" stuff will get worked
out of the code as I add functions for requesting specific sizes, e.g.
hugetlbfs_unlinked_fd().

The current plan is to have a separate library/executable (akin to
hugectl) that will be used to query page sizes, mount points, and pool
counter values.  Since we have adopted the limitation that only one
mountpoint can exist per page size, I feel it is much more
user-friendly to use the page size as a handle rather than the mount
point.  The page size is more important to the user than a filesystem
mount point, which is the very thing we are trying to abstract away
from them.

Your questions reflect the fact that I have neglected to detail my plan
to complete the multiple page size support.  Hopefully my responses
above have helped in that regard.  I feel as if we are basically on the
same page design-wise.
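
To make the default-selection rule and the "page size as handle" idea concrete, here is a rough sketch. It assumes the mount table has already been built; parse_default_hpage_size(), pick_default(), mount_for_size() and mount_for_default() are hypothetical names used for illustration and are not functions from libhugetlbfs. The meminfo text is passed in as a string so the sketch does not depend on the running kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct hpage_mount {            /* hypothetical: one entry per page size */
    long pagesize;              /* huge page size in bytes */
    const char *path;           /* hugetlbfs mount point */
};

/* Find the "Hugepagesize: N kB" line in /proc/meminfo-style text.
 * Returns the size in bytes, or -1 if the line is not present. */
static long parse_default_hpage_size(const char *meminfo)
{
    const char *line = meminfo;

    while (line && *line) {
        long kb;

        if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
            return kb * 1024;
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return -1;
}

/* Default mount: the one matching the kernel default size, otherwise
 * the first mount found. Returns NULL when there are no mounts. */
static const struct hpage_mount *
pick_default(const struct hpage_mount *m, int count, long kernel_default)
{
    int i;

    assert(m != NULL || count == 0);
    for (i = 0; i < count; i++)
        if (m[i].pagesize == kernel_default)
            return &m[i];
    return count > 0 ? &m[0] : NULL;
}

/* General, size-explicit lookup: the page size is the handle. */
static const char *
mount_for_size(const struct hpage_mount *m, int count, long pagesize)
{
    int i;

    for (i = 0; i < count; i++)
        if (m[i].pagesize == pagesize)
            return m[i].path;
    return NULL;
}

/* Compatibility entry point: resolves the default size, then defers
 * to the size-explicit version. */
static const char *
mount_for_default(const struct hpage_mount *m, int count,
                  const char *meminfo)
{
    const struct hpage_mount *d =
        pick_default(m, count, parse_default_hpage_size(meminfo));

    return d ? mount_for_size(m, count, d->pagesize) : NULL;
}
```

In this arrangement the caller who never names a page size goes through mount_for_default(), which is only a thin wrapper over the size-explicit lookup, so the fallback to the first mount is confined to one place.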
I have been thinking about this interface since June and I am confident
that it is fundamentally designed to solve the intricacies of multiple
page sizes _and_ backwards compatibility as logically and as simply as
is possible.  I have no doubt that improvements are possible to my
implementation, but I don't think a redesign is necessary.

--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel