On Mon, 2008-09-01 at 15:45 +1000, David Gibson wrote:
> On Fri, Aug 29, 2008 at 09:51:35AM -0500, Adam Litke wrote:
> > On Fri, 2008-08-29 at 15:42 +1000, David Gibson wrote:
> > > On Wed, Aug 27, 2008 at 06:34:41PM +0000, Adam Litke wrote:
> [snip]
> > > Hrm. Something about the structure of all this bothers me, but I'm
> > > going to have to think some more on how I think it should be done. It
> > > seems to me like this draft has too much of a dichotomy between the
> > > default / non-default pagesize.
> > >
> > > I'd envisage instead, something where the available mountpoints and
> > > pagesizes can be queried. The functions for explicitly allocating
> > > hugepages (unlinked_fd() and so forth) would have new versions which
> > > take an explicit pagesize / mountpoint (not sure which). Obviously
> > > the ones that just use a default pagesize would be kept too, for
> > > compatibility but they'd just be a wrapper around the more general
> > > version. Possibly a function to change the default pagesize (from
> > > amongst the available ones) at runtime too. Like I say, need to
> > > think about this some more.
> >
> > All of the features you suggest can be easily added by an already
> > planned follow-on patch series. For example, the default size could be
> > changed through specification of an environment variable. Adding the
> > explicit page size selection functions (for unlinked_fd, et al) is also
> > trivial.
>
> Ok, I'm glad to hear that.
>
> > The complexity you refer to as "a dichotomy between the default /
> > non-default pagesize" has been specifically designed. To be compatible
> > with older kernels (those with only one page size) the default page size
> > must be handled in a compatibility mode. For example, the counters need
> > to be read from /proc/meminfo because they may not be available in
> > sysfs.
> > Also, to preserve compatibility with any applications that
> > aren't accustomed to specifying a page size, we must ensure that the
> > default page size in libhugetlbfs is also the kernel default size.
> > Unfortunately choosing the default size isn't as simple as querying
> > meminfo for the system default page size. If the user hasn't mounted a
> > filesystem with a size matching the meminfo size, we must choose a
> > default arbitrarily.
>
> Oh, certainly, I have no problem with the selection of a default size
> and mountpoint in this manner. Clearly we need to do this for
> compatibility. It just seems like there are places that aren't locked in
> by compatibility where the fundamental option seems to be
> default/non-default rather than pagesize directly, which would seem to
> make more sense.
>
> > Some of this "default size" stuff will get worked out of the code more
> > as I add functions for requesting specific sizes, i.e.
> > hugetlbfs_unlinked_fd(). The current plan is to have a separate
>
> Ok, that sounds good.
>
> > library/executable (akin to hugectl) that will be used to query page
> > sizes, mount points, and pool counter values. Since we have adopted the
>
> Ok, but presumably there will also be a library interface so that
> programs can query this for themselves?
Yes, that's the plan.

> > limitation that only one mountpoint can exist per page size, I feel it
> > is much more user-friendly to use the page size as a handle rather than
> > the mount point. The page size is more important to the user than a
> > filesystem mount point that we are actually trying to abstract away from
> > them.
>
> Yes, that's true in general. But are there cases where multiple
> mountpoints of the same page size aren't interchangeable? The
> mountpoints will have separate quota allocation, won't they? And they
> could have different permissions.

The only case I can imagine for needing two mountpoints of the same size
in use by the same library at the same time is to accommodate a picky
user who wants to account for morecore hugepages and ELF segment
hugepages using different quotas for each type. While it's true that we
could support such usage, I'm not sure it's worth the development effort
at this stage. If (farther down the road) we do decide we need this type
of thing, we could add an interface for using mount handles akin to the
following:

/* Get a handle to a hugetlbfs mount point */
hugetlbfs_mount_t *get_hugetlbfs(const char *path);

/* Create an fd on a specific hugetlbfs mount point */
int hugetlbfs_unlinked_fd_from_handle(hugetlbfs_mount_t *handle);

I think bringing this level of complexity to the environment variables
(HUGETLB_MORECORE, HUGETLB_ELFMAP, etc.) is not worth it given such a
narrow and hypothetical use case.

> > Your questions reflect the fact that I have neglected to detail my plan
> > to complete the multiple page size support. Hopefully my responses
> > above have helped in that regard. I feel as if we are basically on the
> > same page design-wise. I have been thinking about this interface since
> > June and I am confident that it is fundamentally designed to solve the
> > intricacies of multiple page sizes _and_ backwards compatibility as
> > logically and as simply as is possible.
> > I have no doubt that improvements are possible to my
> > implementation, but I don't think a redesign is necessary.
>
> Ok.
>

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel
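
For illustration only, the compatibility handling discussed in this message (the default page size comes from /proc/meminfo, while per-size pool counters live in sysfs on newer kernels) might be sketched roughly as below. This is a minimal sketch under stated assumptions, not code from libhugetlbfs or from the patch series: the helper names parse_default_hpage_size() and hpage_sysfs_path() are hypothetical. It relies only on the "Hugepagesize:" line of /proc/meminfo and the /sys/kernel/mm/hugepages/hugepages-<size>kB/ directory layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a libhugetlbfs function): extract the default
 * huge page size, in bytes, from text in /proc/meminfo format.
 * Returns -1 if no "Hugepagesize:" entry can be parsed.
 */
long parse_default_hpage_size(const char *meminfo)
{
    const char *key = "Hugepagesize:";
    const char *p = strstr(meminfo, key);
    long kb;

    if (!p)
        return -1;
    /* The kernel reports the value in kB, e.g. "Hugepagesize:  2048 kB" */
    if (sscanf(p + strlen(key), "%ld", &kb) != 1)
        return -1;
    return kb * 1024;
}

/*
 * Hypothetical helper: build the sysfs path of a per-size pool counter
 * such as "nr_hugepages". Returns 0 on success, -1 if buf is too small.
 */
int hpage_sysfs_path(char *buf, size_t len, long size_bytes,
                     const char *counter)
{
    int n = snprintf(buf, len,
                     "/sys/kernel/mm/hugepages/hugepages-%ldkB/%s",
                     size_bytes / 1024, counter);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller could try the sysfs path first for a given size and, if it does not exist (an older kernel with a single page size), fall back to the meminfo counters for the default size. That fallback order is an assumption about how the compatibility mode could be arranged, not a description of the actual patches.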