Re: [Libhugetlbfs-devel] Improved exorcism

Nishanth Aravamudan Tue, 31 Oct 2006 10:03:05 -0800

On 31.10.2006 [13:16:34 +1100], David Gibson wrote:
> On Mon, Oct 30, 2006 at 01:45:41PM -0800, Nishanth Aravamudan wrote:
> > On 30.10.2006 [15:56:43 +1100], David Gibson wrote:
> > > Latest daemon removal path.  Updated to apply clean on top of the
> > > latest changes in the git tree, otherwise nothing new.
> > > 
> > > Now that 1.0.1 is out, are we ready to merge this?  Remember that we
> > > can make a git branch to hold strictly-maintenance revisions of
> > > 1.0/1.0.1 for Red Hat, if necessary.
> > 
> > <snip>
> > 
> > Mostly nits below...
> > 
> > > +Now, the same application run twice with the same environment
> > > variable +settings will use the same file both times (reducing the
> > > overall +hugetlbfs file usage). If, for any reason, segment
> > > sharing is +requested but unavailable, the library will fallback
> > > to unlinked +files.
> > 
> > Since there is no longer automatic removal of files in the hugetlbfs
> > mount-point after some time (as there was with hugetlbd), this
> > should be documented. And perhaps the bash script, or whatever you
> > would like to have as a control mechanism should also be merged with
> > the patch in its final form.
> 
> Ok, some notes about this added.  Actually, while I was at it, I
> rewrote the whole section on sharing, to be clearer.


Excellent.

<snip>

> > So you can just return -1 here or something... Why are you aborting by
> > the way? Shouldn't it just fail to share? [1]
> 
> No.  It's an abort() for simplicity - same rationale as xmalloc().
> Path overflows should *really* only occur in pathological situations:
> indeed, the most likely is someone attempting to trigger a buffer
> overflow.  Therefore, avoiding the need for checks and backout code in
> all the callers is a good tradeoff.

Fair enough. Probably should abort() in the case that vsnprintf() has
some error, too, then?

> > > +static int check_env(void)
> > > +{
> > 
> > <snip>
> > 
> > > +}
> > 
> > This whole function move & change should be submitted as a separate
> > patch, it has little to do with removing the daemon and makes this diff
> > larger than necessary. But it totally makes sense to merge :)
> 
> Yeah, I think I had a reason to move it (function ordering or
> something) in an interim version.  Move is undone.
> 
> However, the change (adding testing of HUGELTB_SHARE) remains.  The
> path in which the test of HUGETLB_SHARE was before has been
> re-arranged so much anyway, this is still the most sensible place for
> it.

Yep.

<snip>

> > > + ret = mkdir(share_path, 0700);
> > > + if ((ret != 0) && (errno != EEXIST)) {
> > > +         ERROR("Error creating share directory %s", share_path);
> > > +         return -1;
> > > + }
> > > +
> > > + /* Check the share directory is sane */
> > > + ret = lstat(share_path, &sb);
> > > + if (ret != 0) {
> > > +         ERROR("Couldn't stat() %s: %s\n", share_path, strerror(errno));
> > > +         return -1;
> > > + }
> > > +
> > > + if (! S_ISDIR(sb.st_mode)) {
> > 
> > I think we've been avoiding the extra space after the '!'.
> 
> Hrm.. both forms exist in elflink.c already, neither is clearly
> dominant.  I really prefer the space here, because right between the (
> and capital letter it's very easy to overlook the !.

Fair enough, but I spent a bit of time before trying to make this
somewhat consistent -- and, at some point, we had decided to try and
stick to kernel CodingStyle as close as possible. Certainly not enough
to block inclusion, though.

> > Does this check make sense given that you're using lstat() above? If
> > the HUGETLB_SHARE_PATH is a symbolic link, then lstat() is going to
> > stat the link, not the dir it points to, and then we'll fail here...
> 
> Well, first, HUGETLB_SHARE_PATH is not set in this code path.

D'oh! Yes, sorry, misread.

> Second, the test is intentional: it ensures that the path is not a
> symbolic link, since I can see little reason to make it so, and it
> very simply prevents any security silly buggers from dropping in
> symlinks here.  Now, if the current uid is the owner of the link, the
> latter is probably not an issue, but I'd need more time to convince
> myself.

This is to prevent the case, then, that we created the directory for the
shared files, and then someone comes in and does

rmdir $our_dir
ln -sf $mordor $our_dir

?

Just to make sure I understand.

<snip>

> > > + if (sb.st_mode & (S_IWGRP | S_IWOTH)) {
> > > +         ERROR("%s has bad permissions 0%03o\n",
> > > +               share_path, sb.st_mode);
> > > +         return -1;
> > > + }
> > 
> > But wait, this doesn't make complete sense -- you're only doing all
> > this sanity checking if we mkdir(). Shouldn't we also do it in the
> > case of a custom path?
> 
> No.  Part of the point of the custom path is that you can use it to
> construct your own security model instead of the default.  If you set
> HUGETLB_SHARE_PATH, it's assumed you know what you're doing w.r.t.
> permissions on the target directory.

Alright, we should make sure this is documented in the HOWTO, then.

> > And you're not backing out changes to the system correctly. After
> > the mkdir() has succeeded, if something else fails, we should
> > rmdir() it.
> 
> I don't think so.  The only things that can fail here are:
>       - mkdir() fails, in which case we never created it so there's no
>       need to rmdir().
>       - the permissions tests fail because someone else created it -
>       if someone else has already created it, we shouldn't go
>       rmdir()ing it (and may not have permission to, anyway)
>       - mkdir() succeeds, but the lstat() fails.  This will never
>       happen unless things are in a *very* strange state indeed.  Best
>       not mess with it any further, and leave everything in place for
>       diagnosis.

That makes sense now, thanks. I suppose, technically, all of these other
checks could go in the else clause below, after the mkdir()?

if (ret != 0) {
        if (errno != EEXIST) {
                ERROR();
                return -1;
        } else {
                ...
        }
}

That would make it clearer why they are necessary (to me, at least). And
they shouldn't be necessary in the case where the mkdir() simply
succeeds, right?

<snip>

> > > +/**
> > > + * find_or_prepare_shared_file - get one shareable file
> > > + * @htlb_seg_info: pointer to program's segment data
> > 
> > This seems rather insufficient for a description of the core logic.
> 
> Yeah, added some more here.

Excellent, thanks.

<snip>

> > > + ret = get_shared_file_name(htlb_seg_info, final_path);
> > > + if (ret < 0)
> > > +         return -1;
> > > + assemble_path(tmp_path, "%s.tmp", final_path);
> > > +
> > > + do {
> > 
> > This really needs more commenting, I think. Maybe a little paragraph at
> > the top like:
> 
> Or, for example, the comment I put in the mail I sent with the first
> version of the patch containing this logic.  Revised and added as a
> comment.

Heh, yes, that works too :)

> > > +         /* NB: mode is modified by umask */
> > > +         fdx = open(tmp_path, O_CREAT | O_EXCL | O_RDWR, 0666);
> > > +         errnox = errno;
> > > +         fds = open(final_path, O_RDONLY);
> > > +         errnos = errno;
> > > +
> > > +         if (fds > 0) {
> > 
> > Stupid nit, but >= technically, right?
> 
> Ah, yes.  Just in case someone closes stdin, then exec()s a
> libhugetlbfs program.

Yep, unlikely, but I guess anything is possible and we shouldn't fail in
that case. Same goes for the other fd checks, of course.

> > > +         /* Both opens failed, somebody else is still preparing */
> > > +         /* Wait and try again */
> > > +         sleep(1);
> > > +         /* FIXME: should have a timeout */
> > 
> > Should be easy enough to add, no?
> 
> Fairly, yes, just that time arithmetic is usually a minor pain.  Since
> this can only happen if something else has gone wrong, I think it can
> wait until a later patch to add.  Indeed for now, spinning here might
> actually for debugging other problems, if they arise.

Agreed on the time arithmetic problems. Probably could just install a
temprorary SIGALRM handler and call alarm() before the loop? Then
reinstall the old ALRM handler, if there was one? That's basically what
I did in the daemon to handle timing out the read()s and write()s, and
it seemed to avoid heavy-handed time math pretty well.

I'm ok with deferring this bit, and you're right, it probably is good
for debugging. Would it make sense, though, to keep track of how many
times we loop, and every 10 or so, spit out at DEBUG()? That way we can
know if we're stuck in the infinite loop?

Great work on this, David, btw.

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel

Re: [Libhugetlbfs-devel] Improved exorcism

Reply via email to