On 2025-09-10 02:03, Konstantin Belousov wrote:
On Sun, Sep 07, 2025 at 09:25:59AM -0700, James Gritton wrote:
On 2025-09-06 17:26, Konstantin Belousov wrote:
> On Fri, Sep 05, 2025 at 10:57:30AM -0700, James Gritton wrote:
> > On 2025-09-04 22:14, Konstantin Belousov wrote:
> > > BTW, you added some support for kqueue for jail events, but not to the
> > > jail file descriptors.  This seems to be backward: if somebody wants to
> > > monitor events for jails, then it is more reliable and straightforward
> > > to do with the new jail fds rather than with ids.
> >
> > It is at least incomplete, and not the state I want things to be at.
> > There's a sticking point with jaildesc kqueue, so while I work that
> > out I went with jid-based kqueue as a starter.
> >
> > The trouble is child jails.  I took their handling from the existing
> > child process handling, where I register a new kevent under the new
> > jail's id.  But that's something I can't do with descriptors, since
> > they have a process-specific identifier, the descriptor number.  The
> > code that creates the new event, coming from the jail_set call that
> > created a new jail, has access to the global descriptor (the struct
> > file), but not to the process(es) that have it open, so I have no
> > way of registering one or more events with that descriptor number.
> >
> > One workaround is to have both jid- and jaildesc-based kevents, but
> > both of them register a new jid-based kevent for a newly created child
> > jail.  The caller may then get a descriptor with jail_get, and add a
> > kevent for it and remove the old jid-based one.  This would work, but
> > feels really klunky.
> >
> > The other idea I've had is to register a temporary event, and then add
> > code to kqueue_scan that converts that into a proper jaildesc event
> > with the expected file descriptor number.  That would require either
> > jaildesc-specific code in or around kqueue_scan, or adding another
> > filterops function, neither of which is great.  Still, it seems the
> > better solution.
>
> This is not how the monitoring APIs work in general.  For instance,
> when you register a listening socket in kqueue (or mark it for select or
> poll), you do not get back a new connected file descriptor.  Kqueue only
> provides a notification that new connection arrived, and then code
> needs to accept it and get the file descriptor for new connection using
> dedicated socket API.

True.  An accepted connection changes the network state, both locally
and remotely, and automatically establishing that connection wouldn't
be the right thing to do.  The existence of a listen queue also fits
well with a notification system that doesn't do its own queueing.
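
To make the listening-socket analogy concrete, here is a minimal sketch of the pattern described above. It uses poll(2) rather than kqueue so that it stays portable; the readiness-then-accept shape is the same. The helper name poll_then_accept_demo and the loopback setup are illustrative assumptions, not code from this thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not from the thread): open a loopback
 * listener, connect to it from the same process, and show that poll(2)
 * only reports readiness.  The connected descriptor has to be fetched
 * with an explicit accept(2).  Returns 0 on success, -1 on failure.
 */
int
poll_then_accept_demo(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	struct pollfd pfd;
	int lfd, cfd, afd = -1, rc = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0 || cfd < 0)
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) < 0)
		goto out;

	/* The connection lands in the listen queue; nothing is accepted yet. */
	if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		goto out;

	/* The notification: "something is waiting", and no new descriptor. */
	pfd.fd = lfd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN))
		goto out;

	/* The dedicated socket API hands out the connected descriptor. */
	afd = accept(lfd, NULL, NULL);
	if (afd < 0)
		goto out;
	assert(afd != lfd);	/* a new descriptor, distinct from the listener */
	rc = 0;
out:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return (rc);
}
```

The pending connection sits in the listen queue between poll(2) returning and accept(2) being called, which is the queueing that jail creation currently lacks.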

Jail descriptors, on the other hand, only exist as a view to an
existing jail, and don't establish anything other than that view.
Jail creation also has no associated queue, so loss-free notification
relies on the same hack that process forking already established,
though it requires a little more work to make it fit.

An alternate way of solving the problem would be to create such
a queue, allowing a single notification of such things as a jail
attachment or child jail creation, or possibly more than one of
them by the time the process reads the queue.

...

First, since you already mentioned a desire to capsicumize jfds, I think it
is already a huge wart in the interface.  The function that opens (or
creates) fd from a jail id, must not take just jail.  It should be
namespace-aware already.  In other words, it should take existing jfd
and create a child jail, returning jfd for it.  The existing jfd gives
the namespace container to start with, which is essentially how capsicum
is organizing the rights limiting.

For the bootstrapping, the prison0 non-capentered process can pass a
special id for jfd to reference prison0, similar to how AT_FDCWD marks
'.' for *at(2) syscalls.

My desire to capsicumize jail descriptors is at
https://reviews.freebsd.org/D52516
This isn't part of what I expect to get into 15, just something for
Current.

I haven't done anything with the bootstrapping you mentioned.  For
non-capsicum use, it's just as good to have jid-based kevents on jail 0,
which is currently supported.  For capsicum use, the idea of jail_get(2)
on the process' current jail, also by specifying jid 0, is something I'd
like to explore, but it has security considerations we'd want to work
through first.  Certainly, only limited aspects of the current prison
environment would be reported, and any resulting descriptor wouldn't
include permission to modify that jail or to read anything more about
it.

- Jamie
