On Mon, Mar 09, 2026 at 09:50:18AM -0700, Andy Lutomirski wrote:
> On Mon, Mar 9, 2026 at 1:58 AM Christian Brauner <[email protected]> wrote:
> >
> > On Sun, Mar 08, 2026 at 10:10:05AM -0700, Andy Lutomirski wrote:
> > > On Sun, Mar 8, 2026 at 4:40 AM Jeff Layton <[email protected]> wrote:
> > > >
> > > > On Sat, 2026-03-07 at 10:56 -0800, Andy Lutomirski wrote:
> > > > > On Sat, Mar 7, 2026 at 6:09 AM Dorjoy Chowdhury
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > This flag indicates the path should be opened if it's a regular
> > > > > > file.
> > > > > > This is useful to write secure programs that want to avoid being
> > > > > > tricked into opening device nodes with special semantics while
> > > > > > thinking
> > > > > > they operate on regular files. This is a requested feature from the
> > > > > > uapi-group[1].
> > > > > >
> > > > >
> > > > > I think this needs a lot more clarification as to what "regular"
> > > > > means. If it's literally
> > > > >
> > > > > > A corresponding error code EFTYPE has been introduced. For example,
> > > > > > if
> > > > > > openat2 is called on path /dev/null with OPENAT2_REGULAR in the flag
> > > > > > param, it will return -EFTYPE. EFTYPE is already used in BSD systems
> > > > > > like FreeBSD, macOS.
> > > > >
> > > > > I think this needs more clarification as to what "regular" means,
> > > > > since S_IFREG may not be sufficient. The UAPI group page says:
> > > > >
> > > > > Use-Case: this would be very useful to write secure programs that want
> > > > > to avoid being tricked into opening device nodes with special
> > > > > semantics while thinking they operate on regular files. This is
> > > > > particularly relevant as many device nodes (or even FIFOs) come with
> > > > > blocking I/O (or even blocking open()!) by default, which is not
> > > > > expected from regular files backed by “fast” disk I/O. Consider
> > > > > implementation of a naive web browser which is pointed to
> > > > > file://dev/zero, not expecting an endless amount of data to read.
> > > > >
> > > > > What about procfs? What about sysfs? What about /proc/self/fd/17
> > > > > where that fd is a memfd? What about files backed by non-"fast" disk
> > > > > I/O like something on a flaky USB stick or a network mount or FUSE?
> > > > >
> > > > > Are we concerned about blocking open? (open blocks as a matter of
> > > > > course.) Are we concerned about open having strange side effects?
> > > > > Are we concerned about write having strange side effects? Are we
> > > > > concerned about cases where opening the file as root results in
> > > > > elevated privilege beyond merely gaining the ability to write to that
> > > > > specific path on an ordinary filesystem?
> >
> > I think this is opening up a barrage of question that I'm not sure are
> > all that useful. The ability to only open regular file isn't intended to
> > defend against hung FUSE or NFS servers or other random Linux
> > special-sauce murder-suicide file descriptor traps. For a lot of those
> > we have O_PATH which can easily function with the new extension. A lot
> > of the other special-sauce files (most anonymous inode fds) cannot even
> > be reopened via e.g., /proc.
>
> On the flip side, /proc itself can certainly be opened. Should
> O_REGULAR be able to open the more magical /proc and /sys files? Are
> there any that are problematic?

If procfs's job isn't to provide problematic files to userspace I'm not
sure what it is. Joking aside, I think in general you are of course
right that procfs is full of files that under a very strict
interpretation of "regular file" should absolutely not count as a
regular file. sysfs probably as well and let's ignore debugfs and
tracefs and all the other magic filesystems or files.

In general, Linux has been so loosey-goosey with "regular file" for such
a long time that making OPENAT2_REGULAR come up with some strict
definition of "this is a regular file - no really, pinky-promise a
regular one" - is just doomed to fail.

The other problem is that we cannot reasonably determine what odd file
the user really wanted to defend against opening with OPENAT2_REGULAR.
A caller may really want to open /proc/kmsg and just be sure that
someone didn't overmount it with a fifo (systemd does that in containers
iirc).

My personal "hot take" is that adding an api built around a regular file
with immediate irreversible side-effects for the caller on VFS
syscall-based open [1] is a bug. Such broken semantics is what ioctl()s
are for.

[1]: I mean specifically open(), openat2() etc. I'm excluding all
dedicated APIs that return file descriptors that cannot be reopened
via regular lookup.

From my pov, what would help is if one had a flexible way to scope opens
by, e.g., filesystem. But imo, that is not policy the kernel can
reasonably express at the syscall api layer - it would look fugly as
hell, and how many other knobs would we have to add to satisfy all needs?
I think that is best left to an lsm hooking into security_file_open()
which can maintain a map of files and filesystems to allow or deny - a
bpf lsm can do this quite nicely.
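
For reference, the narrow S_IFREG-only check (e.g., the /proc/kmsg case
above, making sure nobody overmounted it with a fifo) can be approximated
from userspace today with O_PATH plus a reopen through /proc/self/fd.
This is only a rough sketch and not taken from the patch:
open_regular_only() is a made-up helper name, it assumes /proc is
mounted, and EINVAL merely stands in for the proposed EFTYPE, which
Linux userspace headers do not define.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` only if it resolves to an S_IFREG
 * inode. Returns a file descriptor, or -1 with errno set.
 *
 * `flags` must not contain O_CREAT or O_NOFOLLOW: the reopen goes
 * through a /proc/self/fd magic link, which has to be followed.
 */
int open_regular_only(const char *path, int flags)
{
	struct stat st;
	char procpath[64];
	int pathfd, fd, saved;

	/* O_PATH pins the inode without calling the file's open method. */
	pathfd = open(path, O_PATH | O_CLOEXEC);
	if (pathfd < 0)
		return -1;

	if (fstat(pathfd, &st) < 0)
		goto err;

	if (!S_ISREG(st.st_mode)) {
		errno = EINVAL;	/* placeholder for the proposed EFTYPE */
		goto err;
	}

	/* Reopen the exact inode we just checked, now with real access. */
	snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", pathfd);
	fd = open(procpath, flags | O_CLOEXEC);
	saved = errno;
	close(pathfd);
	errno = saved;
	return fd;

err:
	saved = errno;
	close(pathfd);
	errno = saved;
	return -1;
}
```

With this, open_regular_only("/dev/null", O_RDONLY) fails while a plain
file on disk opens as usual. Note that it only checks the inode type; it
says nothing about which filesystem the file lives on, which is the part
I'd leave to an lsm.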