On Wednesday, 9 May 2018 19:10:13 CEST Mike Gilbert wrote:
> On Wed, May 9, 2018 at 12:34 PM, Matt Turner <[email protected]> wrote:
> > On Tue, May 8, 2018 at 11:51 PM, Dennis Schridde <[email protected]> wrote:
> >> Hello!
> >> 
> >> I see sandbox violations similar to "ACCESS DENIED: open_wr: /dev/dri/
> >> renderD128" pop up for more and more packages, probably because OpenCL
> >> is being used more widely.  Hence I would like to ask: Could we in Gentoo
> >> treat GPUs just like CPUs and allow any process to access render nodes
> >> (i.e. the GPU's compute capabilities via the specific interface the Linux
> >> kernel's DRM offers for that purpose) without sandbox restrictions?
> >> 
> >> --Dennis
> >> 
> >> See-Also: https://bugs.gentoo.org/654216
> > 
> > This seems like a bad idea. With CPUs we've had decades to work out
> > how to isolate processes and prevent them from taking down the system.
> > 
> > GPUs are not there yet. It's simple to trigger an unrecoverable GPU
> > hang and not much harder to turn it into a full system lock up.
> > 
> > This is not safe.
> 
> It's worth noting that the default rules shipped with udev assign mode
> 0666 to the /dev/dri/renderD* device nodes. So, outside of a sandbox
> environment, any user may access these devices.
> 
> This was merged as part of this PR:
> https://github.com/systemd/systemd/pull/7112

Also, what's happening right now is that every ebuild that *does* somehow use 
DRM render nodes receives SANDBOX_PREDICT or SANDBOX_WRITE access to them.

And the cycle is usually:
* Bump into a usage of render nodes that breaks the build at the very end
* Report a bug
* Wait
* The ebuild gets "allow access to the first render node" code added
* Someone with 2 GPUs runs into the same issue for the second render node
* ... rinse and repeat ...
* Eventually, after enough people have run into it, the ebuild gets its own
  custom "find all render nodes and allow access" code added

Additionally, the usage often appears to be indirect, through another tool or
library, so ebuild developers cannot really predict it.

Thus, at the very least, I would suggest adding this code (to allow access to
all render nodes) to an eclass, so that ebuild developers can easily fix their
ebuilds properly, once and for all.
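
A minimal sketch of what such a shared helper could look like, assuming
Portage's `addwrite` sandbox function (which appends a path to SANDBOX_WRITE).
The function name `allow_render_nodes` and its optional directory argument are
hypothetical, made up for illustration; the `addwrite` stub only exists so the
snippet can run outside of an ebuild.

```shell
#!/usr/bin/env bash
# Sketch of a hypothetical eclass helper; the function name and the optional
# directory argument are illustrative, not an existing Gentoo API.

# Portage provides addwrite inside the ebuild environment. This stub is only
# defined when addwrite is missing (standalone runs); it records the paths.
if ! declare -F addwrite >/dev/null; then
    ALLOWED_PATHS=()
    addwrite() { ALLOWED_PATHS+=("$1"); }
fi

# Grant sandbox write access to every DRM render node, not just renderD128,
# so hosts with several GPUs do not hit the "second render node" violation.
allow_render_nodes() {
    local dri_dir=${1:-/dev/dri}  # overridable only to make testing possible
    local node
    for node in "${dri_dir}"/renderD*; do
        # With no matches, bash leaves the pattern unexpanded; skip it.
        [[ -e ${node} ]] || continue
        addwrite "${node}"
    done
}
```

An ebuild would then call `allow_render_nodes` once, e.g. at the start of
src_test, instead of hard-coding renderD128. This only silences the sandbox
violation; it does nothing about the GPU hang risk raised above.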

But once the process is that easy, and with so many builds already using render
nodes, the surface through which a build could take down the system is already
very large.  If (e.g.) the Chromium build could trigger a bug in Mesa that takes
down the system, so could anyone else.  And if we trust its toolchain (with a
build time of several hours, I believe this to be a large set of tools and a
lot of code) not to bring down the system, without a complete audit or
something of the sort, why don't we trust anyone else?

--Dennis
