Re: Conditionally open the network in the build sandbox

Benjamin Schubert Wed, 28 Oct 2020 11:19:39 -0700

Hey,

Sorry for the late response. Replies inline.

The TL;DR is that we cannot rely on user interaction and, in some cases, it
is not possible to know the dependencies before executing the code.
We need this intermediate state to prove that the system can work before
moving to it, and only then can we actually restrict network access in the
sandbox.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, October 17th, 2020 at 9:14 AM, Tristan Van Berkom 
<[email protected]> wrote:

[snip]

> >
> > The point is about being able to incrementally improve the way your
> > builds are made. It can be extremely hard, or too costly for a company
> > to move everything to a 'correct' point before being able to move to
> > BuildStream.
> >
> > This opens the way to easier adoption of BuildStream, by allowing most
> > of its benefits up front and letting users improve the 'bad' elements
> > later, rather than having to solve every problem before moving to it.
> >
> > For me, this is a key point that BuildStream offered, compared to, for
> > example, Bazel, which is more or less 'all or nothing'.
> >
> > Yes, having network access during your builds can be bad, and we should
> > absolutely discourage that. However, I strongly believe that in order to
> > help with adoption, users should be allowed to explicitly ask for
> > permission to shoot themselves in the foot.
>
> I'm not trying to block this outright, but still want to caution that
> this really might do more harm than it does good.
>
> Also, I would be more interested to see examples which are specifically
> difficult to deal with, and work on making those use cases easier to
> deal with.

The main problem I see is with language-specific build systems that insist
on acting as package managers. There is not always a good and easy way of
downloading the correct dependencies up front (npm and Python are two
examples), although it would usually work if we had humans converting the
projects by hand.

When moving a lot of projects in one go, it would be far easier for us to
move to BuildStream first, and then handle hermeticity as a separate work
stream, essentially disentangling the move to the new 'system' from some of
the underlying infrastructure work that is required.

> This is after all why we developed plugins like the cargo plugin for
> rust apps (some weird languages want to download dependencies from the
> internet) - if we can write source plugins for these problematic
> languages, this is a better avenue to pursue than to just give up and
> open up the network at build time.

This is correct; however, it is not always possible to make it entirely
automatic, without user interaction.
For large organizations, it is much better to first demonstrate that a
system can work without user interaction, and then ratchet the system up
from that baseline, stopping regressions and progressively making the whole
system better.

>
> Do we have concrete examples of workflows which are difficult to port
> which cannot be solved with source plugins we can write ?

The current `pip` source plugin (and friends) assumes that the set of
dependencies to install will be the same regardless of which platform pip
is running on and which Python version is used. This is not true, as there
are platform-specific and version-specific dependencies.
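
To make that concrete, here is a minimal sketch of how the resolved set can
differ between environments. It is purely illustrative and is not how pip or
the `pip` source plugin is implemented: the `Env` type, the `resolve` helper
and the requirement table are all hypothetical, and the predicates only mimic
the kind of environment conditions a Python package can attach to its
dependencies.

```python
from typing import Callable, List, NamedTuple, Tuple


class Env(NamedTuple):
    """Hypothetical description of where the installer runs."""
    sys_platform: str
    python_version: Tuple[int, int]


# Illustrative requirement table (an assumption for this sketch): each entry
# pairs a package name with a predicate deciding whether it applies.
REQUIREMENTS: List[Tuple[str, Callable[[Env], bool]]] = [
    ("requests", lambda env: True),
    ("pywin32", lambda env: env.sys_platform == "win32"),
    ("dataclasses", lambda env: env.python_version < (3, 7)),
]


def resolve(env: Env) -> List[str]:
    """Return the sorted package names that apply to the given environment."""
    return sorted(name for name, applies in REQUIREMENTS if applies(env))


linux_py38 = resolve(Env("linux", (3, 8)))
win_py36 = resolve(Env("win32", (3, 6)))
print(linux_py38)
print(win_py36)
```

The same project resolves to two different dependency sets here, so a list
tracked on one platform is not guaranteed to be valid on another.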

More generally, the problem is language-specific build tools that act as
dependency managers and download dependencies based on the host platform
they are running on. Again, this would not be a problem if we could rely on
user interaction and force a single way of doing things. However, that is
not an option for us.

>
> [...]
> > > If network access were "supported", this could lead us down a road
> > > where we are requested to rebuild things when artifacts are too old
> > > (for instance, I cannot build against this dependency, because the
> > > random source code it downloaded from the internet only 3 months ago is
> > > no longer valid for me to build against).
> >
> > Again, if you download from the internet that is a problem. If you
> > already have a cache for such dependencies in your organization, that is
> > not a concern.
>
> I think the above is dangerous thinking, and there are certainly still
> concerns.
>
> Just because you are maintaining a store of sources in your own
> organization does not really mean that those sources are not random,
> and while it might work for a time (or even forever, if you are very
> careful to always use it correctly), it will be difficult to maintain
> and guarantee correctness of your builds over longer periods of time.
>
> Essentially, this externalizes the problem of matching up a specific
> set of inputs with a given build, making it the organization's problem
> instead of using BuildStream to specifically address all inputs (with
> checksums or git shas or the like)... I can see this being a cause of
> frustration years down the line.

[snip]

>
> This margin of error can be eliminated by ensuring that your
> BuildStream project data addresses sources specifically and does not
> tolerate variants (like tarball address without checksum for instance).

I am not saying this is perfect. Technically I would prefer having everything
handled by BuildStream, but big systems need a way forward that does not
require:

- A flag day change
- A lot of pre-work before showing it can work

>
> > > Mostly I worry that if we give users any opportunity to allow internet
> > > access in builds, we almost guarantee that they will not do the minimal
> > > legwork required to ensure their build works without network access.
> >
> > I believe that this is up to the users to decide. We can make it very
> > clear in the docs that this is a bad idea, and users can decide to shoot
> > themselves in the foot.
> >
> [...]
> > No, again, this is a documentation concern. We can document that by
> > using this you understand your element is not repeatable etc.
> >
> > Fixing 'one element' is something that is doable indeed. Doing that at
> > company scale, on large projects, becomes extremely hard. Being able to
> > move to BuildStream first allows us to stop the bleeding for new elements
> > and clean up the rest over the longer term.
>
> I am worried about underestimating the willingness of users to not fix
> a problem when given any opportunity; I expect that given a simple
> switch, people will always turn it on at any time any element gives
> them trouble even if it was relatively easy to fix, and they will not
> circle back to fix this problem element until it has hurt them, twice.
>
> Regardless of whether we warned users in advance, they would be right
> to blame us for not forcing them to do a small amount of work in
> advance and save them from possible problems down the line.

I understand your concerns, and I would much prefer being able to move
forward without requiring such options. However, I still think that users
are ultimately responsible for how they set up their system. They could
also simply use 'buildbox-run-hosttools' to run BuildStream and forgo 90%
of the safety that BuildStream provides (which I am not advocating for).

>
> In conclusion, these are my general feelings on the matter:
>
>   * If there are cases I've overlooked where a "source transform"
>     would not remedy the situation, then maybe we need a switch
>     like you suggest, but I'd really like to be enlightened as to
>     what those cases are.

Those systems usually require user interaction for the source transform to
work correctly. Sadly, that is not always possible.

>
>   * Right now we only have "source transforms" for python and rust
>     (the pip and the cargo sources).
>
>     This is a sad state of affairs, there are probably some other
>     weird languages doing similar things, node.js comes to mind.
>
>     Is it realistic for us to be able to cover every language which
>     downloads external dependencies from the internet at build time
>     with a source plugin (like pip/cargo) that we maintain in our
>     trusted plugin collection ?
>
>     I feel like this is realistic, but if not, then maybe it also
>     calls for such a switch.

I agree this is realistic and would like to work toward that. However, I
don't think this removes the need for the switch in some edge cases.

>
>   * There are some cases which are ridiculously easy to fix, like
>     scripts which want to download config.sub and config.guess from
>     git.savannah.gnu.org (which sometimes need to be fixed to copy them
>     in from /usr/share/automake/ instead), or builds like cracklib
>     which might want to download its word dictionary if the user
>     forgot to separately put it in place.
>
>     In these cases, I worry that providing a switch is going to
>     result in most of these easy to fix issues being completely
>     ignored and worked around, leading to a generally bad user
>     experience down the road.
>
>
> Cheers,
>     -Tristan

Hope that helps,
Cheers,
Ben
