Hi,

On Wed, 2020-10-28 at 18:19 +0000, Benjamin Schubert wrote:
> Hey,
> 
> Sorry for the late response. Responses inline
> 
> The TLDR would be that we cannot have user interaction and, in some cases it
> is not possible to know the dependencies before executing the code.
> We need to have this intermediate state of the system to prove it could work
> before moving to it and actually being able to restrict access to the sandbox.
> 

Alright, you've made fair points in this thread, I also think we had
discussed a long time ago that we should allow the user to shoot
themselves in the foot if they so desire (Sander phrases this as
"giving them enough rope to hang themselves with").

Let's make the warning signs around this as loud as possible.

It would be nice for instance to have a summary (unconditional warning)
at load time of all elements which bought into this backdoor.

There are some other parts of this I still think are pertinent to
discuss, regardless of the bottom line of the proposal.

Consider the other replies inline as not having any bearing on the
outcome of the proposal...

[...]
> > This is after all why we developed plugins like the cargo plugin for
> > rust apps (some weird languages want to download dependencies from the
> > internet) - if we can write source plugins for these problematic
> > languages, this is a better avenue to pursue than to just give up and
> > open up the network at build time.
> 
> This is correct, however, it is not always possible to make it entirely
> automatic without user interactions.
> For large organizations, it is much better to be able to demonstrate that a
> system could work without user interactions first, and then moving the system
> up from a set point, therefore stopping regressions and progressively making
> the whole system better.

With the 'pip' plugin we have, it should essentially only require an
initial run of `bst track` after appending the `pip` source to your
python build, so as to pick up the python dependencies and encode the
specific versions into your project.
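Concretely, the workflow I have in mind looks something like this (the
exact field names here are illustrative and from memory, they may not
match the plugin's actual schema):

```yaml
# element.bst (illustrative): a python build with a pip source appended
sources:
- kind: git
  url: upstream:myapp.git        # hypothetical source alias
- kind: pip
  url: pypi:                     # hypothetical index alias
  requirements-files:
  - requirements.txt
```

followed by a one-time `bst track element.bst` to resolve and pin the
specific dependency versions into the project.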

I don't think this really counts as user interaction any more than
migrating a build to use BuildStream otherwise does.

Maybe you are referring to other user interactions I'm overlooking?

> 
> > Do we have concrete examples of workflows which are difficult to port
> > which cannot be solved with source plugins we can write ?
> 
> The current `pip` source plugin (and friends) assume that the set of
> dependencies to install will be the same regardless of which platform pip
> is running on and the python version that is used. This is not true as there
> are platform and version specific dependencies, etc.

This statement raises concern (and is the main reason for my elaborate
reply to this email).

If the pip plugin relies too heavily on host tooling to make invalid
inferences about what should be downloaded, this is a serious problem.

The pip plugin is supposed to simply encode up to date versions of
python dependencies which match up with the project's requirements at
track time, and then download those at build time.

If this behavior can vary depending on host pip, then that is a serious
bug in the `pip` plugin implementation - and probably needs to be
implemented differently (possibly without leveraging the `pip` code
itself at all).
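For what it's worth, the host dependence usually enters through PEP 508
environment markers in requirement specifiers (e.g.
`pywin32; sys_platform == "win32"`). A minimal, self-contained sketch of
the idea follows; the parsing here is deliberately simplified and is not
the real `pip` implementation:

```python
# Simplified evaluation of a PEP 508-style environment marker.
# Real tools implement the full grammar; this toy version handles only
# `name == "value"` comparisons, which is enough to show that resolving
# the same requirements file on different hosts yields different results.

def marker_applies(marker: str, env: dict) -> bool:
    name, _, quoted = marker.partition("==")
    return env.get(name.strip()) == quoted.strip().strip("\"'")

# A requirement that is only installed on Windows hosts:
requirement = ("pywin32", 'sys_platform == "win32"')

linux_env = {"sys_platform": "linux"}
windows_env = {"sys_platform": "win32"}

print(marker_applies(requirement[1], linux_env))    # False
print(marker_applies(requirement[1], windows_env))  # True
```

This is exactly why tracking on one host and building for another can
produce the wrong dependency set unless the target environment is encoded
explicitly.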

> In more general terms, language-specific building tools that act as dependency
> managers and download the dependency based on the host platform they are
> running from. Again, this would not be a problem with user interaction and
> forcing one way of doing things. However, that is not an option for us.

This is certainly not how source transforms developed for such
languages are intended to work. Whether or not we leverage the host
language's package manager code to implement these plugins, it is our
responsibility to ensure that plugins which do the "source transform"
thing do so reliably and consistently, regardless of the host system.

If the target system comes into play as a matter of necessity (i.e. the
correct code cannot be inferred without knowing what kind of system the
code should be built/run on), then this information needs to be
encoded into the Source configuration.

E.g. the `pip` source needs configuration stating things like the
python version expected to be in use in the sandbox (if the python
version in use makes a difference), and then the plugin has to "just do
the right thing".

The idea that it is acceptable for the pip plugin to behave in a host
dependent way is rather frightening, and we certainly should not bless
the plugin upstream without it passing these basic requirements first.

> > [...]
> > > > If network access were "supported", this could lead us down a road
> > > > where we are requested to rebuild things when artifacts are too old
> > > > (for instance, I cannot build against this dependency, because the
> > > > random source code it downloaded from the internet only 3 months ago is
> > > > no longer valid for me to build against).
> > > 
> > > Again, if you download from the internet that is a problem. If you have 
> > > already
> > > a cache for such dependencies in your organization, that is not a concern.
> > 
> > I think the above is dangerous thinking, and there are certainly still
> > concerns.
> > 
> > Just because you are maintaining a store of sources in your own
> > organization does not really mean that those sources are not random,
> > and while it might work for a time (or even forever, if you are very
> > careful to always use it correctly), it will be difficult to maintain
> > and guarantee correctness of your builds over longer periods of time.
> > 
> > Essentially, this externalizes the problem of matching up a specific
> > set of inputs with a given build, making it the organization's problem
> > instead of using BuildStream to specifically address all inputs (with
> > checksums or git shas or the like)... I can see this being a cause of
> > frustration years down the line.
> 
> [snip]
> 
> > This margin of error can be eliminated by ensuring that your
> > BuildStream project data addresses sources specifically and does not
> > tolerate variants (like tarball address without checksum for instance).
> 
> I am not saying this is perfect.

To clarify, I was mostly speaking here to the "that is not a concern"
portion of your previous statement about having sources cached within
the organization.

Such a statement can be dangerous inasmuch as one might interpret it
that way and then disregard the finer implication: one still needs to
maintain the association between the versions of your project and the
versions of your cached sources.

Cheers,
    -Tristan


