My previous response was missing some context.  There's
bin/bootstrap_toolchain.py in the Impala repo that downloads prebuilt
dependencies of the right versions from S3. I think modifying this script, or
creating a similar script to download pre-built test dependencies, is a good
idea.
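
For illustration only, here is a minimal sketch of what such a download
helper could look like. The bucket URL, package names, version pins, and
directory layout below are placeholders I made up, not the actual values or
logic used by bin/bootstrap_toolchain.py.

#!/usr/bin/env python
# Hypothetical sketch: fetch a prebuilt test dependency tarball from S3 and
# unpack it if it isn't already present. Bucket name, URL layout, and
# versions are placeholders.
import os
import tarfile
import urllib.request

S3_BASE = "https://example-toolchain-bucket.s3.amazonaws.com"  # placeholder
DOWNLOAD_DIR = os.path.join(os.environ.get("IMPALA_HOME", "."), "toolchain")

def download_package(name, version):
    """Download <name>-<version>.tar.gz from S3 and extract it under
    DOWNLOAD_DIR, skipping the download if it's already there."""
    dest_dir = os.path.join(DOWNLOAD_DIR, "%s-%s" % (name, version))
    if os.path.isdir(dest_dir):
        return dest_dir  # already bootstrapped
    os.makedirs(DOWNLOAD_DIR, exist_ok=True)
    tarball = os.path.join(DOWNLOAD_DIR, "%s-%s.tar.gz" % (name, version))
    url = "%s/%s-%s.tar.gz" % (S3_BASE, name, version)
    urllib.request.urlretrieve(url, tarball)
    with tarfile.open(tarball) as tf:
        tf.extractall(DOWNLOAD_DIR)
    os.remove(tarball)
    return dest_dir

if __name__ == "__main__":
    # Placeholder package/version pins; real pins would come from environment
    # variables or a manifest checked into the repo.
    for pkg, ver in [("hadoop", "2.6.0"), ("hive", "1.1.0")]:
        print("Bootstrapped %s" % download_package(pkg, ver))

The important property is that the version pins live in the repo, so checking
out an old commit fetches the matching prebuilt artifacts.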

There is a different aspect to the native toolchain: the build scripts in the
native-toolchain repo that bootstrap Impala's native dependencies starting
from gcc. The output artifacts of this process are uploaded to S3. Other
dependencies (hadoop, etc.) are built in a different way, so I think the
native-toolchain repo doesn't need to know about them. libhdfs is maybe a
corner case where it would be good to add it to the toolchain, if possible,
to make the build more reproducible.

On Thu, Mar 10, 2016 at 11:24 AM, Daniel Hecht <[email protected]> wrote:

> On Thu, Mar 10, 2016 at 11:10 AM, Henry Robinson <[email protected]>
> wrote:
> > I didn't think that binaries were uploaded to any repository, but instead
> > to S3 (and therefore there's no version history) or some other URL. That's
> > what I'd suggest we continue to do.
> >
>
> A bit of a tangent (but important if we will rely even more on the
> toolchain): the fact that the binaries (and clean source) are only
> copied to S3 seems like a problem.  What happens if someone
> accidentally 'rm -rf' the toolchain bucket?  Can we reproduce our old
> build exactly?  Are we at least backing up the S3 toolchain bucket
> somehow?
>
