My previous response was missing some context. There's bin/bootstrap_toolchain.py in the Impala repo that downloads prebuilt dependencies of the right versions from S3. Modifying this script, or creating a similar script to download pre-built test dependencies, is a good idea.
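To illustrate the pattern bootstrap_toolchain.py follows, here is a minimal sketch of downloading a versioned, pre-built dependency tarball and unpacking it locally. The bucket URL, directory layout, and dependency names are illustrative assumptions, not the actual script's contents.

```python
# Hedged sketch of a toolchain bootstrap step: fetch <name>-<version>.tar.gz
# from an S3-backed HTTP endpoint and extract it. All names and the URL
# layout below are hypothetical.
import os
import tarfile
import urllib.request

# Assumed base URL; the real toolchain bucket and layout may differ.
BASE_URL = "https://example-toolchain-bucket.s3.amazonaws.com"


def package_url(base_url, name, version):
    """Build the download URL for a versioned dependency tarball."""
    return "{0}/{1}/{1}-{2}.tar.gz".format(base_url, name, version)


def download_dependency(name, version, dest_dir):
    """Download and extract a pre-built dependency into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    tarball = os.path.join(dest_dir, "{0}-{1}.tar.gz".format(name, version))
    # Skip the download if a previous run already fetched this version.
    if not os.path.exists(tarball):
        urllib.request.urlretrieve(package_url(BASE_URL, name, version), tarball)
    with tarfile.open(tarball, "r:gz") as tf:
        tf.extractall(dest_dir)
    return os.path.join(dest_dir, "{0}-{1}".format(name, version))
```

Pinning exact versions in the URL is what makes the build reproducible: as long as the artifacts for a given version stay in the bucket, any checkout can re-fetch the same binaries.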
There is a different aspect to the native toolchain: the build scripts in the native-toolchain repo that bootstrap Impala's native dependencies starting from gcc. The output artifacts of this process are uploaded to S3. Other dependencies (hadoop, etc.) are built in a different way, so I think the native-toolchain repo doesn't need to know about them. libhdfs is maybe a corner case where it would be good to add it to the toolchain, if possible, to make the build more reproducible.

On Thu, Mar 10, 2016 at 11:24 AM, Daniel Hecht <[email protected]> wrote:
> On Thu, Mar 10, 2016 at 11:10 AM, Henry Robinson <[email protected]> wrote:
> > I didn't think that binaries were uploaded to any repository, but instead
> > to S3 (and therefore there's no version history) or some other URL. That's
> > what I'd suggest we continue to do.
>
> A bit of a tangent (but important if we will rely even more on the
> toolchain): the fact that the binaries (and clean source) are only
> copied to S3 seems like a problem. What happens if someone
> accidentally runs 'rm -rf' on the toolchain bucket? Can we reproduce
> our old build exactly? Are we at least backing up the S3 toolchain
> bucket somehow?
