Yes, I too am particularly concerned about maintaining the ability to build offline, and about not downloading the same things over and over again.
I don't quite understand the case against versioning - if gc'ing obsolete versions in order to reduce storage space is a concern, then it's probably fine to a) blow away and re-download everything, or b) throw away old versions manually, if you happen to be in a situation where a) isn't possible.

On Tue, Feb 28, 2017 at 12:20 PM, Tim Armstrong <[email protected]> wrote:

> I agree it's not too bad if you have a fat pipe to S3, but it's a pretty
> bad regression in usability to make it the default and, in particular, to
> provide no way to opt out.
>
> The toolchain is almost 1GB though, which is pretty problematic to download
> if a developer is on coffee-shop wifi, cellular wireless, airplane wifi,
> etc. It'd also be pretty easy for a developer working offline to switch
> branches, run buildall.sh, have gcc, etc, automatically deleted and then be
> stuck unable to build anything.
>
> On Tue, Feb 28, 2017 at 9:07 AM, Henry Robinson <[email protected]> wrote:
>
>> I'd prefer not to do that because it's something of a hack and generates
>> too many artifacts if we make incremental build changes, not to mention the
>> extra complexity required to make such a change because new tarballs might
>> need to be uploaded.
>>
>> On Tue, Feb 28, 2017 at 8:55 AM Lars Volker <[email protected]> wrote:
>>
>> > Can we add another version string component like -1 or -impala1, or add a
>> > dummy patch to the affected packages to allow for new versions with the
>> > same upstream version? I think this is what Linux distributions commonly
>> > do to have several versions of the same upstream version.
>> >
>> > On Feb 27, 2017 21:15, "Henry Robinson" <[email protected]> wrote:
>> >
>> > Yes, it would force re-downloading. At my office, downloading a toolchain
>> > takes a matter of a few seconds, so I'm not sure the cost is that great.
>> > And if it turned out to be problematic, one could always change the
>> > toolchain directory for different branches. Having something locally that
>> > sets IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/
>> > would work.
>> >
>> > However, I wouldn't want to force that behaviour into the toolchain
>> > scripts because of the need for garbage collection it would raise - it
>> > wouldn't be clear when to delete old toolchains programmatically.
>> >
>> > On 27 February 2017 at 20:51, Tim Armstrong <[email protected]> wrote:
>> >
>> > > Maybe I'm misunderstanding, but wouldn't that force re-downloading of
>> > > the entire toolchain every time a developer switches between branches
>> > > with different build IDs?
>> > >
>> > > I know some developers do that frequently, e.g. to try and reproduce
>> > > bugs on older versions or backport patches.
>> > >
>> > > I agree it would be good to fix this, since I've run into this problem
>> > > before, I'm just not quite sure what the best solution is. In the other
>> > > case where I had this issue with LLVM I changed the version number (by
>> > > appending noasserts- to it), but that's really just a hack.
>> > >
>> > > -Tim
>> > >
>> > > On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson <[email protected]> wrote:
>> > >
>> > > > As Matt said, I have a patch that implements build ID-based
>> > > > versioning at https://gerrit.cloudera.org/#/c/6166/2.
>> > > >
>> > > > Does anyone want to take a look? If we could get this in soon it
>> > > > would help smooth over the LZ4 change which is going in shortly.
>> > > >
>> > > > On 27 February 2017 at 14:21, Henry Robinson <[email protected]> wrote:
>> > > >
>> > > > > I agree that that might be useful, and that it's a separately
>> > > > > addressable problem.
>> > > > >
>> > > > > On 27 February 2017 at 14:18, Matthew Jacobs <[email protected]> wrote:
>> > > > >
>> > > > >> Just catching up to this e-mail, though I had seen your code
>> > > > >> reviews and I think this approach makes sense. An additional
>> > > > >> concern would be how to identify how a toolchain package was
>> > > > >> built, and AFAIK this is tricky now if only the 'toolchain ID' is
>> > > > >> known. Before I saw this e-mail I was thinking about this problem
>> > > > >> (which I think we can address separately), and that we might want
>> > > > >> to write the native-toolchain git hash with every toolchain build
>> > > > >> so that the exact build scripts are associated with those build
>> > > > >> artifacts. I filed https://issues.cloudera.org/browse/IMPALA-5002
>> > > > >> for this related problem.
>> > > > >>
>> > > > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <[email protected]> wrote:
>> > > > >> > As written, the toolchain apparently can't deal with the
>> > > > >> > possibility of build flags changing, but a dependency version
>> > > > >> > remaining the same.
>> > > > >> >
>> > > > >> > LZ4 has never (afaict) been built with optimization enabled. I
>> > > > >> > have a commit that enables -O3, but that continues to produce
>> > > > >> > artifacts for lz4-1.7.5 with no version change. This is a
>> > > > >> > problem because bootstrapping the toolchain will fail to pick
>> > > > >> > up the new binaries - because the previously downloaded version
>> > > > >> > is still in the local cache, and won't be overwritten because
>> > > > >> > the version has not changed.
>> > > > >> >
>> > > > >> > I think the simplest way to fix this is to write the toolchain
>> > > > >> > build ID to the dependency version file (that's in the local
>> > > > >> > cache only) when it's downloaded. If that ID changes, the
>> > > > >> > dependency will be re-downloaded.
>> > > > >> >
>> > > > >> > This has the disadvantage that any bump in
>> > > > >> > IMPALA_TOOLCHAIN_BUILD_ID will invalidate all dependencies, and
>> > > > >> > bin/bootstrap_toolchain.py will re-download all of them. My
>> > > > >> > feeling is that that cost is better than trying to individually
>> > > > >> > determine whether a dependency has changed between toolchain
>> > > > >> > builds.
>> > > > >> >
>> > > > >> > Any thoughts on whether this is the right way to go?
>> > > > >> >
>> > > > >> > Henry
>> > > > >>
>> > > > >
>> > > > > --
>> > > > > Henry Robinson
>> > > > > Software Engineer
>> > > > > Cloudera
>> > > > > 415-994-6679
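
For reference, the build-ID stamping idea from the original message could be sketched roughly as below. This is only an illustration, not the code in the Gerrit patch or in bin/bootstrap_toolchain.py: the marker file name `toolchain_build_id` and the helpers `needs_download` and `record_build_id` are made up for this example.

```python
import os
import tempfile

# Hypothetical marker file name; the real scripts may store this differently.
BUILD_ID_FILE = "toolchain_build_id"


def needs_download(package_dir, build_id):
    """Return True if the cached package is missing or was stamped with a
    different toolchain build ID than the one requested."""
    marker = os.path.join(package_dir, BUILD_ID_FILE)
    if not os.path.isfile(marker):
        return True
    with open(marker) as f:
        return f.read().strip() != build_id


def record_build_id(package_dir, build_id):
    """Stamp the cached package with the build ID it was downloaded for."""
    os.makedirs(package_dir, exist_ok=True)
    with open(os.path.join(package_dir, BUILD_ID_FILE), "w") as f:
        f.write(build_id)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache:
        lz4 = os.path.join(cache, "lz4-1.7.5")
        print(needs_download(lz4, "build-A"))  # nothing cached yet
        record_build_id(lz4, "build-A")
        print(needs_download(lz4, "build-A"))  # same build ID
        print(needs_download(lz4, "build-B"))  # build ID bumped
```

With a check like this, a package whose upstream version is unchanged (lz4-1.7.5 here) is still refreshed when the build ID changes, at the cost of re-downloading everything on every bump, which is the trade-off discussed above.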
