On 26 May 2016 at 10:06, Todd Lipcon <[email protected]> wrote:

> In terms of Apache policies, it's OK to require some "Impala" toolchain,
> so long as the ability to regenerate that toolchain is public.
>
> For example, in Kudu, we use thirdparty tarballs hosted on S3. The actual
> bucket is owned by Cloudera (someone has to pay for it), but the tarballs
> are exactly the upstream source releases of the dependencies, so if someone
> wanted to use their own copies, it could be done with a bit of work.
>

What about LLVM / GCC? Are those hosted in S3 as well for Kudu?

>
> I think Impala depending upon pre-built thirdparty deps is also fine, so
> long as they can be re-built from source using publicly available scripts.
> Making it trivial to do so isn't a strict requirement IMO -- so long as if
> someone asked for help to do that work, they got the appropriate assistance.
>
> In terms of depending upon vendor packages (CDH) vs upstream releases,
> again I think it's reasonable to continue to use the current dependencies
> for the time being until some contributor steps forward and volunteers to
> make some change. Projects like Apache Ambari already do this (they deploy
> HDP) so there's precedent.
>
> -Todd
>
> On Thu, May 26, 2016 at 9:40 AM, Michael Ho <[email protected]> wrote:
>
>> Also adding mentors.
>>
>> On Thu, May 26, 2016 at 9:37 AM, Michael Ho <[email protected]> wrote:
>>
>>> I guess point number 1 is more about requiring all the thirdparty
>>> binaries for getting Impala to build and work to be located at a
>>> location specified by the environment variable $IMPALA_TOOLCHAIN.
>>>
>>> It's not strictly necessary for users to use exactly the version of
>>> toolchain we provide. For instance, a user can check out a copy of our
>>> native-toolchain (which is public) and tinker with it, or they can
>>> create their own version of IMPALA_TOOLCHAIN as long as they have all
>>> the necessary binaries we expect.
>>>
>>> The user can also feel free to create a symlink to the system library
>>> of their choice in the $IMPALA_TOOLCHAIN directory if they choose to
>>> do so.
>>>
>>> My question is more about whether we should clean up our build script
>>> so that we expect to find everything we need to build in
>>> $IMPALA_TOOLCHAIN?
>>>
>>> Michael
>>>
>>> On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Wed, May 25, 2016 at 8:42 PM, Michael Ho <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Following up on the discussion about IMPALA-3223, I'd like to send out
>>>>> an email about the removal of thirdparty. In particular, the following
>>>>> changes will happen in stages. Please voice your comments before I
>>>>> commit to any action.
>>>>>
>>>>> 1. Require $IMPALA_TOOLCHAIN to be set in order to build Impala.
>>>>> In other words, all the logic in the build script to build thirdparty
>>>>> components if $IMPALA_TOOLCHAIN is not set will be removed.
>>>>>
>>>>
>>>> I think we probably need to make a firm decision about whether we're
>>>> going to try to support non-toolchain builds. In the past we've said that
>>>> it would be nice to allow building Impala with system libraries (even if we
>>>> don't put special effort into supporting it), but I don't think we've
>>>> committed to the idea, or committed to toolchain builds only.
>>>>
>>>> If we're going to support non-toolchain builds we would need some kind
>>>> of testing to prevent it breaking all the time.
>>>>
>>>> It would be nice to have, but I'm not sure anyone has the
>>>> time/motivation to do it. What do people think?
>>>>
>>>>
>>>>>
>>>>> 2. Remove build_thirdparty.sh
>>>>>
>>>>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to toolchain and
>>>>> update scripts about it.
>>>>>
>>>>
>>>>> 4. Remove everything in thirdparty directory except for the following
>>>>> components: hadoop, hbase, hive, llama and sentry.
>>>>>
>>>>> 5. Update integration jenkins job to copy the snapshots of the
>>>>> components above to internal jenkins repo in addition to checking them
>>>>> in to github. Update bootstrap_toolchain to point to internal repos.
>>>>>
>>>>> 6. Remove thirdparty directory and update integration job to not check
>>>>> in to git repo.
>>>>>
>>>>> After step (3) is done, we can already push the changes of the build
>>>>> script to ASF tree and check in snapshots of hadoop, hbase, llama and
>>>>> sentry to S3 and hopefully get the build to work.
>>>>>
>>>>
>>>> We can probably test this out as we go by manually copying the
>>>> artifacts to the impala-incubator repo. I did a test of this yesterday
>>>> (running download_requirements and copying thirdparty) and it built ok.
>>>>
>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Michael
>>>
>>
>>
>>
>> --
>> Thanks,
>> Michael
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
