Re: RFC: Remove thirdparty

Henry Robinson Thu, 26 May 2016 10:14:42 -0700

On 26 May 2016 at 10:06, Todd Lipcon wrote:

> In terms of Apache policies, it's OK to require some "Impala" toolchain,
> so long as the ability to regenerate that toolchain is public.
>
> For example, in Kudu, we use thirdparty tarballs hosted on S3. The actual
> bucket is owned by Cloudera (someone has to pay for it), but the tarballs
> are exactly the upstream source releases of the dependencies, so if someone
> wanted to use their own copies, it could be done with a bit of work.
>


What about LLVM / GCC? Are those hosted in S3 as well for Kudu?
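
For reference, the toolchain-only requirement proposed as step 1 in the quoted thread below could be sketched roughly as follows. This is only an illustration in Python, not the actual Impala build logic: the function name and the component names are hypothetical, and the real toolchain layout is not described in this thread.

```python
from pathlib import Path

# Hypothetical component names, for illustration only; the real toolchain
# layout is not described in this thread.
REQUIRED_COMPONENTS = ["gcc", "llvm", "boost"]


def missing_toolchain_components(env, required=REQUIRED_COMPONENTS):
    """Return the required components not found under $IMPALA_TOOLCHAIN.

    Raises RuntimeError if IMPALA_TOOLCHAIN is unset or empty, mirroring the
    proposal that the build should refuse to run without it.
    """
    root = env.get("IMPALA_TOOLCHAIN")
    if not root:
        raise RuntimeError("IMPALA_TOOLCHAIN must be set to build")
    root_path = Path(root)
    # Path.exists() follows symlinks, so a symlink to a system library counts
    # as present as long as its target resolves.
    return [name for name in required if not (root_path / name).exists()]
```

Calling `missing_toolchain_components(os.environ)` at the top of a build script would fail fast when the variable is unset and otherwise report which expected entries are absent, whether they came from the provided toolchain, a locally built one, or symlinks to system libraries.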


>
> I think Impala depending upon pre-built thirdparty deps is also fine, so
> long as they can be re-built from source using publicly available scripts.
> Making it trivial to do so isn't a strict requirement IMO, so long as
> anyone who asks for help doing that work gets appropriate assistance.
>
> In terms of depending upon vendor packages (CDH) vs upstream releases,
> again I think it's reasonable to continue to use the current dependencies
> for the time being until some contributor steps forward and volunteers to
> make some change. Projects like Apache Ambari already do this (they deploy
> HDP) so there's precedent.
>
> -Todd
>
> On Thu, May 26, 2016 at 9:40 AM, Michael Ho wrote:
>
>> Also adding mentors.
>>
>> On Thu, May 26, 2016 at 9:37 AM, Michael Ho wrote:
>>
>>> I guess point number 1 is more about requiring all the thirdparty binaries
>>> needed to build and run Impala to be located at the path specified by the
>>> environment variable $IMPALA_TOOLCHAIN.
>>>
>>> It's not strictly necessary for users to use exactly the version of the
>>> toolchain we provide. For instance, a user can check out a copy of our
>>> native-toolchain (which is public) and tinker with it, or they can create
>>> their own version of IMPALA_TOOLCHAIN as long as it has all the necessary
>>> binaries we expect.
>>>
>>> Users are also free to create a symlink to the system library of their
>>> choice in the $IMPALA_TOOLCHAIN directory.
>>>
>>> My question is more about whether we should clean up our build script so
>>> that it expects to find everything needed for the build in
>>> $IMPALA_TOOLCHAIN?
>>>
>>> Michael
>>>
>>> On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong wrote:
>>>
>>>>
>>>>
>>>> On Wed, May 25, 2016 at 8:42 PM, Michael Ho wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Following up on the discussion about IMPALA-3223, I'd like to send out
>>>>> an email about the removal of thirdparty. In particular, the following
>>>>> changes will happen in stages. Please voice your comments before I
>>>>> commit to any action.
>>>>>
>>>>> 1. Require $IMPALA_TOOLCHAIN to be set in order to build Impala.
>>>>> In other words, all the logic in the build script that builds thirdparty
>>>>> components when $IMPALA_TOOLCHAIN is not set will be removed.
>>>>>
>>>>
>>>> I think we probably need to make a firm decision about whether we're
>>>> going to try to support non-toolchain builds. In the past we've said that
>>>> it would be nice to allow building Impala with system libraries (even if we
>>>> don't put special effort into supporting it), but I don't think we've
>>>> committed to the idea, or committed to toolchain builds only.
>>>>
>>>> If we're going to support non-toolchain builds we would need some kind
>>>> of testing to prevent it breaking all the time.
>>>>
>>>> It would be nice to have, but I'm not sure anyone has the
>>>> time/motivation to do it. What do people think?
>>>>
>>>>
>>>>>
>>>>> 2. Remove build_thirdparty.sh
>>>>>
>>>>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to the toolchain and
>>>>> update the scripts accordingly.
>>>>>
>>>>
>>>>> 4. Remove everything in thirdparty directory except for the following
>>>>> components:
>>>>> hadoop, hbase, hive, llama and sentry.
>>>>>
>>>>> 5. Update the integration jenkins job to copy the snapshots of the
>>>>> components above to the internal jenkins repo in addition to checking
>>>>> them in to github. Update bootstrap_toolchain to point to the internal
>>>>> repos.
>>>>>
>>>>> 6. Remove the thirdparty directory and update the integration job to
>>>>> no longer check in to the git repo.
>>>>>
>>>>> After step (3) is done, we can already push the build script changes
>>>>> to the ASF tree, check in snapshots of hadoop, hbase, llama and sentry
>>>>> to S3, and hopefully get the build to work.
>>>>>
>>>>
>>>> We can probably test this out as we go by manually copying the
>>>> artifacts to the impala-incubator repo. I did a test of this yesterday
>>>> (running download_requirements and copying thirdparty) and it built ok.
>>>>
>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Michael
>>>
>>
>>
>>
>> --
>> Thanks,
>> Michael
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679
