On Thu, May 26, 2016 at 10:13 AM, Henry Robinson <[email protected]> wrote:

>
>
> On 26 May 2016 at 10:06, Todd Lipcon <[email protected]> wrote:
>
>> In terms of Apache policies, it's OK to require some "Impala" toolchain,
>> so long as the ability to regenerate that toolchain is public.
>>
>> For example, in Kudu, we use thirdparty tarballs hosted on S3. The actual
>> bucket is owned by Cloudera (someone has to pay for it), but the tarballs
>> are exactly the upstream source releases of the dependencies, so if someone
>> wanted to use their own copies, it could be done with a bit of work.
>>
>
> What about LLVM / GCC? Are those hosted in S3 as well for Kudu?
>

Yes, though we don't currently rebuild GCC. We do rebuild libstdcxx for the
purposes of TSAN builds.

It does make our initial build pretty long, so caching built artifacts for
different platforms would be nice, but we don't do that today.

-Todd

>
>
>>
>> I think Impala depending upon pre-built thirdparty deps is also fine, so
>> long as they can be re-built from source using publicly available scripts.
>> Making it trivial to do so isn't a strict requirement IMO -- so long as
>> anyone who asked for help with that work got the appropriate assistance.
>>
>> In terms of depending upon vendor packages (CDH) vs upstream releases,
>> again I think it's reasonable to continue to use the current dependencies
>> for the time being until some contributor steps forward and volunteers to
>> make some change. Projects like Apache Ambari already do this (they deploy
>> HDP) so there's precedent.
>>
>> -Todd
>>
>> On Thu, May 26, 2016 at 9:40 AM, Michael Ho <[email protected]> wrote:
>>
>>> Also adding mentors.
>>>
>>> On Thu, May 26, 2016 at 9:37 AM, Michael Ho <[email protected]> wrote:
>>>
>>>> I guess point number 1 is more about requiring all the thirdparty
>>>> binaries needed to build and run Impala
>>>> to be located at the location specified by the environment
>>>> variable $IMPALA_TOOLCHAIN.
>>>>
>>>> It's not strictly necessary for users to use exactly the version of
>>>> the toolchain we provide. For instance,
>>>> a user can check out a copy of our native-toolchain (which is public)
>>>> and tinker with it, or they can
>>>> create their own version of IMPALA_TOOLCHAIN as long as they have all
>>>> the necessary binaries
>>>> we expect.
>>>>
>>>> The user can also feel free to create a symlink to the system library
>>>> of their choice in the
>>>> $IMPALA_TOOLCHAIN directory if they choose to do so.
>>>>
>>>> My question is more about whether we should clean up our build script
>>>> so that it expects to find
>>>> everything we need to build in $IMPALA_TOOLCHAIN?
>>>>
>>>> Michael
>>>>
>>>> On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, May 25, 2016 at 8:42 PM, Michael Ho <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Following up on the discussion about IMPALA-3223, I'd like to send out
>>>>>> an email about the removal of thirdparty. In particular, the
>>>>>> following changes
>>>>>> will happen in stages. Please voice your comment before I commit to
>>>>>> any action.
>>>>>>
>>>>>> 1. Require $IMPALA_TOOLCHAIN to be set in order to build Impala.
>>>>>> In other words, all the logic in the build script that builds thirdparty
>>>>>> components
>>>>>> if $IMPALA_TOOLCHAIN is not set will be removed.
>>>>>>
>>>>>
>>>>> I think we probably need to make a firm decision about whether we're
>>>>> going to try to support non-toolchain builds. In the past we've said that
>>>>> it would be nice to allow building Impala with system libraries (even if
>>>>> we don't put special effort into supporting it), but I don't think we've
>>>>> committed to the idea, or committed to toolchain builds only.
>>>>>
>>>>> If we're going to support non-toolchain builds we would need some kind
>>>>> of testing to prevent it breaking all the time.
>>>>>
>>>>> It would be nice to have, but I'm not sure anyone has the
>>>>> time/motivation to do it. What do people think?
>>>>>
>>>>>
>>>>>>
>>>>>> 2. Remove build_thirdparty.sh
>>>>>>
>>>>>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to the toolchain
>>>>>> and update
>>>>>> the scripts accordingly.
>>>>>>
>>>>>
>>>>>> 4. Remove everything in thirdparty directory except for the following
>>>>>> components:
>>>>>> hadoop, hbase, hive, llama and sentry.
>>>>>>
>>>>>> 5. Update the integration Jenkins job to copy the snapshots of the
>>>>>> components above to
>>>>>> the internal Jenkins repo in addition to checking them in to GitHub.
>>>>>> Update bootstrap_toolchain
>>>>>> to point to the internal repos.
>>>>>>
>>>>>> 6. Remove thirdparty directory and update integration job to not
>>>>>> check in to git repo.
>>>>>>
>>>>>> After step (3) is done, we can already push the build script
>>>>>> changes to the ASF tree
>>>>>> and check in snapshots of hadoop, hbase, llama and sentry to S3 and
>>>>>> hopefully
>>>>>> get the build to work.
>>>>>>
>>>>>
>>>>> We can probably test this out as we go by manually copying the
>>>>> artifacts to the impala-incubator repo. I did a test of this yesterday
>>>>> (running download_requirements and copying thirdparty) and it built ok.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Michael
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Michael
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Michael
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>



-- 
Todd Lipcon
Software Engineer, Cloudera
