Hi everyone,

Sorry about the confusion here; hopefully a little more information about the Kite project will help. Kite is intended to work across any Hadoop distribution, and we've structured the libraries to depend on the upstream Apache versions by default. But we also want CI to tell us if anything breaks downstream, so we allow vendor-specific modules. That's why we have dependency aggregators named "default" (upstream Apache), "cdh5", etc.

You can also see this at work in the kite-tools module, where we have a generic runtime that tries to construct the right classpath to run on any distribution (we've seen people running it on MapR and HDP). The likely culprit in this situation, though, is our kite-tools-cdh5 module, which bundles CDH5 dependencies so people can use it on a local machine that doesn't have Hadoop installed.

I sympathize with the view that we shouldn't depend on proprietary infra, but I think we have good reasons for doing so: catching bugs early (by testing against vendor distributions) and making the CLI run on machines without Hadoop.

To avoid this issue, I suggest excluding the vendor-specific modules from the build. That should be easy to do with Maven's -pl and -am command-line options. The -pl option lets you supply a list of the modules to build, and -am ("also make") also builds the modules they depend on. By running with "-pl kite-tools-runtime -am" you should be able to avoid hitting vendor repos.
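As a rough illustration of that command, here is a minimal Python sketch that assembles the Maven invocation. It assumes Maven 3 is on your PATH and that you run the result from the root of the Kite source tree; the helper name kite_partial_build_cmd and the "clean install" goals are hypothetical choices for illustration, not part of Kite's build. Maven's -pl accepts either a module's relative directory path or [groupId]:artifactId (for example ":kite-tools-runtime"), so adjust the module name if the runtime lives in a subdirectory.

```python
import shlex


def kite_partial_build_cmd(modules, goals=("clean", "install")):
    """Return the mvn argv that builds only `modules` plus their in-reactor deps.

    Hypothetical helper for illustration. -pl takes a comma-separated list of
    modules; -am (--also-make) adds the modules the listed ones depend on, so
    vendor-specific modules such as kite-tools-cdh5 are skipped unless one of
    the listed modules needs them.
    """
    if not modules:
        raise ValueError("at least one module is required for -pl")
    return ["mvn", *goals, "-pl", ",".join(modules), "-am"]


if __name__ == "__main__":
    cmd = kite_partial_build_cmd(["kite-tools-runtime"])
    # Print a copy-pasteable shell command instead of running Maven here;
    # pass `cmd` to subprocess.run(cmd, check=True) from the Kite source
    # root to actually build.
    print(shlex.join(cmd))  # mvn clean install -pl kite-tools-runtime -am
```

Returning an argv list rather than a single string keeps it safe to hand straight to subprocess without worrying about shell quoting.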

If we need upstream changes, we can make that happen too. I hope that helps!

rb

(By the way, I'm not subscribed to dev@bigtop so cc me to keep me in the discussion.)

On 09/10/2015 01:44 PM, Mark Grover wrote:
Hi all,
I am not quite sure I completely understand the issue being discussed
here. Is the issue that some CDH5 dependencies are being bundled
in the Kite build? If so, I am adding Ryan Blue, one of the contributors
to Kite, to share some thoughts on it.

If not, please let me know how I can help.

Here are my thoughts on a few other things being discussed:
 >Are you saying these dependencies are
 >'compile-time' only?
Actually, they are not. Many projects (Apache Flume, for example) use
Kite, and those jars likely end up in the packages.

 >even hue is downloading cloudera snapshots (sic!) from maven to
 >compile against. IIRC these binaries are not bundled with "our" packaging.
I think you are correct here. However, we also ship LinkedIn's DataFu,
(Yahoo's) YCSB, and AMPLab's Tachyon. I don't think it's a productive use
of anyone's time to go searching for where all these dependencies come
from. I personally like to think from a Bigtop community perspective:
what tools does an average Bigtop user want to use from the Hadoop
ecosystem? And if someone in the Bigtop community is willing to
contribute that tool to the project, that's great (provided the license
is compatible with ASL v2).

Mark


On Thu, Sep 10, 2015 at 1:02 PM, Konstantin Boudnik <[email protected]> wrote:

    On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
    > I think that was the second part of my statement.  :)  I don't see that
    > happening since neither Hue nor Kite are Apache projects and have no
    > incentive to distance themselves from particular vendors.

    It isn't so much of distancing away from anything. It's more like a common
    sense of not using a proprietary infra (it's an implementation detail, as
    you should be able to plug this in via settings.xml file in your private
    environment) for commonly available artifacts. If I can put it bluntly -
    learn your tools before opening your stuff to others ;)

    Cos

     > On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik <[email protected]> wrote:
     >
     > > Or trying to convince the upstream projects to stop using their mirrors
     > > for what they call "open source" projects?
     > >
     > > Cos
     > >
     > > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
     > > > I don't know how we'd get around it without patching the upstreams'
     > > > dependencies or convincing the upstream projects to use the Apache repos
     > > >
     > > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik <[email protected]> wrote:
     > > >
     > > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
     > > > >
     > > > > > even hue is downloading cloudera snapshots (sic!) from maven to
     > > > > > compile against. IIRC these binaries are not bundled with "our"
     > > > > > packaging.
     > > > >
     > > > > Yeah, Hue is another of those. Are you saying these dependencies are
     > > > > 'compile-time' only?  If we aren't bundling anything of these 3rd party
     > > > > binaries to our convenience packages - I am ok. Still feels icky though
     > > > >
     > > > > Thanks.
     > > > >   Cos
     > > > >
     > > > > > > On 10.09.2015 at 03:18, Konstantin Boudnik <[email protected]> wrote:
     > > > > > >
     > > > > > > I have been running the full build to validate the DSL patch and have
     > > > > > > noticed that kite downloads an enormous amount of Apache stuff from
     > > > > > > Cloudera's repo. While is bad by itself, as we have no clue what's in
     > > > > > > there, I don't understand why we have to bring things like
     > > > > > > httpcomponents, ant, etc from a 3rd party repo-server. That's seems
     > > > > > > quite bad to me.
     > > > > > >
     > > > > > > Also, looking at it I see that the build creates things like
     > > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
     > > > > > >
     > > > > > > which creates a bad impression that Apache Bigtop is providing a
     > > > > > > commercial's vendors binaries. Can anyone who has the knowledge about
     > > > > > > this component address these issues somehow?
     > > > > > >
     > > > > > > Thank you very much!
     > > > > > >  Cos




--
Ryan Blue
