Hi everyone,
Sorry about the confusion here. Hopefully a little more info about the
Kite project will help. Kite is intended to work across any Hadoop
distribution and we've structured the libraries to depend on the
upstream Apache versions by default. But we also want CI to tell us if
anything breaks downstream, so we allow vendor-specific parts. That's
why we have dependency aggregators named "default" (upstream Apache),
"cdh5", etc.
You can also see this at work in the kite-tools module, where we have a
generic runtime that tries to construct the right classpath to run on
any distribution (we've seen people running it on MapR and HDP). The
likely culprit in this situation, though, is our kite-tools-cdh5 module,
which bundles CDH5 dependencies so people can use it on a local machine
that doesn't have Hadoop installed.
I sympathize with the view that we shouldn't depend on proprietary
infra, but I think we have good reasons for catching bugs early (testing
against vendors) and making the CLI run on non-Hadoop machines.
To avoid this issue, I suggest excluding the vendor-specific modules
from the build. That should be easy to do by using the -pl and -am
maven command options. The -pl option allows you to supply a list of the
modules to build and -am (--also-make) also builds the modules that the
listed ones depend on. By running with "-pl kite-tools-runtime -am" you
should be able to avoid hitting vendor repos.
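
For reference, here is a rough sketch of what that invocation could look like. Maven spells the also-make option -am (--also-make). The "clean install" goals and running from the root of the Kite source tree are my assumptions, and depending on the layout the module may need to be given as a relative path or as :artifactId, so check the <modules> entries in the poms first.

```shell
# Sketch only: the goals and the module selector below are assumptions.
# -pl (--projects) restricts the reactor to the listed modules.
# -am (--also-make) also builds the modules those modules depend on.
MODULES="kite-tools-runtime"
MVN_CMD="mvn -pl ${MODULES} -am clean install"

# Print the command so it can be reviewed before running it.
echo "${MVN_CMD}"

# Uncomment to actually run the build from the Kite source root:
# ${MVN_CMD}
```

If the build still reaches out to a vendor repo with this selection, one of the modules pulled in by -am presumably still declares a vendor dependency.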
If we need upstream changes, we can make that happen too. I hope that helps!
rb
(By the way, I'm not subscribed to dev@bigtop so cc me to keep me in the
discussion.)
On 09/10/2015 01:44 PM, Mark Grover wrote:
Hi all,
I am not quite sure I completely understand the issue being discussed
here. Is the issue that there are some CDH5 dependencies being bundled
in the Kite build? If so, I am adding Ryan Blue, one of the contributors
to Kite, to share some thoughts on it.
If not, please let me know how I can help.
Here are my thoughts on a few other things being discussed:
> Are you saying these dependencies are
> 'compile-time' only?
Actually, they are not. Many projects, Apache Flume, for example, use
Kite and those jars likely end up in the packages.
> even hue is downloading cloudera snapshots (sic!) from maven to
> compile against. IIRC these binaries are not bundled with "our" packaging.
I think you are correct here. However, we also ship LinkedIn's DataFu,
(Yahoo's) YCSB, and Amplab's Tachyon. I don't think it's a productive use of
anyone's time to go searching for where all these dependencies come
from. I personally like to think from a Bigtop community perspective.
What tools does an average Bigtop user want to use from the Hadoop
ecosystem? And if someone in the Bigtop community is willing to
contribute that tool to the project (and its license is compatible
with ASL v2), that's great!
Mark
On Thu, Sep 10, 2015 at 1:02 PM, Konstantin Boudnik <[email protected]>
wrote:
On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
> I think that was the second part of my statement. :) I don't see that
> happening since neither Hue nor Kite are Apache projects and have no
> incentive to distance themselves from particular vendors.
It isn't so much of distancing away from anything. It's more like a common
sense of not using a proprietary infra (it's an implementation detail, as you
should be able to plug this in via settings.xml file in your private
environment) for commonly available artifacts. If I can put it bluntly - learn
your tools before opening your stuff to others ;)
Cos
> On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik
> <[email protected]> wrote:
>
> > Or trying to convince the upstream projects to stop using their
> > mirrors for what they call "open source" projects?
> >
> > Cos
> >
> > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> > > I don't know how we'd get around it without patching the upstreams'
> > > dependencies or convincing the upstream projects to use the Apache
> > > repos
> > >
> > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik
> > > <[email protected]> wrote:
> > >
> > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> > > >
> > > > > even hue is downloading cloudera snapshots (sic!) from maven to
> > > > > compile against. IIRC these binaries are not bundled with "our"
> > > > > packaging.
> > > >
> > > > Yeah, Hue is another of those. Are you saying these dependencies are
> > > > 'compile-time' only? If we aren't bundling anything of these 3rd party
> > > > binaries to our convenience packages - I am ok. Still feels icky though
> > > >
> > > > Thanks.
> > > > Cos
> > > >
> > > > > > On 10.09.2015 at 03:18, Konstantin Boudnik
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > I have been running the full build to validate the DSL patch and
> > > > > > have noticed that kite downloads an enormous amount of Apache stuff
> > > > > > from Cloudera's repo. While this is bad by itself, as we have no clue
> > > > > > what's in there, I don't understand why we have to bring things like
> > > > > > httpcomponents, ant, etc from a 3rd party repo-server. That seems
> > > > > > quite bad to me.
> > > > > >
> > > > > > Also, looking at it I see that the build creates things like
> > > > > > [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> > > > > >
> > > > > > which creates a bad impression that Apache Bigtop is providing a
> > > > > > commercial vendor's binaries. Can anyone who has the knowledge about
> > > > > > this component address these issues somehow?
> > > > > >
> > > > > > Thank you very much!
> > > > > > Cos
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> >
--
Ryan Blue