Thanks for the explanation, Ryan! Excluding the vendor-specific (non-Apache)
modules certainly makes sense and needs to be done.

The other issue here is that _all_ dependencies, including those that Hadoop
and other components depend on, are pulled from the Cloudera repo.
That's the biggest one in my opinion. While I don't suspect Cloudera of
putting anything malicious into httpcomponents, I, as an RM and a PMC
member of this project, don't feel right gpg-signing packages without knowing
what some of the jars contain. So my main concern is that if we supply binary
packages to our users, we should be sure that we are using either
 - official public repos like Maven Central, which contain the jars deployed
   by the official development teams of those components; or
 - ASF Infra repos, where all the artifacts are controlled by, and are the
   responsibility of, a particular project's PMC
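
To make that check concrete, here is a rough sketch (not part of Bigtop's
tooling) of how a build log could be scanned for artifacts fetched from hosts
outside an allow-list. The host list, the function name and the sample log
lines are all assumptions made up for illustration, and the exact wording of
download lines differs between Maven versions.

```python
import re
from urllib.parse import urlparse

# Assumed allow-list: Maven Central and the ASF Nexus host. Adjust as needed.
TRUSTED_HOSTS = {
    "repo.maven.apache.org",
    "repo1.maven.org",
    "repository.apache.org",
}

# Matches Maven download lines such as "Downloading: <url>" or
# "Downloaded from central: <url>" and captures the URL.
DOWNLOAD_LINE = re.compile(r"Download(?:ing|ed)\b[^\n]*?(https?://\S+)")


def untrusted_downloads(log_lines, trusted_hosts=TRUSTED_HOSTS):
    """Return the sorted, de-duplicated URLs fetched from non-trusted hosts."""
    found = set()
    for line in log_lines:
        match = DOWNLOAD_LINE.search(line)
        if match is None:
            continue  # not a download line
        url = match.group(1)
        if urlparse(url).hostname not in trusted_hosts:
            found.add(url)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical log excerpt; the artifact paths are illustrative only.
    sample = [
        "[INFO] Downloading: https://repo.maven.apache.org/maven2/org/apache/ant/ant/1.9.4/ant-1.9.4.pom",
        "[INFO] Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/httpcomponents/httpcore/4.2.5/httpcore-4.2.5.jar",
        "[INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0",
    ]
    for url in untrusted_downloads(sample):
        print(url)
```

Anything such a scan reports would be a jar whose origin needs a second look
before signing.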

Does that make sense?
  Cos

On Thu, Sep 10, 2015 at 02:15PM, Ryan Blue wrote:
> Hi everyone,
> 
> Sorry about the confusion here, hopefully a little more info about
> the Kite project will help. Kite is intended to work across any
> Hadoop distribution and we've structured the libraries to depend on
> the upstream Apache versions by default. But, we also want CI to
> tell us if anything breaks downstream, so we allow vendor-specific
> parts. That's why we have dependency aggregators named "default"
> (upstream Apache), "cdh5", etc.
> 
> You can also see this at work in the kite-tools module, where we
> have a generic runtime that tries to construct the right classpath
> to run on any distribution (we've seen people running it on MapR and
> HDP). The likely culprit in this situation, though, is our
> kite-tools-cdh5 module that bundles CDH5 dependencies so people can
> use it on their local machine that doesn't have Hadoop installed.
> 
> I sympathize with the view that we shouldn't depend on proprietary
> infra, but I think we have good reasons for catching bugs early
> (testing against vendors) and making the CLI run on non-Hadoop
> machines.
> 
> To avoid this issue, I suggest excluding the vendor-specific modules
> from the build. That should be easy to do by using the -pl and -am
> maven command options. The -pl option allows you to supply a list of
> the modules to build and -am ensures the dependency modules are
> present. By running with "-pl kite-tools-runtime -am" you should be
> able to avoid hitting vendor repos.
> 
> If we need upstream changes, we can make that happen too. I hope that helps!
> 
> rb
> 
> (By the way, I'm not subscribed to dev@bigtop so cc me to keep me in
> the discussion.)
> 
> On 09/10/2015 01:44 PM, Mark Grover wrote:
> >Hi all,
> >I am not quite sure I completely understand the issue being discussed
> >here. Is the issue that there are some CDH5 dependencies being bundled
> >in the Kite build? If so, I am adding Ryan Blue, one of the contributors
> >to Kite, to share some thoughts on it.
> >
> >If not, please let me know how I can help.
> >
> >Here are my thoughts on a few other things being discussed:
> >>Are you saying these dependencies are
> >'compile-time' only?
> >Actually, they are not. Many projects, Apache Flume, for example, use
> >Kite and those jars likely end up in the packages.
> >
> > >even hue is downloading cloudera snapshots (sic!) from maven to
> >compile against. IIRC these binaries are not bundled with "our" packaging.
> >I think you are correct here. However, we also ship LinkedIn's DataFu,
> >(Yahoo's) YCSB, Amplab's Tachyon. I don't think it's a productive use of
> >anyone's time to go searching for where all these dependencies come
> >from. I personally like to think from a Bigtop community perspective.
> >What tools does an average Bigtop user want to use from the Hadoop
> >ecosystem? And, if someone in the Bigtop community is willing to
> >contribute that tool to the project, that's great (provided the license
> >is compatible, e.g. ASL v2)!
> >
> >Mark
> >
> >
> >On Thu, Sep 10, 2015 at 1:02 PM, Konstantin Boudnik wrote:
> >
> >    On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
> >    > I think that was the second part of my statement.  :)  I don't see that
> >    > happening since neither Hue nor Kite are Apache projects and have no
> >    > incentive to distance themselves from particular vendors.
> >
> >    It isn't so much about distancing from anything. It's more like the
> >    common sense of not using a proprietary infra (it's an implementation
> >    detail, as you should be able to plug this in via the settings.xml file
> >    in your private environment) for commonly available artifacts. If I can
> >    put it bluntly - learn your tools before opening your stuff to others ;)
> >
> >    Cos
> >
> >     > On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik wrote:
> >     >
> >     > > Or trying to convince the upstream projects to stop using their
> >     > > mirrors for what they call "open source" projects?
> >     > >
> >     > > Cos
> >     > >
> >     > > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> >     > > > I don't know how we'd get around it without patching the upstreams'
> >     > > > dependencies or convincing the upstream projects to use the Apache repos
> >     > > >
> >     > > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik wrote:
> >     > > >
> >     > > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> >     > > > >
> >     > > > > > even hue is downloading cloudera snapshots (sic!) from maven to
> >     > > > > > compile against. IIRC these binaries are not bundled with "our"
> >     > > > > > packaging.
> >     > > > >
> >     > > > > Yeah, Hue is another of those. Are you saying these dependencies are
> >     > > > > 'compile-time' only?  If we aren't bundling any of these 3rd party
> >     > > > > binaries into our convenience packages - I am ok. Still feels icky
> >     > > > > though.
> >     > > > >
> >     > > > > Thanks.
> >     > > > >   Cos
> >     > > > >
> >     > > > > > > On 10.09.2015 at 03:18, Konstantin Boudnik wrote:
> >     > > > > > >
> >     > > > > > > I have been running the full build to validate the DSL patch and have
> >     > > > > > > noticed that kite downloads an enormous amount of Apache stuff from
> >     > > > > > > Cloudera's repo. While that is bad by itself, as we have no clue what's
> >     > > > > > > in there, I don't understand why we have to bring things like
> >     > > > > > > httpcomponents, ant, etc. from a 3rd party repo-server. That seems
> >     > > > > > > quite bad to me.
> >     > > > > > >
> >     > > > > > > Also, looking at it I see that the build creates things like
> >     > > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> >     > > > > > >
> >     > > > > > > which creates a bad impression that Apache Bigtop is providing a
> >     > > > > > > commercial vendor's binaries. Can anyone who has the knowledge about
> >     > > > > > > this component address these issues somehow?
> >     > > > > > >
> >     > > > > > > Thank you very much!
> >     > > > > > >  Cos
> >     > > > > > >
> >     > > > > >
> >     > > > >
> >     > > > >
> >     > > > >
> >     > >
> >
> >
> 
> 
> -- 
> Ryan Blue
