Re: Question on providing CDH packages

Robert Metzger Fri, 15 Aug 2014 11:29:54 -0700

Hi,

I'm glad you've brought this topic up. (Thank you also for checking the
release!).
I've used Spark's release script as a reference for creating ours (why
reinvent the wheel, they have excellent infrastructure), and they had a
CDH4 profile, so I thought it's okay for Apache projects to have these
special builds.


Let me explain the technical background for this. (I hope all information
here is correct; correct me if I'm wrong.)
There are two components inside Flink that have dependencies on Hadoop:
a) HDFS and b) YARN.

Usually, users who have a Hadoop version like 0.2x or 1.x can use our
"hadoop1" builds. They contain the hadoop1 HDFS client and no YARN support.
Users with old CDH versions (I guess pre 4), Hortonworks or MapR can also
use these builds.
Users that have newer vendor distributions (HDP2, CDH5, ...) can use
our "hadoop2" build. It contains the newer HDFS client (protobuf-based RPC)
and has support for the new YARN API (2.2.0 onwards).
So the "hadoop1" and "hadoop2" builds probably cover most of the cases
users have.
Then, there is CDH4, which contains an "unreleased" Hadoop 2.0.0 version. It
has the new HDFS client (protobuf), but the old YARN API (2.1.0-beta or
so), which we don't support. Therefore, CDH4 users cannot use the "hadoop1"
build (wrong HDFS client), and the "hadoop2" build is not compatible with
CDH4's YARN.
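
To make the matrix above concrete, here is a rough sketch of the decision
in Python. It is only an illustration of the reasoning in this mail, not
code from Flink: the `Cluster` type, the `pick_build` function and the
"none"/"old"/"new" labels are hypothetical names, and the build names
simply mirror the "hadoop1", "hadoop2" and cdh4 binaries discussed here.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    """Hypothetical description of a user's Hadoop installation."""
    protobuf_hdfs: bool  # True if HDFS speaks the newer protobuf-based RPC
    yarn_api: str        # "none", "old" (before 2.2.0) or "new" (2.2.0 onwards)


def pick_build(cluster: Cluster) -> str:
    """Return the binary build that matches the cluster, per the mail above."""
    if not cluster.protobuf_hdfs:
        # Hadoop 0.2x / 1.x style HDFS client, no YARN support needed.
        return "hadoop1"
    if cluster.yarn_api == "old":
        # New HDFS client but pre-2.2.0 YARN API: the CDH4 situation.
        return "cdh4"
    # New HDFS client and the stable YARN API.
    return "hadoop2"
```

For example, `pick_build(Cluster(protobuf_hdfs=True, yarn_api="old"))`
lands in the CDH4 branch, which is the one case neither generic build
covers.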

If you have a look at the Spark downloads page, you'll find the following
(apache-hosted?) binary builds:



   - For Hadoop 1 (HDP1, CDH3): find an Apache mirror
     <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop1.tgz>
     or direct file download
     <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz>
   - For CDH4: find an Apache mirror
     <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-cdh4.tgz>
     or direct file download
     <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-cdh4.tgz>
   - For Hadoop 2 (HDP2, CDH5): find an Apache mirror
     <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz>
     or direct file download
     <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz>


I think this choice of binaries reflects what I've explained above.

I'm happy (if the others agree) to remove the cdh4 binary from the release
and postpone the discussion until after the release.

Best,
Robert




On Fri, Aug 15, 2014 at 8:01 PM, Owen O'Malley <[email protected]> wrote:

> As a mentor, I agree that vendor specific packages aren't appropriate for
> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> vendors to make packages available is great, but they shouldn't be hosted
> at Apache.
>
> .. Owen
>
>
> On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen <[email protected]> wrote:
>
> > I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
> > have for example lobbied Spark to remove CDH-specific releases and
> > build profiles. Not just for this reason, but because it is often
> > unnecessary to have vendor-specific builds, and also just increases
> > maintenance overhead for the project.
> >
> > Matei et al say they want to make it as easy as possible to consume
> > Spark, and so provide vendor-build-specific artifacts and such here
> > and there. To be fair, Spark tries to support a large range of Hadoop
> > and YARN versions, and getting the right combination of profiles and
> > versions right to recreate a vendor release was kind of hard until
> > about Hadoop 2.2 (stable YARN really).
> >
> > I haven't heard of any formal policy. I would ask whether there are
> > similar reasons to produce pre-packaged releases like so?
> >
> >
> > On Fri, Aug 15, 2014 at 6:24 PM, Alan Gates <[email protected]> wrote:
> > > Let me begin by noting that I obviously have a conflict of interest
> > > since my company is a direct competitor to Cloudera.  But as a mentor
> > > and Apache member I believe I need to bring this up.
> > >
> > > What is the Apache policy towards having a vendor specific package on a
> > > download site?  It is strange to me to come to Flink's website and see
> > > packages for Flink with CDH (or HDP or MapR or whatever).  We should
> > > avoid providing vendor specific packages.  It gives the appearance of
> > > preferring one vendor over another, which Apache does not want to do.
> > >
> > > I have no problem at all with Cloudera hosting a CDH specific package
> > > of Flink, nor with Flink project members working with Cloudera to
> > > create such a package.  But I do not think they should be hosted at
> > > Apache.
> > >
> > > Alan.
> >
>
