Agree with Robert: the ASF only releases source code, so the binary packages are just a convenience from Flink that targets specific Hadoop vendors.
If you look at the Apache Spark download page [1], they do the same thing by providing distro-specific binaries. AFAIK this should NOT be a problem, and it especially should not block the release.

Thanks,
Henry

[1] http://spark.apache.org/downloads.html

On Fri, Aug 15, 2014 at 11:28 AM, Robert Metzger <[email protected]> wrote:
> Hi,
>
> I'm glad you've brought this topic up. (Thank you also for checking the
> release!)
> I used Spark's release script as a reference for creating ours (why
> reinvent the wheel when they have excellent infrastructure), and they had
> a CDH4 profile, so I thought it's okay for Apache projects to have these
> special builds.
>
> Let me explain the technical background. (I hope all the information here
> is correct; correct me if I'm wrong.)
> There are two components inside Flink that have dependencies on Hadoop:
> a) HDFS and b) YARN.
>
> Usually, users who have a Hadoop version like 0.2x or 1.x can use our
> "hadoop1" builds. They contain the hadoop1 HDFS client and no YARN
> support. Users with old CDH versions (I guess pre 4), Hortonworks, or
> MapR can also use these builds.
> Users with newer vendor distributions (HDP2, CDH5, ...) can use our
> "hadoop2" build. It contains the newer HDFS client (protobuf-based RPC)
> and supports the new YARN API (2.2.0 onwards).
> So the "hadoop1" and "hadoop2" builds probably cover most users' cases.
> Then there is CDH4, which contains an "unreleased" Hadoop 2.0.0 version.
> It has the new HDFS client (protobuf) but the old YARN API (2.1.0-beta or
> so), which we don't support. Therefore, those users cannot use the
> "hadoop1" build (wrong HDFS client), and the "hadoop2" build is not
> compatible with their YARN.
>
> If you have a look at the Spark downloads page, you'll find the following
> (Apache-hosted?)
> binary builds:
>
> - For Hadoop 1 (HDP1, CDH3): find an Apache mirror
>   <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop1.tgz>
>   or direct file download
>   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz>
> - For CDH4: find an Apache mirror
>   <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-cdh4.tgz>
>   or direct file download
>   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-cdh4.tgz>
> - For Hadoop 2 (HDP2, CDH5): find an Apache mirror
>   <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz>
>   or direct file download
>   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz>
>
> I think this choice of binaries reflects what I've explained above.
>
> I'm happy (if the others agree) to remove the cdh4 binary from the
> release and defer this discussion until after the release.
>
> Best,
> Robert
>
> On Fri, Aug 15, 2014 at 8:01 PM, Owen O'Malley <[email protected]> wrote:
>> As a mentor, I agree that vendor-specific packages aren't appropriate
>> for the Apache site. (Disclosure: I work at Hortonworks.) Working with
>> the vendors to make packages available is great, but they shouldn't be
>> hosted at Apache.
>>
>> .. Owen
>>
>> On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen <[email protected]> wrote:
>>> Not surprisingly, I hope, I agree. (Backstory: I am at Cloudera.) I
>>> have, for example, lobbied Spark to remove CDH-specific releases and
>>> build profiles. Not just for this reason, but because vendor-specific
>>> builds are often unnecessary, and they also increase maintenance
>>> overhead for the project.
>>>
>>> Matei et al. say they want to make it as easy as possible to consume
>>> Spark, and so provide vendor-build-specific artifacts here and there.
>>> To be fair, Spark tries to support a large range of Hadoop and YARN
>>> versions, and getting the right combination of profiles and versions
>>> to recreate a vendor release was kind of hard until about Hadoop 2.2
>>> (when YARN really stabilized).
>>>
>>> I haven't heard of any formal policy. I would ask whether there are
>>> similar reasons to produce pre-packaged releases like these?
>>>
>>> On Fri, Aug 15, 2014 at 6:24 PM, Alan Gates <[email protected]> wrote:
>>>> Let me begin by noting that I obviously have a conflict of interest,
>>>> since my company is a direct competitor to Cloudera. But as a mentor
>>>> and Apache member, I believe I need to bring this up.
>>>>
>>>> What is the Apache policy on having vendor-specific packages on a
>>>> download site? It is strange to me to come to Flink's website and see
>>>> packages for Flink with CDH (or HDP or MapR or whatever). We should
>>>> avoid providing vendor-specific packages. It gives the appearance of
>>>> preferring one vendor over another, which Apache does not want to do.
>>>>
>>>> I have no problem at all with Cloudera hosting a CDH-specific package
>>>> of Flink, nor with Flink project members working with Cloudera to
>>>> create such a package. But I do not think they should be hosted at
>>>> Apache.
>>>>
>>>> Alan.
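Robert's compatibility explanation above can be summarized as a small decision rule. The sketch below is illustrative only, not Flink code: the function name and the distro/API labels are hypothetical, and it merely encodes the matrix he describes (old HDFS client implies "hadoop1"; new HDFS client plus YARN 2.2+ implies "hadoop2"; CDH4's mix of new HDFS client and pre-2.2 YARN fits neither stock build).

```python
def flink_binary_for(hdfs_client: str, yarn_api: str) -> str:
    """Pick a Flink 0.6-era binary from the Hadoop setup (illustrative).

    hdfs_client: "hadoop1" (old RPC) or "hadoop2" (protobuf-based RPC)
    yarn_api:    "none", "pre-2.2" (e.g. CDH4's 2.1.0-beta), or "2.2+"
    """
    if hdfs_client == "hadoop1":
        # Hadoop 0.2x/1.x, old CDH, HDP1, MapR: old HDFS client, no YARN
        return "hadoop1"
    if yarn_api == "2.2+":
        # HDP2, CDH5, ...: new HDFS client and the stable YARN API
        return "hadoop2"
    # CDH4: new HDFS client but an old YARN API -> neither stock build fits,
    # which is why a vendor-specific binary was proposed in the first place
    return "no stock build (CDH4 case)"
```

For example, `flink_binary_for("hadoop2", "pre-2.2")` falls through to the CDH4 case, while the two stock builds cover everything else Robert lists.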
