Agree with Robert,

ASF only releases source code, so the binary packages are just a
convenience from Flink that targets specific Hadoop vendors.

If you look at the Apache Spark download page [1], they do the same
thing by providing distro-specific binaries.

AFAIK this should NOT be a problem and especially should not block the release.

Thanks,

Henry

[1] http://spark.apache.org/downloads.html

On Fri, Aug 15, 2014 at 11:28 AM, Robert Metzger <[email protected]> wrote:
> Hi,
>
> I'm glad you've brought this topic up. (Thank you also for checking the
> release!).
> I've used Spark's release script as a reference for creating ours (why
> reinvent the wheel, they have excellent infrastructure), and they had a
> CDH4 profile, so I thought it was okay for Apache projects to have these
> special builds.
>
> Let me explain the technical background for this: (I hope all information
> here is correct, correct me if I'm wrong)
> There are two components inside Flink that have dependencies on Hadoop: a)
> HDFS and b) YARN.
>
> Usually, users who have a Hadoop version like 0.2x or 1.x can use our
> "hadoop1" builds. They contain the hadoop1 HDFS client and no YARN support.
> Users with old CDH versions (I guess pre 4), Hortonworks or MapR can also
> use these builds.
> Users with newer vendor distributions (HDP2, CDH5, ...) can use our
> "hadoop2" build. It contains the newer HDFS client (protobuf-based RPC)
> and has support for the new YARN API (2.2.0 onwards).
> So the "hadoop1" and "hadoop2" builds cover probably most of the cases
> users have.
> Then, there is CDH4, which contains an "unreleased" Hadoop 2.0.0 version. It
> has the new HDFS client (protobuf), but the old YARN API (2.1.0-beta or
> so), which we don't support. Therefore, users cannot use the "hadoop1"
> build (wrong HDFS client) and the "hadoop2" build is not compatible with
> YARN.
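The compatibility rules Robert describes can be sketched as a small shell helper. This is purely illustrative: the function name and the version patterns are my own assumptions based on this thread, not an official Flink support matrix.

```shell
# Illustrative only: map a Hadoop version string to the Flink binary
# build described above. Pattern order matters: the cdh4 case must be
# checked before the generic 2.* case.
pick_flink_build() {
  case "$1" in
    0.2*|1.*)    echo "hadoop1" ;;  # old HDFS client, no YARN support
    2.0.0-cdh4*) echo "none"    ;;  # protobuf HDFS, but pre-2.2 YARN API
    2.*)         echo "hadoop2" ;;  # protobuf HDFS client, YARN 2.2.0+
    *)           echo "unknown" ;;
  esac
}

pick_flink_build "1.2.1"         # -> hadoop1
pick_flink_build "2.0.0-cdh4.7"  # -> none (the gap the cdh4 binary fills)
pick_flink_build "2.4.0"         # -> hadoop2
```

The "none" branch is exactly the CDH4 gap discussed above: neither generic build works there, which is why a separate binary was produced at all.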
>
> If you have a look at the Spark downloads page, you'll find the following
> (apache-hosted?) binary builds:
>
>
>
>    - For Hadoop 1 (HDP1, CDH3): find an Apache mirror
>    
> <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop1.tgz>
>    or direct file download
>    <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz>
>    - For CDH4: find an Apache mirror
>    
> <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-cdh4.tgz>
>    or direct file download
>    <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-cdh4.tgz>
>    - For Hadoop 2 (HDP2, CDH5): find an Apache mirror
>    
> <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz>
>    or direct file download
>    <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz>
>
>
> I think this choice of binaries reflects what I've explained above.
>
> I'm happy (if the others agree) to remove the cdh4 binary from the release
> and delay the discussion after the release.
>
> Best,
> Robert
>
>
>
>
> On Fri, Aug 15, 2014 at 8:01 PM, Owen O'Malley <[email protected]> wrote:
>
>> As a mentor, I agree that vendor specific packages aren't appropriate for
>> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
>> vendors to make packages available is great, but they shouldn't be hosted
>> at Apache.
>>
>> .. Owen
>>
>>
>> On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen <[email protected]> wrote:
>>
>> > I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
>> > have for example lobbied Spark to remove CDH-specific releases and
>> > build profiles. Not just for this reason, but because it is often
>> > unnecessary to have vendor-specific builds, and also just increases
>> > maintenance overhead for the project.
>> >
>> > Matei et al say they want to make it as easy as possible to consume
>> > Spark, and so provide vendor-build-specific artifacts and such here
>> > and there. To be fair, Spark tries to support a large range of Hadoop
>> > and YARN versions, and getting the right combination of profiles and
>> > versions to recreate a vendor release was kind of hard until
>> > about Hadoop 2.2 (when YARN really stabilized).
>> >
>> > I haven't heard of any formal policy. I would ask whether there are
>> > similar reasons to produce pre-packaged releases like these.
>> >
>> >
>> > On Fri, Aug 15, 2014 at 6:24 PM, Alan Gates <[email protected]>
>> wrote:
>> > > Let me begin by noting that I obviously have a conflict of interest
>> > since my
>> > > company is a direct competitor to Cloudera.  But as a mentor and Apache
>> > > member I believe I need to bring this up.
>> > >
>> > > What is the Apache policy towards having a vendor specific package on a
>> > > download site?  It is strange to me to come to Flink's website and see
>> > > packages for Flink with CDH (or HDP or MapR or whatever).  We should
>> > avoid
>> > > providing vendor specific packages.  It gives the appearance of
>> > preferring
>> > > one vendor over another, which Apache does not want to do.
>> > >
>> > > I have no problem at all with Cloudera hosting a CDH specific package
>> of
>> > > Flink, nor with Flink project members working with Cloudera to create
>> > such a
>> > > package.  But I do not think they should be hosted at Apache.
>> > >
>> > > Alan.
>> > > --
>> > > Sent with Postbox <http://www.getpostbox.com>
>> > >
>> >
>>
