On Fri, Aug 29, 2014 at 7:42 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> In terms of vendor support for this approach - In the early days
> Cloudera asked us to add CDH4 repository and more recently Pivotal and
> MapR also asked us to allow linking against their hadoop-client
> libraries. So we've added these based on direct requests from vendors.
> Given the ubiquity of the Hadoop FileSystem API, it's hard for me to
> imagine ruffling feathers by supporting this. But if we get feedback
> in that direction over time we can of course consider a different
> approach.

By this, you mean that it's easy to control the Hadoop version in the
build and set it to some other vendor-specific release? Yes that seems
ideal. Making the build flexible, and adding the repository references
to pom.xml is part of enabling that -- to me, no question that's good.

So you can always roll your own build for your cluster, if you need
to. I understand the role of the cdh4 / mapr3 / mapr4 binaries as just
a convenience.
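For reference, rolling your own build against a vendor hadoop-client
looked something like the following (a sketch based on the Spark 1.x
build docs; the exact profile names and artifact versions here are
illustrative, so substitute whatever your distro publishes):

```shell
# Build a Spark distribution tarball against a vendor-specific
# hadoop-client, by overriding the hadoop.version Maven property.
# (Illustrative versions only -- check your vendor's repository.)

# CDH4-flavored build:
./make-distribution.sh --tgz -Dhadoop.version=2.0.0-cdh4.6.0

# MapR-flavored build, using the mapr profile that pulls in
# their repository:
./make-distribution.sh --tgz -Pmapr3 -Dhadoop.version=1.0.3-mapr-3.0.3
```

The point being: once the repositories are in pom.xml, any of these is
a one-liner, which is why the prebuilt convenience binaries buy so
little.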

But it's a convenience only for people who...
- are installing Spark on a cluster (i.e. not end users)
- whose distro doesn't already include it
- whose distro isn't compatible with plain vanilla Hadoop

That can't be many people. CDH 4.6+ accounts for most of the installed
CDH base and already ships Spark, and I thought MapR bundled Spark as
well. The audience seems small enough, and the convenience marginal
enough (is it hard to run the distribution script?), that I had to ask
whether providing these binaries was worth the bother, especially
given the possible ASF sensitivity.

I say crack on; you get my point.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
