Herman:
For comparison with "Pre-built with user-provided Hadoop": a package such as
spark-1.4.1-bin-hadoop2.6.tgz is built with the hadoop-2.6 profile, which
defines the versions of the Hadoop projects Spark depends on.
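
To illustrate, building from source against a specific Hadoop version selects
that profile explicitly (a sketch, assuming the Spark 1.4.x source tree and the
bundled Maven wrapper; flags are described in the Spark "Building Spark" docs):

```shell
# Build Spark against Hadoop 2.6 (sketch; run from the Spark source root).
# The -Phadoop-2.6 profile pins the Hadoop dependency versions that end up
# in the assembly jar; -Pyarn adds YARN support.
build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
```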

A Hadoop cluster provides storage (HDFS) and resource management (YARN).
For the latter, please see:
https://spark.apache.org/docs/latest/running-on-yarn.html
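
For the "user-provided Hadoop" build specifically, Spark needs to be pointed at
the Hadoop installation's classpath. A minimal sketch, assuming your
distribution's `hadoop` command is on PATH:

```shell
# conf/spark-env.sh -- point the "Pre-built with user-provided Hadoop"
# Spark package at an existing Hadoop installation.
# `hadoop classpath` prints the jars of the installed distribution;
# SPARK_DIST_CLASSPATH makes Spark pick them up at runtime.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

With that set (and HADOOP_CONF_DIR pointing at your cluster's config),
spark-submit can then target YARN as described in the page linked above.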

Cheers

On Thu, Jul 30, 2015 at 1:48 AM, hermansc <herman.schis...@gmail.com> wrote:

> Hi.
>
> I want to run Spark, and more specifically the "Pre-built with
> user-provided
> Hadoop" version from the downloads page, but I can't find any documentation
> on how to connect the two components together (namely Spark and Hadoop).
>
> I've had some success in setting SPARK_CLASSPATH to my Hadoop distribution's
> lib/ directory, containing jar files such as hadoop-core, hadoop-common,
> etc.
>
> However, there seem to be many native libraries included in the assembly
> jar for Spark versions pre-built for Hadoop distributions (I'm specifically
> missing the libsnappy.so files) that are not included by default in
> distributions such as Cloudera Hadoop.
>
> Has anyone here actually tried to run Spark without Hadoop included in the
> assembly jar, and/or does anyone have more resources where I can read about
> the proper way of connecting them?
>
> As an aside, the spark-assembly jar in the Spark version pre-built for
> user-provided Hadoop distributions is named
> spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it should
> be called spark-assembly-1.4.0-without-hadoop.jar :)
>
> --
> Herman
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-user-provided-Hadoop-installation-tp24076.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
