Herman: Regarding "Pre-built with user-provided Hadoop": spark-1.4.1-bin-hadoop2.6.tgz, for example, is built with the hadoop-2.6 profile, which pins the versions of the Hadoop projects Spark depends on.
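For context, the Hadoop profile is chosen at build time. A sketch of building Spark against a specific Hadoop version with Maven (profile and flag names as documented for Spark 1.x; adjust to your release):

```shell
# Sketch: build Spark against Hadoop 2.6 via the hadoop-2.6 profile.
# Assumes a Spark 1.x source checkout; see the "Building Spark" docs.
./build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

# For a "without Hadoop" style build, the hadoop-provided profile marks
# Hadoop dependencies as provided, so they are not bundled in the assembly:
./build/mvn -Phadoop-provided -Phadoop-2.6 -Dhadoop.version=2.6.0 \
  -DskipTests clean package
```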
A Hadoop cluster is used to provide storage (HDFS) and resource management (YARN). For the latter, please see:
https://spark.apache.org/docs/latest/running-on-yarn.html

Cheers

On Thu, Jul 30, 2015 at 1:48 AM, hermansc <herman.schis...@gmail.com> wrote:
> Hi.
>
> I want to run Spark, and more specifically the "Pre-built with
> user-provided Hadoop" version from the downloads page, but I can't find
> any documentation on how to connect the two components together (namely
> Spark and Hadoop).
>
> I've had some success in setting SPARK_CLASSPATH to my Hadoop
> distribution's lib/ directory, containing jar files such as hadoop-core,
> hadoop-common, etc.
>
> However, there seem to be many native libraries included in the assembly
> jar for Spark versions pre-built for Hadoop distributions (I'm
> specifically missing the libsnappy.so files) that are not included by
> default in distributions such as Cloudera Hadoop.
>
> Has anyone here actually tried to run Spark without Hadoop included in
> the assembly jar, and/or does anyone have any more resources where I can
> read about the proper way of connecting them?
>
> As an aside, the spark-assembly jar in the Spark version pre-built for
> user-provided Hadoop distributions is named
> spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it
> should be called spark-assembly-1.4.0-without-hadoop.jar :)
>
> --
> Herman
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-user-provided-Hadoop-installation-tp24076.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
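To make the two-part answer above concrete, here is a sketch of wiring a "without Hadoop" Spark build to an existing Hadoop installation via conf/spark-env.sh, using SPARK_DIST_CLASSPATH in the way Spark's hadoop-provided documentation describes (the HADOOP_HOME path is illustrative; adjust to your distribution):

```shell
# conf/spark-env.sh -- sketch for a "Pre-built with user-provided Hadoop"
# Spark package. SPARK_DIST_CLASSPATH tells Spark where to find the
# user-provided Hadoop jars (hadoop-common, HDFS/YARN clients, codecs).
export HADOOP_HOME=/opt/hadoop          # illustrative install location
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

# 'hadoop classpath' prints the distribution's full jar classpath, which
# is exactly what Spark needs on its own classpath.
export SPARK_DIST_CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath)

# Native libraries (e.g. libsnappy.so, libhadoop.so) usually live under
# $HADOOP_HOME/lib/native rather than in any jar, which is why they are
# missing when only jars are put on SPARK_CLASSPATH.
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH"
```

This avoids the deprecated SPARK_CLASSPATH approach mentioned in the question, and the LD_LIBRARY_PATH line addresses the missing libsnappy.so specifically.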