Alex, I don't know if it helps or not, but some time back I made a Maven assembly to be able to package Spark in Bigtop. That assembly excludes all Hadoop dependencies. So you can simply build it using Maven instead of sbt.

Regards,
  Cos
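[For reference, the usual Maven way to get this effect is to scope the Hadoop artifacts as "provided": both maven-assembly-plugin and maven-shade-plugin leave provided-scope dependencies out of the packaged jar by default. A minimal sketch of that idea, not Cos's actual Bigtop assembly (the hadoop.version property is illustrative):

    <!-- pom.xml fragment; hadoop.version is an assumed property -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <!-- compiled against, but kept out of the assembled jar -->
      <scope>provided</scope>
    </dependency>
]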
On Mon, Jan 06, 2014 at 02:33PM, Alex Cozzi wrote:
> I am trying to exclude the Hadoop jar dependencies from Spark's assembly
> files, the reason being that in order to work on our cluster it is necessary
> to use our own version of those files instead of the published ones. I tried
> defining the Hadoop dependencies as "provided", but surprisingly this causes
> compilation errors in the build. Just to be clear, I modified the sbt build
> file as follows:
>
> def yarnEnabledSettings = Seq(
>   libraryDependencies ++= Seq(
>     // Exclude rule required for all ?
>     "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided"
>       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
>     "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersion % "provided"
>       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
>     "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion % "provided"
>       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
>     "org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion % "provided"
>       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib)
>   )
> )
>
> and compile with:
>
> SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_IS_NEW_HADOOP=true sbt assembly
>
> But the assembly still includes the Hadoop libraries, contrary to what the
> assembly docs say. I managed to exclude them instead by using the
> non-recommended way:
>
> def extraAssemblySettings() = Seq(
>   test in assembly := {},
>   mergeStrategy in assembly := {
>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>     case "log4j.properties" => MergeStrategy.discard
>     case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
>     case "reference.conf" => MergeStrategy.concat
>     case _ => MergeStrategy.first
>   },
>   excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
>     cp filter { _.data.getName.contains("hadoop") }
>   }
> )
>
> But I would like to hear whether there is interest in excluding the Hadoop
> jars by default in the build.
>
> Alex Cozzi
> alexco...@gmail.com
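[For what it's worth, in a standalone single-module sbt project the "provided" scope does behave the way Alex expected: sbt-assembly builds the fat jar from the runtime classpath, so a provided dependency is compiled against but left out of the assembly. A minimal sketch of that baseline (a plain build.sbt, not Spark's multi-project build):

    // build.sbt -- minimal single-module sketch, not Spark's actual build.
    // hadoop-client is on the compile classpath, but sbt-assembly omits
    // "provided" dependencies from the fat jar it produces.
    libraryDependencies +=
      "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

In a multi-project build like Spark's, however, "provided" does not propagate to the compile classpaths of downstream subprojects, which is one plausible explanation for the compilation errors Alex hit; the excludedJars filter above sidesteps that by keeping the Hadoop jars on the compile classpath and dropping them only at assembly time.]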