An SSD helps tremendously with assembly time. Without one, you can set the following so that, at runtime, Spark picks up your freshly compiled classes ahead of the assembly jar:

export SPARK_PREPEND_CLASSES=true
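With that set, a typical development loop looks something like this (a minimal sketch; it assumes the in-tree sbt launcher at sbt/sbt and a checkout that has been assembled at least once):

    sbt/sbt compile        # recompile only the classes you changed
    bin/spark-shell        # the build's target/ class directories are
                           # prepended to the classpath, so the new classes
                           # run without rebuilding the assembly

Unset the variable when you want to test against exactly what ships in the assembly.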
On Tue, Sep 2, 2014 at 9:10 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> This doesn't help for every dependency, but Spark provides an option to
> build the assembly jar without Hadoop and its dependencies. We make use
> of this in CDH packaging.
>
> -Sandy
>
> On Tue, Sep 2, 2014 at 2:12 AM, scwf <wangf...@huawei.com> wrote:
>
> > Hi Sean Owen,
> > here are some problems I met when using the assembly jar:
> >
> > 1. When I put spark-assembly-*.jar in the lib directory of my
> > application, it throws a compile error:
> >
> > Error:scalac: Error: class scala.reflect.BeanInfo not found.
> > scala.tools.nsc.MissingRequirementError: class scala.reflect.BeanInfo not found.
> >     at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655)
> >     at scala.tools.nsc.symtab.Definitions$definitions$.getClass(Definitions.scala:608)
> >     at scala.tools.nsc.backend.jvm.GenJVM$BytecodeGenerator.<init>(GenJVM.scala:127)
> >     at scala.tools.nsc.backend.jvm.GenJVM$JvmPhase.run(GenJVM.scala:85)
> >     at scala.tools.nsc.Global$Run.compileSources(Global.scala:953)
> >     at scala.tools.nsc.Global$Run.compile(Global.scala:1041)
> >     at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)
> >     at xsbt.CachedCompiler0.liftedTree1$1(CompilerInterface.scala:102)
> >     at xsbt.CachedCompiler0.run(CompilerInterface.scala:102)
> >     at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
> >     at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48)
> >     at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
> >     at org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:28)
> >     at org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:25)
> >     at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:58)
> >     at org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:21)
> >     at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
> >
> > 2. I tested my branch, which updates the Hive version to
> > org.apache.hive 0.13.1. It runs successfully when using a bag of
> > 3rd-party jars as dependencies, but throws an error when using the
> > assembly jar, so it seems the assembly jar leads to a conflict:
> >
> > ERROR DDLTask: java.lang.NoSuchFieldError: doubleTypeInfo
> >     at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:66)
> >     at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.<init>(ArrayWritableObjectInspector.java:59)
> >     at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:113)
> >     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
> >     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
> >     at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
> >     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
> >     at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
> >     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
> >     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> >     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> >
> > On 2014/9/2 16:45, Sean Owen wrote:
> >
> >> Hm, are you suggesting that the Spark distribution be a bag of 100
> >> JARs? It doesn't quite seem reasonable. It does not remove version
> >> conflicts, just pushes them to run-time, which isn't good. The
> >> assembly is also necessary because that's where shading happens. In
> >> development, you want to run against exactly what will be used in a
> >> real Spark distro.
> >>
> >> On Tue, Sep 2, 2014 at 9:39 AM, scwf <wangf...@huawei.com> wrote:
> >>
> >>> Hi all,
> >>> I suggest Spark not use the assembly jar as the default run-time
> >>> dependency (spark-submit/spark-class depend on the assembly jar); a
> >>> library of all the 3rd-party dependency jars, as hadoop/hive/hbase
> >>> use, would be more reasonable.
> >>>
> >>> 1. The assembly jar packages all 3rd-party jars into one big jar, so
> >>> we need to rebuild it whenever we want to update the version of some
> >>> component (such as Hadoop).
> >>> 2. In our practice with Spark we sometimes meet jar-compatibility
> >>> issues, and they are hard to diagnose with an assembly jar.
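
P.S. On Sandy's point: the Maven build has a profile for leaving Hadoop out of the assembly. Roughly like this (a sketch; the profile name below is from memory and may differ across Spark versions):

    # Build the assembly with Hadoop marked "provided", so Hadoop and its
    # transitive dependencies come from the cluster instead of the fat jar.
    mvn -Phadoop-provided -DskipTests clean package

As Sean notes, this just moves the version question to run-time: you then run against whatever Hadoop the cluster supplies.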