Yea, SSD + SPARK_PREPEND_CLASSES is great for iterative development!

Then why does it work with a bag of third-party jars but throw an error with 
the assembly jar? Does anyone have an idea?

On 2014/9/3 2:57, Cheng Lian wrote:
Cool, didn't notice that, thanks Josh!


On Tue, Sep 2, 2014 at 11:55 AM, Josh Rosen <rosenvi...@gmail.com> wrote:

    SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could
    probably be easier to find):
    https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools


    On September 2, 2014 at 11:53:49 AM, Cheng Lian (lian.cs....@gmail.com) wrote:

    Yea, SSD + SPARK_PREPEND_CLASSES totally changed my life :)

    Maybe we should add a "developer notes" page to document all this useful
    black magic.


    On Tue, Sep 2, 2014 at 10:54 AM, Reynold Xin <r...@databricks.com> wrote:

    > Having an SSD helps tremendously with assembly time.
    >
    > Without that, you can do the following so that, at runtime, Spark picks up
    > the compiled classes ahead of those in the assembly jar.
    >
    > export SPARK_PREPEND_CLASSES=true
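    >
    > A typical iteration then looks something like this (just a sketch; the
    > sbt launcher and script paths assume a Spark source checkout):
    >
    >     sbt/sbt assembly                  # build the full assembly once
    >     export SPARK_PREPEND_CLASSES=true
    >     sbt/sbt compile                   # after each change, just recompile
    >     ./bin/spark-shell                 # picks up the fresh classes first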
    >
    >
    > On Tue, Sep 2, 2014 at 9:10 AM, Sandy Ryza <sandy.r...@cloudera.com>
    > wrote:
    >
    > > This doesn't help for every dependency, but Spark provides an option to
    > > build the assembly jar without Hadoop and its dependencies. We make use
    > > of this in CDH packaging.
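    > >
    > > For example, with the Maven hadoop-provided profile (profile name from
    > > memory, check the pom on your branch), something like:
    > >
    > >     mvn -Phadoop-provided -DskipTests clean package
    > >
    > > builds an assembly that expects Hadoop and its dependencies to be on
    > > the cluster's classpath at run time.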
    > >
    > > -Sandy
    > >
    > >
    > > On Tue, Sep 2, 2014 at 2:12 AM, scwf <wangf...@huawei.com> wrote:
    > >
    > > > Hi Sean Owen,
    > > > here are some problems I hit when using the assembly jar:
    > > > 1. I put spark-assembly-*.jar into the lib directory of my application,
    > > > and it throws a compile error:
    > > > Error:scalac: Error: class scala.reflect.BeanInfo not found.
    > > > scala.tools.nsc.MissingRequirementError: class scala.reflect.BeanInfo not found.
    > > >         at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655)
    > > >         at scala.tools.nsc.symtab.Definitions$definitions$.getClass(Definitions.scala:608)
    > > >         at scala.tools.nsc.backend.jvm.GenJVM$BytecodeGenerator.<init>(GenJVM.scala:127)
    > > >         at scala.tools.nsc.backend.jvm.GenJVM$JvmPhase.run(GenJVM.scala:85)
    > > >         at scala.tools.nsc.Global$Run.compileSources(Global.scala:953)
    > > >         at scala.tools.nsc.Global$Run.compile(Global.scala:1041)
    > > >         at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)
    > > >         at xsbt.CachedCompiler0.liftedTree1$1(CompilerInterface.scala:102)
    > > >         at xsbt.CachedCompiler0.run(CompilerInterface.scala:102)
    > > >         at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
    > > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    > > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    > > >         at java.lang.reflect.Method.invoke(Method.java:597)
    > > >         at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
    > > >         at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48)
    > > >         at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
    > > >         at org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:28)
    > > >         at org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:25)
    > > >         at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:58)
    > > >         at org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:21)
    > > >         at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
    > > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    > > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    > > >         at java.lang.reflect.Method.invoke(Method.java:597)
    > > >         at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
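    > > >
    > > > (To check whether the assembly bundles its own copy of the Scala
    > > > classes, which the IDE's compiler can then trip over, something like
    > > > this helps; the jar path is illustrative:
    > > >
    > > >     jar tf lib/spark-assembly-*.jar | grep '^scala/' | head
    > > > )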
    > > > 2. I tested my branch, which updates the Hive version to
    > > > org.apache.hive 0.13.1. It runs successfully when using a bag of
    > > > third-party jars as the dependency, but throws an error when using the
    > > > assembly jar; the assembly jar seems to lead to a conflict:
    > > >
    > > > ERROR DDLTask: java.lang.NoSuchFieldError: doubleTypeInfo
    > > >         at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:66)
    > > >         at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.<init>(ArrayWritableObjectInspector.java:59)
    > > >         at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:113)
    > > >         at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
    > > >         at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
    > > >         at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
    > > >         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
    > > >         at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
    > > >         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
    > > >         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    > > >         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
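    > > >
    > > > (A NoSuchFieldError usually means two different versions of a class
    > > > ended up on the classpath; to see whether the assembly carries an
    > > > older copy of the Hive classes, something like this can help, jar
    > > > path illustrative:
    > > >
    > > >     jar tf spark-assembly-*.jar \
    > > >       | grep 'hive/serde2/typeinfo/TypeInfoFactory'
    > > > )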
    > > >
    > > >
    > > >
    > > >
    > > >
    > > > On 2014/9/2 16:45, Sean Owen wrote:
    > > >
    > > >> Hm, are you suggesting that the Spark distribution be a bag of 100
    > > >> JARs? It doesn't quite seem reasonable. It does not remove version
    > > >> conflicts, just pushes them to run-time, which isn't good. The
    > > >> assembly is also necessary because that's where shading happens. In
    > > >> development, you want to run against exactly what will be used in a
    > > >> real Spark distro.
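    > > >>
    > > >> (You can see shading at work by listing relocated packages in the
    > > >> assembly; the relocation prefix below is illustrative, check the
    > > >> build files for the real one:
    > > >>
    > > >>     jar tf spark-assembly-*.jar | grep '^org/spark-project/' | head
    > > >> )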
    > > >>
    > > >> On Tue, Sep 2, 2014 at 9:39 AM, scwf <wangf...@huawei.com> wrote:
    > > >>
    > > >>> hi, all
    > > >>>    I suggest that Spark not use the assembly jar as the default
    > > >>> run-time dependency (spark-submit/spark-class depend on the assembly
    > > >>> jar); using a library of all the third-party dependency jars, as
    > > >>> hadoop/hive/hbase do, seems more reasonable:
    > > >>>
    > > >>>    1. The assembly jar packages all third-party jars into one big
    > > >>> jar, so we have to rebuild it whenever we want to update the version
    > > >>> of some component (such as hadoop).
    > > >>>    2. In our practice with Spark we sometimes hit jar-compatibility
    > > >>> issues, and such issues are hard to diagnose with an assembly jar.
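    > > >>>
    > > >>>    (Roughly, the launcher scripts would then build the classpath
    > > >>> from a lib directory instead of a single jar; a sketch:
    > > >>>
    > > >>>     CLASSPATH="$SPARK_HOME/lib/*"   # every individual jar
    > > >>>     exec java -cp "$CLASSPATH" org.apache.spark.deploy.SparkSubmit "$@"
    > > >>> )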
    > > >>>
    > > >>>
    > > >>>
    > > >>>
    > > >>>
    > > >>>
    > > >>>
    > > >>>
    > > >>
    > > >>
    > > >
    > > >
    > > >
    > > >
    > >
    >





---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
