Re: spark-submit cannot resolve spark-hive_2.10
Thanks for the reply. Actually, I don't think excluding spark-hive from spark-submit --packages is a good idea. I don't want to recompile Spark into an assembly for my cluster every time a new Spark release comes out. I prefer using a binary release of Spark and then adding some jars at job submission, e.g. adding spark-hive for HiveContext usage. FYI, spark-hive is only 1.2 MB: http://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10/1.4.0

On Wed, Jul 8, 2015 at 2:03 AM, Burak Yavuz brk...@gmail.com wrote:

> spark-hive is excluded when using --packages, because it can be included
> in the spark-assembly by adding -Phive during mvn package or sbt assembly.
>
> Best,
> Burak

--
Hao Ren
Data Engineer @ leboncoin
Paris, France
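[Editor's note: a sketch of the workaround described above, untested. Since --packages deliberately filters spark-hive out, one option is to fetch the jar yourself and pass it with --jars, which is not filtered. The download URL is assumed from standard Maven repository layout, and `your-job.jar` is a hypothetical stand-in for the application jar, which is cut off in the original command.]

```shell
# Download the spark-hive artifact directly from Maven Central
# (path assumed from the standard Maven repo layout).
wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.10/1.4.0/spark-hive_2.10-1.4.0.jar

# Pass it via --jars instead of --packages, so it bypasses the exclusion
# applied during Ivy resolution. Caveat: spark-hive itself is small, but at
# runtime it also needs the Hive dependencies (hive-exec, hive-metastore, ...)
# unless the assembly already bundles them.
./bin/spark-submit \
  --jars spark-hive_2.10-1.4.0.jar \
  --packages org.postgresql:postgresql:9.3-1103-jdbc3,joda-time:joda-time:2.8.1 \
  --class fr.leboncoin.etl.jobs.dwh.AdStateTraceDWHTransform \
  --master spark://localhost:7077 \
  your-job.jar  # hypothetical application jar
```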
spark-submit cannot resolve spark-hive_2.10
I want to add spark-hive as a dependency to submit my job, but it seems that spark-submit cannot resolve it.

$ ./bin/spark-submit \
    --packages org.apache.spark:spark-hive_2.10:1.4.0,org.postgresql:postgresql:9.3-1103-jdbc3,joda-time:joda-time:2.8.1 \
    --class fr.leboncoin.etl.jobs.dwh.AdStateTraceDWHTransform \
    --master spark://localhost:7077 \

Ivy Default Cache set to: /home/invkrh/.ivy2/cache
The jars for the packages stored in: /home/invkrh/.ivy2/jars
https://repository.jboss.org/nexus/content/repositories/releases/ added as a remote repository with the name: repo-1
:: loading settings :: url = jar:file:/home/invkrh/workspace/scala/spark/assembly/target/scala-2.10/spark-assembly-1.4.0-SNAPSHOT-hadoop2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-hive_2.10 added as a dependency
org.postgresql#postgresql added as a dependency
joda-time#joda-time added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found org.postgresql#postgresql;9.3-1103-jdbc3 in local-m2-cache
        found joda-time#joda-time;2.8.1 in central
:: resolution report :: resolve 139ms :: artifacts dl 3ms
        :: modules in use:
        joda-time#joda-time;2.8.1 from central in [default]
        org.postgresql#postgresql;9.3-1103-jdbc3 from local-m2-cache in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 2 already retrieved (0kB/6ms)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveContext
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:633)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveContext
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/07 16:57:59 INFO Utils: Shutdown hook called

Any help is appreciated. Thank you.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-submit-can-not-resolve-spark-hive-2-10-tp23695.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
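[Editor's note: a quick diagnostic sketch, using the assembly jar path that appears in the log above. It checks whether the assembly spark-submit is running from already bundles HiveContext; no output means the assembly was built without Hive support.]

```shell
# List HiveContext entries inside the assembly jar.
# Empty output => the assembly was built without -Phive.
unzip -l assembly/target/scala-2.10/spark-assembly-1.4.0-SNAPSHOT-hadoop2.2.0.jar \
  | grep 'org/apache/spark/sql/hive/HiveContext'
```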
Re: spark-submit cannot resolve spark-hive_2.10
spark-hive is excluded when using --packages, because it can be included in the spark-assembly by adding -Phive during mvn package or sbt assembly.

Best,
Burak

On Tue, Jul 7, 2015 at 8:06 AM, Hao Ren inv...@gmail.com wrote:

> I want to add spark-hive as a dependency to submit my job, but it seems
> that spark-submit cannot resolve it.
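[Editor's note: for completeness, the assembly builds Burak refers to look roughly like this on Spark 1.4, a sketch assuming the profile names from the "Building Spark" docs; adjust the Hadoop profile to match your cluster.]

```shell
# Maven: produce a spark-assembly jar that bundles Hive/HiveContext support
mvn -Phive -Phive-thriftserver -Phadoop-2.2 -DskipTests clean package

# sbt equivalent
build/sbt -Phive -Phive-thriftserver assembly
```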