Re: How to set local property in beeline connect to the spark thrift server
Hi, Xiaoyu! You can use `spark.sql.thriftserver.scheduler.pool` instead of `spark.scheduler.pool`, but only in the Spark Thrift Server.

On Wed, Dec 31, 2014 at 3:55 PM, Xiaoyu Wang wangxy...@gmail.com wrote:

Hi all! I use Spark SQL 1.2 to start the Thrift Server on YARN, and I want to use the fair scheduler in the Thrift Server. I set these properties in spark-defaults.conf:

    spark.scheduler.mode FAIR
    spark.scheduler.allocation.file /opt/spark-1.2.0-bin-2.4.1/conf/fairscheduler.xml

In the Thrift Server UI I can see that the scheduler pools are set up correctly. [inline image: Thrift Server UI showing the scheduler pools]

How can I assign one SQL job to the test pool? By default the SQL jobs run in the default pool. In the http://spark.apache.org/docs/latest/job-scheduling.html document I see that sc.setLocalProperty("spark.scheduler.pool", "pool1") can be set in code. In beeline I executed `set spark.scheduler.pool=test`, but it had no effect. How can I set this local property from beeline?
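Concretely, the reply above suggests switching to the Thrift-Server-specific property; from a beeline session that would look something like the following sketch (the pool name `test` comes from the question; the table name is hypothetical and only for illustration):

```sql
-- Run inside a beeline session connected to the Spark Thrift Server.
-- Switch this session's jobs to the "test" pool defined in fairscheduler.xml.
SET spark.sql.thriftserver.scheduler.pool=test;

-- Statements issued afterwards in this session should then be scheduled in that pool.
SELECT COUNT(*) FROM some_table;   -- hypothetical table, for illustration only
```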
Got NotSerializableException when accessing broadcast variable
Hi everyone! I got an exception when I ran my script with spark-shell. I added SPARK_JAVA_OPTS=-Dsun.io.serialization.extendedDebugInfo=true in spark-env.sh to show the following stack:

    org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
        at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
        at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
        at $iwC$$iwC$$iwC.<init>(<console>:23)
        at $iwC$$iwC.<init>(<console>:25)
        at $iwC.<init>(<console>:27)
        at <init>(<console>:29)
        at .<init>(<console>:33)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
        ……
    Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3
        - field (class org.apache.spark.sql.hive.HiveContext, name: functionRegistry, type: class org.apache.spark.sql.hive.HiveFunctionRegistry)
        - object (class org.apache.spark.sql.hive.HiveContext, org.apache.spark.sql.hive.HiveContext@4648e685)
        - field (class $iwC$$iwC$$iwC$$iwC, name: hc, type: class org.apache.spark.sql.hive.HiveContext)
        - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@23d652ef)
        - field (class $iwC$$iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC$$iwC)
        - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@71cc14f1)
        - field (class $iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC)
        - object (class $iwC$$iwC, $iwC$$iwC@74eca89e)
        - field (class $iwC, name: $iw, type: class $iwC$$iwC)
        - object (class $iwC, $iwC@685c4cc4)
        - field (class $line9.$read, name: $iw, type: class $iwC)
        - object (class $line9.$read, $line9.$read@519f9aae)
        - field (class $iwC$$iwC$$iwC, name: $VAL7, type: class $line9.$read)
        - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@4b996858)
        - field (class $iwC$$iwC$$iwC$$iwC, name: $outer, type: class $iwC$$iwC$$iwC)
        - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@31d646d4)
        - field (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, name: $outer, type: class $iwC$$iwC$$iwC$$iwC)
        - root object (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, <function1>)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)

I wrote some simple scripts to reproduce this problem.

Case 1:

    val barr1 = sc.broadcast("test")
    val sret = sc.parallelize(1 to 10, 2)
    val ret = sret.filter(row => !barr1.equals("test"))
    ret.collect.foreach(println)

It works fine in both local mode and yarn-client mode.

Case 2:

    val barr1 = sc.broadcast("test")
    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    val sret = hc.sql("show tables")
    val ret = sret.filter(row => !barr1.equals("test"))
    ret.collect.foreach(println)

It throws java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext in both local mode and yarn-client mode. But it works fine if I write the same code in a Scala file and run it in IntelliJ IDEA:
    import org.apache.spark.{SparkConf, SparkContext}

    object TestBroadcast2 {
      def main(args: Array[String]) {
        val sparkConf = new SparkConf().setAppName("Broadcast Test").setMaster("local[3]")
        val sc = new SparkContext(sparkConf)
        val barr1 = sc.broadcast("test")
        val hc = new org.apache.spark.sql.hive.HiveContext(sc)
        val sret = hc.sql("show tables")
        val ret = sret.filter(row => !barr1.equals("test"))
        ret.collect.foreach(println)
      }
    }
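Not part of the original thread, but one workaround sometimes suggested for this kind of REPL-only failure is to keep the HiveContext out of the serialized closure, e.g. by marking the shell-defined reference @transient and reading the broadcast through .value. A minimal sketch of that idea, for spark-shell (an assumption, not a confirmed fix for this exact case):

```scala
// Sketch of a possible workaround, not from the thread: the stack trace shows the
// closure dragging in the REPL wrapper object that holds the HiveContext (field "hc").
// Marking the reference @transient asks the serializer to skip it.
val barr1 = sc.broadcast("test")
@transient val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val sret = hc.sql("show tables")
// Compare against the broadcast's value rather than the Broadcast wrapper itself.
val ret = sret.filter(row => !barr1.value.equals("test"))
ret.collect.foreach(println)
```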
Re: Spark Installation
Hi Srikrishna, the reason for this issue is that the assembly jar was uploaded to HDFS twice. Pasting your submit command would make for a better diagnosis.

田毅

On Jul 9, 2014, at 3:03 AM, Srikrishna S srikrishna...@gmail.com wrote:

Hi All, I tried the make-distribution script and it worked well. I was able to compile the Spark binary on our CDH5 cluster. Once I compiled Spark, I copied the binaries in the dist folder to all the other machines in the cluster. However, I ran into an issue while submitting a job in yarn-client mode. I get an error message that says the following:

    Resource file:/opt/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar changed on src filesystem (expected 1404845211000, was 1404845404000)

My end goal is to submit a job (that uses MLlib) to our YARN cluster. Any thoughts anyone? Regards, Krishna

On Tue, Jul 8, 2014 at 9:49 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

Hi Srikrishna, the binaries are built with something like:

    mvn package -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 -Dyarn.version=2.3.0-cdh5.0.1

-Sandy

On Tue, Jul 8, 2014 at 3:14 AM, 田毅 tia...@asiainfo.com wrote:

Try this command:

    make-distribution.sh --hadoop 2.3.0-cdh5.0.0 --with-yarn --with-hive

田毅

On Jul 8, 2014, at 11:53 AM, Krishna Sankar ksanka...@gmail.com wrote:

Couldn't find any reference to CDH in pom.xml - neither profiles nor the hadoop.version. Am also wondering how the CDH-compatible artifact was compiled. Cheers, k/

On Mon, Jul 7, 2014 at 8:07 PM, Srikrishna S srikrishna...@gmail.com wrote:

Hi All, does anyone know what the command-line arguments to mvn are to generate the pre-built binary for Spark on Hadoop 2 / CDH5? I would like to pull in a recent bug fix from spark-master and rebuild the binaries in exactly the same way as the ones provided on the website. I have tried the following:

    mvn install -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1

and it doesn't quite work. Any thoughts anyone?
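Not from the thread, but if a duplicate upload is indeed the cause, one way the timestamp mismatch is typically avoided is to keep a single canonical copy of the assembly jar on HDFS and point the YARN config at it, so each submit stops re-uploading a local file. A rough sketch in spark-defaults.conf (the HDFS path is hypothetical, and the spark.yarn.jar property assumes a Spark 1.1-era build):

```
# Hedged sketch, not from the thread: keep one canonical assembly on HDFS and
# reference it, instead of letting each submit upload a local copy whose
# modification time may no longer match what YARN has cached.
# (Path is hypothetical; spark.yarn.jar assumes a Spark 1.1-era build.)
spark.yarn.jar hdfs:///user/spark/jars/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar
```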
Re: Shark Vs Spark SQL
Add MASTER=yarn-client and the JDBC / Thrift server will run on YARN.

2014-07-02 16:57 GMT-07:00 田毅 tia...@asiainfo.com:

Hi Matei, do you know how to run the JDBC / Thrift server on YARN? I did not find any suggestion in the docs.

2014-07-02 16:06 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com:

Spark SQL in Spark 1.1 will include all the functionality in Shark; take a look at http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html. We decided to do this because, at the end of the day, the only code left in Shark was the JDBC / Thrift server, which is a very small amount of code. There’s also a branch of Spark 1.0 that includes this server if you want to replace Shark on Spark 1.0: https://github.com/apache/spark/tree/branch-1.0-jdbc. The server runs in a very similar way to how Shark did. Matei

On Jul 2, 2014, at 3:57 PM, Shrikar archak shrika...@gmail.com wrote:

As of Spark Summit 2014 they mentioned that there will be no active development on Shark. Thanks, Shrikar

On Wed, Jul 2, 2014 at 3:53 PM, Subacini B subac...@gmail.com wrote:

Hi, http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E says that the Shark backend will be replaced with the Spark SQL engine in the future. Does that mean Spark will continue to support Shark + Spark SQL for the long term? Or will Shark be decommissioned after some period? Thanks, Subacini
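For reference, the one-line suggestion above would look roughly like this when launching the server (the start-thriftserver.sh script name is an assumption based on Spark 1.1-era / branch-1.0-jdbc distributions; it is not quoted in the thread):

```
# Hedged example: launch the JDBC / Thrift server against YARN in client mode.
MASTER=yarn-client ./sbin/start-thriftserver.sh
```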
Is it possible to run HiveThriftServer2 based on SparkSQL in YARN now?
Hi everyone! Is it possible to run HiveThriftServer2, based on Spark SQL, on YARN now?

Spark version: branch-1.0-jdbc
YARN version: 2.3.0-cdh5.0.0
Re: Shark Vs Spark SQL
Hi Matei, do you know how to run the JDBC / Thrift server on YARN? I did not find any suggestion in the docs.

2014-07-02 16:06 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com:

Spark SQL in Spark 1.1 will include all the functionality in Shark; take a look at http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html. We decided to do this because, at the end of the day, the only code left in Shark was the JDBC / Thrift server, which is a very small amount of code. There’s also a branch of Spark 1.0 that includes this server if you want to replace Shark on Spark 1.0: https://github.com/apache/spark/tree/branch-1.0-jdbc. The server runs in a very similar way to how Shark did. Matei

On Jul 2, 2014, at 3:57 PM, Shrikar archak shrika...@gmail.com wrote:

As of Spark Summit 2014 they mentioned that there will be no active development on Shark. Thanks, Shrikar

On Wed, Jul 2, 2014 at 3:53 PM, Subacini B subac...@gmail.com wrote:

Hi, http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E says that the Shark backend will be replaced with the Spark SQL engine in the future. Does that mean Spark will continue to support Shark + Spark SQL for the long term? Or will Shark be decommissioned after some period? Thanks, Subacini